PostgreSQL Schemas -- Usage Scenario/Case_问答_开发者

I apologize for my newbie like question but I am a newbie to PostgreSQl and schemas. I have a hard time grasping the purpose of schemas in PostgreSQL.. and in general. What are the potential use-cases for schemas?

The book I have states that schemas are like directories that can't be nested. OK, so I take it schemas are a means to group tables in a db. The book does not address how this is implemented, nor does it mention the potential uses except for one trivial example (next paragraph).

A Benefit & Potential use case #1

Thus far I have only understood one (seemingly trivial) benefit to schemas. This benefit being that you can have multiple tables with the same name and as long as each such table is in a different schema, there will be no conflict because the namespace qualifier (the schema name) can be used to address the particular table desired.

I don't really understand why you would have multiple tables of the same name to begin with. I'd think this is an extremely rare case, yet the docs come accross to me as if schemas should be used by everyone. Having multiple tables of the same name seems like poor project management to me and allowing for such bad practice doesn't seem like value. In the end you are just allowing for a mess to be made.

The only scenario I can think of where you could have name conflicts of tables which should be allowed is when you have a class that is learning SQL and the school IT admin wants each student to be able to create tables of their liking. Of course, the question in my mind is why doesn't the admin just create 1 db per student rather than 1 db for all and 1 schema for each student? Q1: Why?

What are some other use cases that show the benefit of schemas?

IMPLEMENTATION - OR - NATURE OF

I'm also confused about the implementation/nature of schemas.

Let us assume that we have 1 DB which has 3 schemas. 1 schema per user as such:

user1='admin' -- tableA, tableU, tableJ, tableK,

user2='joe' ------ tableU, tableJ,

user3='kate ----- tableU, tableK.

In the above example, my intention is for tableU to be shared between the users (accross the schema joe & schema kate). They shouldn't get their own stand-alone copy of the table, they should share common access to this table. This type of usage makes sense to me. Users should be adding/modding/potentially removing records... not adding/modding/removing tables. I am not sure however that this is possible with PostgreSQL schemas. Q2: If I wanted to share tableU as I described above, would schema joe & schema kate each get thair own copy of the table or can I specify that they should not get their own copy and instead just share access to an existing table?

tableJ is the same as tableK... except that Kate can't use tableJ and joe can't use tableK. This seems to be the essence of schemas to me according to my limited understanding of schemas. Q3: What is the purpose of making a开发者_开发技巧 copy of a table for each user? The 2 tables have the same structure (columns and constraints) and it is a waste of storage space to make a stand alone copy of such a table for each user. I'd also think that it would be nightmare if each user had their own copy of every table. We'd be throwing key concepts like normalization out the window. It would be an unmaintainable mess. In my mind each user should be adding/modifying/ removing records to a common table in a common DB.

I've seen people in my Google searches about schemas make a separate schema+tables for each online customer to their website. They have like 100,000 schemas. Q4: What am I missing here? This seems extreme to say the least. They should be adding record(s) to standard table(s) for each customer not making schemas and tables for each customer. This only adds to my confusion.

Anyway, I hope I've stated the key points of my confusion clearly enough.

What I'm looking for is:

(1) To clear up the benefits of schemas via realistic use case examples.

(2) To clear up the implementation detail of schemas.

EDIT v.3

Potential use case #2

If I understood Neville K correctly, he suggest as a use case:

1 DB which has 3 schemas. 1 schema per user as such (username==schema_name in ex.):

user1='admin' -- tableA, tableLogin, tblFin1, tblFin2, tblFin3, tblPrj1, tblPrj2, tblPrj3.

user2='joe' ------ tableLogin, tblFin1, tblFin2, tblFin3

user3='kate ----- tableLogin, tblPrj1, tblPrj2, tblPrj3.

Here, joe is in Fin department, and Kate is project manager. The app that joe uses is restricted to Finance related tables, the application that kate uses is restricted to project management tables. This restriction is enforced via their user name being tied to a schema which is tied to a search path (enforced at DB level).

Q5: Wouldn't the same restriction be in place without schemas? What tables are available to which application is a function of what tables were set to be used by the application at the time the application was made by the dev team. No? Or are we assuming that we are using off the shelf applications that you point to a db and we are concerned that the application can not in its settings be restricted to certain tables (and this restriction locked with a password within the off-the-shelf application)?

Potential use case #3

If I understood Catcall correctly, he suggest as a use case:

A scenario where you wish to rent/lease the services of a database server hosted on 1 physical system to multiple customers needing database service. As such, a multi-tenant like scenario arises. At this point you can choose to have:

(1) A separate database for each tenant (customer)

-More secure and convenient but can support less tenants per system.

(2) One shared database with a schema for each tenant (customer)

-Less secure and slightly less convenient but can support more tenants per system.

Potential use case #4

If I understood Scott Marlowe and Catcall correctly, another use case would be:

Using schemas to isolate the new content of developers during development. Later can merge work to another schema.

I've seen people in my Google searches about schemas make a separate schema+tables for each online customer to their website. They have like 100,000 schemas. Q3: What am I missing here? This seems extreme to say the least. They should be adding record(s) to standard table(s) for each customer not making schemas and tables for each customer.

One database per tenant (customer) is easy to build, and it gives you the strongest isolation between tenants, and the simplest disaster recovery. But it's relatively expensive.

One schema per tenant is also easy to build. There's a smaller degree of isolation between tenants. A dbms can support more tenants per server with one schema per tenant than with one database per tenant. Disaster recovery for one tenant is more complicated than with one database per tenant.

A shared schema requires every row to have a tenant identifier. Hardware and backup are cheaper; disaster recovery for one tenant can be a real bitch. (To recover data for a single tenant, you have to recover some rows in every table. Performance suffers for all tenants when that happens.) Isolation is trickier. Since tenants share tables, making sure no tenant can access other tenants data is a lot harder than with one database or one schema per tenant.

The search term for this stuff is "multi-tenant database design". SO also has a multi-tenant tag.

Another common use is to group database objects that belong together. For example, if you were developing an accounting database, all the objects that implement "accounts payable" features might go in the "ap" schema. I use schemas for PostgreSQL extensions, too. In my db, I installed the hstore extension in the "hst" schema, the tablefunc extension in the "tbf" schema, etc.

Here's another common usage I've seen. Schemas and search_path allow two or more developers to work on the same single copy of a large db without needing their own copies, but without getting in each other's way.

Let's say Joe is working on the comments table, and Jim is working on the workflow table. They both need the users, groups, etc tables. Joe creates a schema, creates his own working version of the comments table, and can muck about with it at will:

create schema joe;
create table joe.comments (rest of new tabledef here);
set search_path='joe','public';

Now when Joe does alter table on the joe.comments table, Jim sees no changes and his development is not impacted by a broken joe.comments table etc. When Joe's code, having a search_path of 'joe','public' runs, it sees joe's comments tables while everyone else sees the original.

After thorough testing, Joe's comments table can be basically put in place of the original in one fell swoop with no interruption to Jim's development. Multiply this time 40 developers and it allows all 40 to work on the same dev db without blowing each other's stuff up (or less often at least).

Neville K's answer captures the essence but is perhaps a bit brief.

Schemas are essentially namespaces. They're useful in the same kinds of situations that namespaces are useful in programming: where you have many things, so many that you want to partition them into separate sub-collections (many -- but fewer than (say) 10,000 such sub-collections, given that there's a single level), while being able to inter-operate among them. Using schemas can allow more 'natural' naming standards for other database objects.

As an aside, the value of namespaces isn't appreciated by new programmers either. That seemingly trivial benefit of allowing multiple objects to have the same name isn't so trivial, it turns out. It's only when one has worked for a while on larger projects (with thousands of 'code objects': tables, views, indices, stored procedures, domains, etc.) that one comes to understand that the benefits of namespaces outweigh their costs.

In an earlier age (and not with PostgreSQL), I worked for a computer bureau with about 20 customers, ranging from 250 concurrent users of the DBMS down to one, each via private leased telephone lines. (This was before the internet.) Each customer had its own schema, with an admin user (role) that could create and drop other users as employees came and went, grant and revoke privileges, and do a limited amount of data definition work and import/export in their own schema. As a developer I had to use schemas because there was only one DBMS instance and one database. So... if the above is not convincing, you could say that schemas are in the SQL standard (for historical reasons), therefore PostgreSQL has support for schemas.

Today, a somewhat similar situation to mine might arise in a lab environment, where several researchers (or hundreds of students) each want to work on their own data while having access to a common set in the public schema. Using schemas helps stop accidents: inadvertently trampling on each others' data.

The key benefit for schemas is to provide a logical grouping for your database tables. The most likely use cases are:

To run different logical applications within the same DB - for instance, you might have an enterprise system, and want to create schemas called "userprofiles", "projects", "finance" etc.
To create similar versions of the same group of tables, for instance for "development", "qa" and "production"