Thread: Creating schema best practices
Dear list,
I’m new to PostgreSQL, planning now a migration to PostgreSQL and would appreciate your help.
One aspect of the migration is re-thinking our DB structure.
The application considered contains several modules (let’s say ten), each one uses and manages a small number of tables (maximum 10 tables per module). Today all tables are located on the same DB, which makes management a bit uncomfortable. What comes to mind is grouping each module’s tables on a separate schema. From you experience, is there any performance impact for grouping tables into schemas? In general, what is the best practice for grouping tables in schemas vs. locating several tables (that might be logically separated) into the same schema? Is there any advantage \ disadvantage of using schemas vs naming standards that includes prefix for each module’s tables?
In the considered application there are no name duplications among tables. In addition, there are there are no queries that involve tables managed by different modules. In addition, since all modules are owned by the same application, currently there is no interest in limiting access for tables (it is all or nothing).
Thanks in advance,
Hava
Dear list,
I’m new to PostgreSQL, planning now a migration to PostgreSQL and would appreciate your help.
One aspect of the migration is re-thinking our DB structure.
The application considered contains several modules (let’s say ten), each one uses and manages a small number of tables (maximum 10 tables per module). Today all tables are located on the same DB, which makes management a bit uncomfortable. What comes to mind is grouping each module’s tables on a separate schema. From you experience, is there any performance impact for grouping tables into schemas? In general, what is the best practice for grouping tables in schemas vs. locating several tables (that might be logically separated) into the same schema? Is there any advantage \ disadvantage of using schemas vs naming standards that includes prefix for each module’s tables?
In the considered application there are no name duplications among tables. In addition, there are there are no queries that involve tables managed by different modules. In addition, since all modules are owned by the same application, currently there is no interest in limiting access for tables (it is all or nothing).
Since you have no security needs and no naming conflicts, the answer to your question is that it's purely for your convenience. Schemas are primarily useful for separating namespaces and for security. Separating your modules into schemas will have no impact on performance.
pg_dump and pg_restore have support for single-schema dumps, which might make your backup procedures easier.
If I were in your shoes, I'd use schemas just to make management easier. Renaming all of your tables with a module prefix is a waste of time -- that's what schemas are for.
Craig
Thanks Craig for the useful information.
On the same regard – Some of the mentioned modules in the mentioned application use a set of tables which is logically separate (there are no join statements with tables of other modules). What are the pros\cons of using a separate database instead of a separate schema for maintaining such tables?
I understand that resources are shared among multiple databases on the same cluster, so in terms of performance, are there resources that are dedicated for each database and would benefit performance?
I’d appreciate a best practice also regarding to using database vs schema.
Hava
From: Craig James [mailto:cjames@emolecules.com]
Sent: יום ד 03 אוקטובר 2012 19:00
To: Babay Adi, Hava
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Creating schema best practices
On Tue, Oct 2, 2012 at 11:54 AM, Babay Adi, Hava <hava.babay@hp.com> wrote:
Dear list,
I’m new to PostgreSQL, planning now a migration to PostgreSQL and would appreciate your help.
One aspect of the migration is re-thinking our DB structure.
The application considered contains several modules (let’s say ten), each one uses and manages a small number of tables (maximum 10 tables per module). Today all tables are located on the same DB, which makes management a bit uncomfortable. What comes to mind is grouping each module’s tables on a separate schema. From you experience, is there any performance impact for grouping tables into schemas? In general, what is the best practice for grouping tables in schemas vs. locating several tables (that might be logically separated) into the same schema? Is there any advantage \ disadvantage of using schemas vs naming standards that includes prefix for each module’s tables?
In the considered application there are no name duplications among tables. In addition, there are there are no queries that involve tables managed by different modules. In addition, since all modules are owned by the same application, currently there is no interest in limiting access for tables (it is all or nothing).
Since you have no security needs and no naming conflicts, the answer to your question is that it's purely for your convenience. Schemas are primarily useful for separating namespaces and for security. Separating your modules into schemas will have no impact on performance.
pg_dump and pg_restore have support for single-schema dumps, which might make your backup procedures easier.
If I were in your shoes, I'd use schemas just to make management easier. Renaming all of your tables with a module prefix is a waste of time -- that's what schemas are for.
Craig
Thanks Craig for the useful information.
On the same regard – Some of the mentioned modules in the mentioned application use a set of tables which is logically separate (there are no join statements with tables of other modules). What are the pros\cons of using a separate database instead of a separate schema for maintaining such tables?
I understand that resources are shared among multiple databases on the same cluster, so in terms of performance, are there resources that are dedicated for each database and would benefit performance?
I’d appreciate a best practice also regarding to using database vs schema.
Regarding multiple databases: it depends entirely on your needs. If you separate your table into two databases, then your application will have to make two connections rather than one. That might be a performance issue depending on how many connections per second you get.
When you do backups, you'll have to do two instead of one. It's hard to see why two databases would be better than one in your case.
Everything (database, schema, table, metadata, ....) is managed by the same database cluster, so there's no performance advantage to building separate databases. If you have several file systems on separate disks, you can improve performance by using them, but you don't need separate databases for that. You can create tablespaces and use that to assign tables or schemas to a particular file system.
Craig
* Babay Adi, Hava wrote: > On the same regard – Some of the mentioned modules in the mentioned > application use a set of tables which is logically separate (there are > no join statements with tables of other modules). What are the pros\cons > of using a separate database instead of a separate schema for > maintaining such tables? For one thing, with multiple databases you cannot easily take consistent backups using pg_dump, because a transaction is local to a single database. -- Christian
Thanks again for the helpful information. Another question on the same regard – how commonly used are PostgreSQL schemas? I want to ensure that I choose a solution that is common and wildly used.
From: Craig James [mailto:cjames@emolecules.com]
Sent: יום ד 03 אוקטובר 2012 22:49
To: Babay Adi, Hava
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Creating schema best practices
On Wed, Oct 3, 2012 at 10:58 AM, Babay Adi, Hava <hava.babay@hp.com> wrote:
Thanks Craig for the useful information.
On the same regard – Some of the mentioned modules in the mentioned application use a set of tables which is logically separate (there are no join statements with tables of other modules). What are the pros\cons of using a separate database instead of a separate schema for maintaining such tables?
I understand that resources are shared among multiple databases on the same cluster, so in terms of performance, are there resources that are dedicated for each database and would benefit performance?
I’d appreciate a best practice also regarding to using database vs schema.
Best practice is more about opinion than anything else.
Regarding multiple databases: it depends entirely on your needs. If you separate your table into two databases, then your application will have to make two connections rather than one. That might be a performance issue depending on how many connections per second you get.
When you do backups, you'll have to do two instead of one. It's hard to see why two databases would be better than one in your case.
Everything (database, schema, table, metadata, ....) is managed by the same database cluster, so there's no performance advantage to building separate databases. If you have several file systems on separate disks, you can improve performance by using them, but you don't need separate databases for that. You can create tablespaces and use that to assign tables or schemas to a particular file system.
Craig