Thread: Postgres architecture for multiple instances
Howdy,

I am looking for advice on migrating to Postgres from another database
system.

Without going into too much detail, my company offers a software solution
which we host ourselves in our data center. We have gotten a green light
from management to start using Postgres as a free and cheaper database.
Normally the application connects to a DB2 database.

The current architecture has multiple DB2 databases installed on a single
Linux host (on top of VMware with 4 vCPUs). Each DB2 instance runs as a
different local Linux user and manages only one database. Normally there
are fewer than 5 DB2 instances per host.

My question is: if I am going to replace DB2 with Postgres, should I keep
this architecture where each customer application gets its own database
instance? What I was thinking would be much simpler is to have multiple
databases under one Postgres instance (which is how most Linux distros
install Postgres anyway). Of course, the issue then becomes that if there
is a problem with that single instance, multiple customers are affected.

For the most part, I am completely on the fence about deciding either way.
What does the Postgres community recommend in this situation? All the docs
and books I read seem to suggest that a Postgres instance manages databases
(plural!) and that normally there would be only one Postgres instance
running per host.

I am also interested in the best way forward with failover and high
availability.

Looking forward to comments,
Sam
On 02/21/2015 02:01 PM, Samuel Smith wrote:
> My question is: if I am going to replace DB2 with Postgres, should I
> keep this architecture where each customer application gets its own
> database instance? What I was thinking would be much simpler is to have
> multiple databases under one Postgres instance (which is how most Linux
> distros install Postgres anyway). Of course, the issue then becomes
> that if there is a problem with that single instance, multiple
> customers are affected.

The issue that has come up in previous discussions on this subject is that
there is global metadata available to all users in a Postgres cluster
(instance), in particular the database users. So while it is possible to
restrict access to the database itself, it is more difficult, if not
impossible, to restrict access to information about the other databases in
the cluster.

> I am also interested in the best way forward with failover and high
> availability.

Currently the built-in replication solutions work at the cluster level, not
at the database level. There are third-party tools (Slony and Bucardo come
to mind) that can work at a more focused level.

--
Adrian Klaver
adrian.klaver@aklaver.com
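For example (the database and role names here are made up; any
unprivileged role can run these queries and will see every database and
every role in the cluster):

  $ psql -d customer_a_db -U customer_a_user
  customer_a_db=> SELECT datname FROM pg_database;   -- lists all databases in the cluster
  customer_a_db=> SELECT rolname FROM pg_roles;      -- lists all roles in the cluster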
On 2/21/15 6:08 PM, Adrian Klaver wrote:
> On 02/21/2015 02:01 PM, Samuel Smith wrote:
>> What I was thinking would be much simpler is to have multiple
>> databases under one Postgres instance (which is how most Linux distros
>> install Postgres anyway). Of course, the issue then becomes that if
>> there is a problem with that single instance, multiple customers are
>> affected.

Certainly more than one customer may be affected by an outage, but there
are also fewer instances to be managed. This can be particularly important
for upgrades: fewer systems to upgrade can be a good thing.

> The issue that has come up in previous discussions on this subject is
> that there is global metadata available to all users in a Postgres
> cluster (instance), in particular the database users. So while it is
> possible to restrict access to the database itself, it is more
> difficult, if not impossible, to restrict access to information about
> the other databases in the cluster.

This is only a problem if the customer has direct access to the database.
If they are connecting through a software layer then multi-tenant
solutions are practical.

>> I am also interested in the best way forward with failover and high
>> availability.
>
> Currently the built-in replication solutions work at the cluster level,
> not at the database level. There are third-party tools (Slony and
> Bucardo come to mind) that can work at a more focused level.

Again, it depends on what kind of access your customers have to the
database. Logical backups can be done on a per-database basis (and are
very practical for small datasets), but physical backups (which are more
efficient) must back up the entire cluster (except via the third-party
solutions noted above, which have their own challenges).

You said you don't want to give a lot of information, but the optimal
solution depends on how your customers connect to their database:

1) Entirely through an application or API
2) Directly to the db via psql, PgAdmin, JDBC, etc.

Without that information it's hard to give really solid advice.

--
- David Steele
david@pgmasters.net
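To make the logical vs. physical distinction concrete, a rough sketch
(the database names and paths are placeholders, not from the original
post):

  $ pg_dump -Fc -f /backups/customer_a.dump customer_a_db   # logical: one database at a time
  $ pg_dumpall --globals-only > /backups/globals.sql        # roles etc. are cluster-wide
  $ pg_basebackup -D /backups/cluster_copy                  # physical: always the whole cluster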
On Sat, 21/02/2015 at 16:01 -0600, Samuel Smith wrote:
> Howdy,
> I am looking for advice on migrating to Postgres from another database
> system.
[...]

People already wrote you some comments; here are two more.

DB2 instances run as different OS users, so if you need the same approach
for security reasons, you will need to create different PostgreSQL
clusters.

Another important point concerns large objects, if you happen to use them:
their OID is not just unique to the database, but to the whole cluster.
This means that when you move a database from a production cluster to a
test cluster, you may get errors when the same OID already exists in the
target cluster (even if it is used in a different database).

Bye,
Giuseppe
> My question is: if I am going to replace DB2 with Postgres, should I
> keep this architecture where each customer application gets its own
> database instance?
[...]

Hello,

Another possible approach is to have only one database but one schema per
customer. This allows you, for example, to share resources like stored
procedures in the public schema, or to run queries across all customers as
a superuser. The caveat is that users are less isolated (everybody can
read the catalog).

regards,

Marc Mamin
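A rough sketch of that layout (all names and the password are invented for
illustration):

  app_db=# CREATE ROLE customer_a LOGIN PASSWORD 'secret';
  app_db=# CREATE SCHEMA customer_a AUTHORIZATION customer_a;
  app_db=# ALTER ROLE customer_a SET search_path = customer_a, public;

Each customer's objects live in its own schema, while shared stored
procedures stay in public.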
Giuseppe Sacco wrote:

> Another important point concerns large objects, if you happen to use
> them: their OID is not just unique to the database, but to the whole
> cluster. This means that when you move a database from a production
> cluster to a test cluster, you may get errors when the same OID already
> exists in the target cluster (even if it is used in a different
> database).

Well, I'm doing this frequently and it doesn't cause any error.

Demo with PG 9.1 (import a large object, dump-reload into another
database, verify that the copy shares the same OID):

$ createdb dbtest1
$ psql dbtest1
dbtest1=# \lo_import /path/to/picture.jpg
lo_import 2497765
dbtest1=# select distinct loid from pg_largeobject;
  loid
---------
 2497765
(1 row)
dbtest1=# \q
$ pg_dump dbtest1 >dbtest1.sql
$ createdb dbtest2
$ psql dbtest2
dbtest2=# \i dbtest1.sql
SET
SET
SET
SET
SET
CREATE EXTENSION
COMMENT
 lo_create
-----------
   2497765
(1 row)
ALTER LARGE OBJECT
SET
BEGIN
 lo_open
---------
       0
(1 row)
 lowrite
---------
   16384
(1 row)
 lowrite
---------
    5923
(1 row)
 lo_close
----------
        0
(1 row)
COMMIT
REVOKE
REVOKE
GRANT
GRANT
dbtest2=# select distinct loid from pg_largeobject;
  loid
---------
 2497765
(1 row)

So there's no error, and the OID is the same in both databases. It gets
forced in the dump with lo_create().

It happens, however, that when importing large objects with lo_import,
PostgreSQL returns OIDs that appear to be unique across databases.
Incidentally, that allows copying them easily between databases. In the
example above I'm importing into an empty db, so there's no potential for
conflict, but if the second database had previously imported large objects
too, they wouldn't conflict since they would have gotten unique OIDs.

So these OIDs tend to be unique, but that does not imply that they must be
unique. Overall it's very convenient.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
On Sat, Feb 21, 2015 at 3:01 PM, Samuel Smith <pgsql@net153.net> wrote:
> My question is: if I am going to replace DB2 with Postgres, should I
> keep this architecture where each customer application gets its own
> database instance?
[...]

I'd run a Debian-based distro (Ubuntu or Debian work well) and use the
pg_* commands to create the clusters the same way. Gives you the maximum
separation for clients.

pg_createcluster

Usage: /usr/bin/pg_createcluster [options] <version> <cluster name> [-- <initdb options>]

Options:
  -u <uid>   cluster owner and superuser (default: 'postgres')
  -g <gid>   group for data files (default: primary group of owner)
  -d <dir>   data directory (default:
             /var/lib/postgresql/<version>/<cluster name>)
  -s <dir>   socket directory (default: /var/run/postgresql for clusters
             owned by 'postgres', /tmp for other clusters)
  -l <dir>   path to desired log file (default:
             /var/log/postgresql/postgresql-<version>-<cluster>.log)
  --locale <encoding>
             set cluster locale (default: inherit from environment)
  --lc-collate/ctype/messages/monetary/numeric/time <locale>
             like --locale, but only set for a particular category
  -e <encoding>
             Default encoding (default: derived from locale)
  -p <port>  port number (default: next free port starting from 5432)
  --start    start the cluster after creating it
  --start-conf auto|manual|disabled
             Set automatic startup behaviour in start.conf (default: 'auto')
  --createclusterconf=file alternative createcluster.conf to use
  --environment=file alternative environment file to use
  <initdb options> other options to pass to initdb

Just use -u and -d to put it where you want, -l to set up logging and -p
to set the port you want.

--
To understand recursion, one must first understand recursion.
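For example, something along these lines (the version, cluster name, data
directory, log path and port are just placeholders):

  $ pg_createcluster -d /srv/postgresql/customer_a \
        -l /var/log/postgresql/customer_a.log -p 5433 --start 9.4 customer_a
  $ pg_lsclusters

pg_lsclusters then shows one row per cluster with its version, port, data
directory and log file.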
"Daniel Verite" <daniel@manitou-mail.org> writes: > Giuseppe Sacco wrote: >> Another important fact is about large objects, if you happen to use >> them: their OID is not just unique to the database, but to the whole >> cluster. This means that when you move a database in a cluster from a >> production system to a database on a test cluster, you may get errors >> when same OID already exists in target cluster (even if it is used in a >> different database). > Well, I'm doing this frequently and it doesn't cause any error. That's because the above claim is nonsense. pg_largeobject is not shared across databases of a cluster. You could well have collisions against large objects in the same database, though, if you're adding more large objects to an existing database and expecting to preserve their OIDs. regards, tom lane
On 02/21/2015 05:25 PM, David Steele wrote:
> You said you don't want to give a lot of information, but the optimal
> solution depends on how your customers connect to their database:
>
> 1) Entirely through an application or API
> 2) Directly to the db via psql, PgAdmin, JDBC, etc.
>
> Without that information it's hard to give really solid advice.

Sorry, to be more specific: our application is built on top of a Java
framework, similar to Ruby on Rails. Customers don't interface with the
database at all, only through the website that provides our services.

--Sam
On 23/02/15 15:53, Samuel Smith wrote:
> Sorry, to be more specific: our application is built on top of a Java
> framework, similar to Ruby on Rails. Customers don't interface with the
> database at all, only through the website that provides our services.

In that case I'd go for one instance of pg, with either:

1. different DBs, or
2. the same db, but with fields identifying which customer is allowed
   access,

depending on what factors are important in your situation.

The second option might be best if there is a lot in common between
customers (tables of data that are not customer specific, and/or common
pg configuration options).

Cheers,
Gavin
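A rough sketch of option 2 (table and column names are invented): each
customer-facing table carries a customer identifier, and every query the
application issues filters on it.

  CREATE TABLE customers (id serial PRIMARY KEY, name text NOT NULL);
  CREATE TABLE invoices (
      id          bigserial PRIMARY KEY,
      customer_id integer NOT NULL REFERENCES customers(id),
      amount      numeric(12,2) NOT NULL
  );
  CREATE INDEX ON invoices (customer_id);
  -- the application always adds: WHERE customer_id = $1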
On 02/22/2015 01:53 PM, Scott Marlowe wrote:
> I'd run a Debian-based distro (Ubuntu or Debian work well) and use the
> pg_* commands to create the clusters the same way. Gives you the
> maximum separation for clients.
[...]
> Just use -u and -d to put it where you want, -l to set up logging and
> -p to set the port you want.

I am stuck with Red Hat as the OS, so I'll only have initdb. But this is a
good point. I assume there is nothing wrong with having multiple Postgres
instances (clusters) all running under a single postgres user on different
ports on the same machine?

But then what is the best way to handle connecting to an individual
cluster (local only, for admin purposes) with psql? It doesn't look like I
could go by the cluster name alone; I would have to manage (memorize)
connections by port number?

Thanks,
Sam
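What I have in mind is roughly this (the paths and ports are made up):

  $ initdb -D /srv/pgsql/customer_a
  $ initdb -D /srv/pgsql/customer_b
  $ pg_ctl -D /srv/pgsql/customer_a -o "-p 5433" -l /var/log/pgsql/customer_a.log start
  $ pg_ctl -D /srv/pgsql/customer_b -o "-p 5434" -l /var/log/pgsql/customer_b.log start
  $ psql -p 5433 postgres    # connect to the customer_a cluster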
On 23/02/15 17:21, Samuel Smith wrote:
> I am stuck with Red Hat as the OS, so I'll only have initdb. But this
> is a good point. I assume there is nothing wrong with having multiple
> Postgres instances (clusters) all running under a single postgres user
> on different ports on the same machine?
> But then what is the best way to handle connecting to an individual
> cluster (local only, for admin purposes) with psql?
[...]

pg_archivecleanup  pg_basebackup   pg_config       pg_controldata
pg_ctl             pg_dump         pg_dumpall      pg_isready
pg_receivexlog     pg_recvlogical  pg_resetxlog    pg_restore
pg_standby         pg_test_fsync   pg_test_timing  pg_upgrade
pg_xlogdump

are the pg_* commands I found on my Fedora 21 box, if it's of any help!

Cheers,
Gavin
On Feb 23, 2015 5:29 AM, "Samuel Smith" <pgsql@net153.net> wrote:
> [...]
> I am stuck with Red Hat as the OS, so I'll only have initdb. But this
> is a good point. I assume there is nothing wrong with having multiple
> Postgres instances (clusters) all running under a single postgres user
> on different ports on the same machine?
> But then what is the best way to handle connecting to an individual
> cluster (local only, for admin purposes) with psql? It doesn't look
> like I could go by the cluster name alone; I would have to manage
> (memorize) connections by port number?
>
You can use shell aliases and name it as you please.
Petr
Samuel Smith wrote on 23.02.2015 at 05:21:
> I am stuck with Red Hat as the OS, so I'll only have initdb. But this
> is a good point. I assume there is nothing wrong with having multiple
> Postgres instances (clusters) all running under a single postgres user
> on different ports on the same machine?

No, there isn't. One important advantage of that is that you can do a PITR
for each customer individually. If you keep all your databases in a single
instance/cluster, you can only do PITR for all of them at once. For a
multi-tenant system I don't think that would be acceptable.

Another advantage is that you can control superuser access on a
per-cluster basis.

> But then what is the best way to handle connecting to an individual
> cluster (local only, for admin purposes) with psql?

psql uses several environment variables for the default connection
information:

http://www.postgresql.org/docs/current/static/libpq-envars.html

You can either create one shell script for each instance/cluster, or use
different OS users that have different default environment variables; that
would be closer to what you have now.

Thomas
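For example, a small per-cluster wrapper script (the script name, path and
port are hypothetical; PGPORT is one of the standard libpq variables):

  $ cat /usr/local/bin/psql-customer_a
  #!/bin/sh
  # connect to the customer_a cluster listening on port 5433
  export PGPORT=5433
  exec psql "$@"

  $ psql-customer_a -l

A shell alias such as   alias psql_a='PGPORT=5433 psql'   works just as
well, as Petr suggested.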
On Sun, 22/02/2015 at 14:53 -0500, Tom Lane wrote:
[...]
> That's because the above claim is nonsense. pg_largeobject is not
> shared across databases of a cluster.
>
> You could well have collisions against large objects in the same
> database, though, if you're adding more large objects to an existing
> database and expecting to preserve their OIDs.

The problem is that when you use large objects, you have a table that
contains the OIDs of the large objects, and you need to keep the same
table->LOB link when moving the database. So, when you export the database
using pg_dump, it creates an SQL script that restores the db using the
same OIDs. If you run that script on an arbitrary cluster, the OIDs may
already be in use, and the import does not work.

Basically, you cannot rely on pg_dump for moving databases (that use large
objects), because there is no guarantee that the import will succeed. I
normally import such dumps into new clusters, and it works.

Bye,
Giuseppe