Thread: Postgres architecture for multiple instances

Postgres architecture for multiple instances

From
Samuel Smith
Date:
Howdy,
I am looking for advice on migrating to postgres from another database
system.

Without going into too much detail, my company offers a software
solution which we host ourselves in our data center. We have gotten
the green light from management to start using postgres as a free,
cheaper database. Normally the application connects to a DB2 database.

The current architecture has multiple DB2 databases installed on a
single linux host (on top of vmware with 4 vcpus). Each DB2 instance
runs as a different local linux user and only manages one database.
Normally there are fewer than 5 DB2 instances per host.

My question is, if I am going to replace DB2 as the database with
Postgres, should I keep this architecture, where each customer
application gets its own database instance? What I was thinking would
be much simpler is to have multiple databases under one
postgres instance (which is how most linux distros install postgres
anyway). Having multiple databases under one instance seems much
easier to manage. Of course, the downside is that if there is a problem
with this single instance, multiple customers will be affected.

For the most part, I am completely on the fence about deciding either
way. What does the postgres community recommend in this situation? All
docs and books that I read seem to suggest that a postgres instance
manages databases (plural!) and normally there would be only one
postgres instance running per host.

I am also interested in the best way forward with failover and high
availability.

Looking forward to comments,
Sam


Re: Postgres architecture for multiple instances

From
Adrian Klaver
Date:
On 02/21/2015 02:01 PM, Samuel Smith wrote:
> Howdy,
> I am looking for advice on migrating to postgres from another database
> system.
>
> Without going into too much detail, my company offers a software
> solution which we host ourselves in our data center. We have gotten
> the green light from management to start using postgres as a free,
> cheaper database. Normally the application connects to a DB2 database.
>
> The current architecture has multiple DB2 databases installed on a
> single linux host (on top of vmware with 4 vcpus). Each DB2 instance
> runs as a different local linux user and only manages one database.
> Normally there are fewer than 5 DB2 instances per host.
>
> My question is, if I am going to replace DB2 as the database with
> Postgres, should I keep this architecture, where each customer
> application gets its own database instance? What I was thinking would
> be much simpler is to have multiple databases under one
> postgres instance (which is how most linux distros install postgres
> anyway). Having multiple databases under one instance seems much
> easier to manage. Of course, the downside is that if there is a problem
> with this single instance, multiple customers will be affected.

The issue that has come up in previous discussions on this subject is
that there is global metadata available to all users in a Postgres
cluster (instance), in particular the database users. So while it is
possible to restrict access to the database itself, it is more difficult,
if not impossible, to restrict access to information about the other
databases in the cluster.
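
For example, connected to one customer's database as that customer's
(non-superuser) role, a query like this still lists every database and
role in the cluster (names here are made up):

customer1=> SELECT datname FROM pg_database;
customer1=> SELECT rolname FROM pg_roles;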

>
> For the most part, I am completely on the fence about deciding either
> way. What does the postgres community recommend in this situation? All
> docs and books that I read seem to suggest that a postgres instance
> manages databases (plural!) and normally there would be only one
> postgres instance running per host.
>
> I am also interested in the best way forward with failover and high
> availability.

Currently the built-in replication solutions work at the cluster level,
not at the database level. There are third-party tools (Slony and
Bucardo come to mind) that can work at a more focused level.
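
For example, a plain streaming-replication standby always receives the
WAL of the whole cluster; a rough sketch (host and replication user are
made up):

$ pg_basebackup -h primary.example.com -U replicator -D /var/lib/pgsql/standby -X stream -P
$ cat /var/lib/pgsql/standby/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'

There is no option to replicate just one database out of the cluster.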

>
> Looking forward to comments,
> Sam
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Postgres architecture for multiple instances

From
David Steele
Date:
On 2/21/15 6:08 PM, Adrian Klaver wrote:
> On 02/21/2015 02:01 PM, Samuel Smith wrote:
>> Howdy,
>> I am looking for advice on migrating to postgres from another database
>> system.
>>
>> Without going into too much detail, my company offers a software
>> solution which we host ourselves in our data center. We have gotten
>> the green light from management to start using postgres as a free,
>> cheaper database. Normally the application connects to a DB2 database.
>>
>> The current architecture has multiple DB2 databases installed on a
>> single linux host (on top of vmware with 4 vcpus). Each DB2 instance
>> runs as a different local linux user and only manages one database.
>> Normally there are fewer than 5 DB2 instances per host.
>>
>> My question is, if I am going to replace DB2 as the database with
>> Postgres, should I keep this architecture, where each customer
>> application gets its own database instance? What I was thinking would
>> be much simpler is to have multiple databases under one
>> postgres instance (which is how most linux distros install postgres
>> anyway). Having multiple databases under one instance seems much
>> easier to manage. Of course, the downside is that if there is a problem
>> with this single instance, multiple customers will be affected.

Certainly more than one customer may be affected by an outage, but there
are also fewer instances to be managed.  This can be particularly
important for upgrades - fewer systems to upgrade can be a good thing.

> The issue that has come up in previous discussions on this subject is
> that there is global metadata available to all users in a Postgres
> cluster (instance), in particular the database users. So while it is
> possible to restrict access to the database itself, it is more difficult,
> if not impossible, to restrict access to information about the other
> databases in the cluster.

This is only a problem if the customer has direct access to the
database.  If they are connecting through a software layer then
multi-tenant solutions are practical.

>> For the most part, I am completely on the fence about deciding either
>> way. What does the postgres community recommend in this situation? All
>> docs and books that I read seem to suggest that a postgres instance
>> manages databases (plural!) and normally there would be only one
>> postgres instance running per host.
>>
>> I am also interested in the best way forward with failover and high
>> availability.
>
> Currently the built-in replication solutions work at the cluster level,
> not at the database level. There are third-party tools (Slony and
> Bucardo come to mind) that can work at a more focused level.

Again, it depends on what kind of access your customers have to the
database.  Logical backups can be done on a per-database basis (and are
very practical for small datasets), but physical backups (which are more
efficient) must back up the entire cluster (except for the third-party
solutions noted above that have their own challenges).
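
For example (database names and paths are made up):

$ pg_dump -Fc customer1_db > /backup/customer1_db.dump   # logical, one database
$ pg_basebackup -D /backup/base -Ft -X fetch             # physical, whole cluster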

You said you don't want to give a lot of information, but the optimal
solution depends on how your customers connect to their database:

1) Entirely through an application or API
2) Directly to the db via psql, PgAdmin, JDBC, etc.

Without that information it's hard to give really solid advice.

--
- David Steele
david@pgmasters.net



Re: Postgres architecture for multiple instances

From
Giuseppe Sacco
Date:
On Sat, 21/02/2015 at 16.01 -0600, Samuel Smith wrote:
> Howdy,
> I am looking for advice on migrating to postgres from another database
> system.
[...]

Others have already written some comments; here are two more.

DB2 instances run as different OS users, so if you need the same
approach for security reasons, you will need to create different
postgresql clusters.
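
A rough sketch of that, repeated once per customer (users, paths and
ports are made up):

$ sudo -u pginst1 initdb -D /home/pginst1/data
$ sudo -u pginst1 pg_ctl -D /home/pginst1/data -l /home/pginst1/logfile -o "-p 5433" start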

Another important fact concerns large objects, if you happen to use
them: their OID is not just unique to the database, but to the whole
cluster. This means that when you move a database from a production
cluster to a test cluster, you may get errors when the same OID already
exists in the target cluster (even if it is used in a different
database).

Bye,
Giuseppe



Re: Postgres architecture for multiple instances

From
Marc Mamin
Date:
>Howdy,
>I am looking for advice on migrating to postgres from another database
>system.
>
>Without going into too much detail, my company offers a software
>solution which we host ourselves in our data center. We have gotten
>the green light from management to start using postgres as a free,
>cheaper database. Normally the application connects to a DB2 database.
>
>The current architecture has multiple DB2 databases installed on a
>single linux host (on top of vmware with 4 vcpus). Each DB2 instance
>runs as a different local linux user and only manages one database.
>Normally there are fewer than 5 DB2 instances per host.
>
>My question is, if I am going to replace DB2 as the database with
>Postgres, should I keep this architecture, where each customer
>application gets its own database instance? What I was thinking would
>be much simpler is to have multiple databases under one
>postgres instance (which is how most linux distros install postgres
>anyway). Having multiple databases under one instance seems much
>easier to manage. Of course, the downside is that if there is a problem
>with this single instance, multiple customers will be affected.

Hello,
Another possible approach is to have only one database
but one schema per user.
This allows you, for example, to share resources like stored procedures
in the public schema, or to run queries against all customers as superuser.
The caveat is that users are less isolated (everybody can read the catalog).
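
A minimal sketch of that layout (role and schema names are made up):

CREATE ROLE customer1 LOGIN;
CREATE SCHEMA customer1 AUTHORIZATION customer1;
ALTER ROLE customer1 SET search_path = customer1, public;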

regards,

Marc Mamin

Re: Postgres architecture for multiple instances

From
"Daniel Verite"
Date:
    Giuseppe Sacco wrote:

> Another important fact concerns large objects, if you happen to use
> them: their OID is not just unique to the database, but to the whole
> cluster. This means that when you move a database from a production
> cluster to a test cluster, you may get errors when the same OID already
> exists in the target cluster (even if it is used in a different
> database).

Well, I'm doing this frequently and it doesn't cause any error.

Demo with PG 9.1 (import a large object, dump-reload into another database,
verify that the copy shares the same OID):

$ createdb dbtest1
$ psql dbtest1
dbtest1=# \lo_import /path/to/picture.jpg
lo_import 2497765

dbtest1=# select distinct loid from pg_largeobject;
  loid
---------
 2497765
(1 row)

dbtest1=# \q

$ pg_dump dbtest1 >dbtest1.sql

$ createdb dbtest2

$ psql dbtest2
dbtest2=# \i dbtest1.sql
SET
SET
SET
SET
SET
CREATE EXTENSION
COMMENT
 lo_create
-----------
   2497765
(1 row)

ALTER LARGE OBJECT
SET
BEGIN
 lo_open
---------
       0
(1 row)

 lowrite
---------
   16384
(1 row)

 lowrite
---------
    5923
(1 row)

 lo_close
----------
    0
(1 row)

COMMIT
REVOKE
REVOKE
GRANT
GRANT

dbtest2=# select distinct loid from pg_largeobject;
  loid
---------
 2497765
(1 row)

So there's no error and this is the same OID in both databases. It gets
forced in the dump with lo_create().

It happens, however, that when importing large objects with lo_import,
PostgreSQL returns OIDs that appear to be unique across databases.
Incidentally, that makes it easy to copy them between databases.
In the example above I'm importing into an empty db, so there's no potential
for conflict, but if the second database had previously imported large
objects too, they wouldn't conflict since they would have got unique OIDs.

So these OIDs tend to be unique but that does not imply that they must be
unique.  Overall it's very convenient.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org


Re: Postgres architecture for multiple instances

From
Scott Marlowe
Date:
On Sat, Feb 21, 2015 at 3:01 PM, Samuel Smith <pgsql@net153.net> wrote:
> Howdy,
> I am looking for advice on migrating to postgres from another database
> system.
>
> Without going into too much detail, my company offers a software solution
> which we host ourselves in our data center. We have gotten the green
> light from management to start using postgres as a free, cheaper
> database. Normally the application connects to a DB2 database.
>
> The current architecture has multiple DB2 databases installed on a single
> linux host (on top of vmware with 4 vcpus). Each DB2 instance runs as a
> different local linux user and only manages one database. Normally there are
> fewer than 5 DB2 instances per host.
>
> My question is, if I am going to replace DB2 as the database with Postgres,
> should I keep this architecture, where each customer application gets its
> own database instance? What I was thinking would be much simpler is to
> have multiple databases under one postgres instance (which is how most
> linux distros install postgres anyway). Having multiple databases under one
> instance seems much easier to manage. Of course, the downside is that if
> there is a problem with this single instance, multiple customers will be
> affected.

I'd run a Debian-based distro (Ubuntu or Debian work well) and use the
pg_* commands to create the clusters the same way. That gives you the
maximum separation for clients.

pg_createcluster

Usage: /usr/bin/pg_createcluster [options] <version> <cluster name>
[-- <initdb options>]

Options:
  -u <uid>      cluster owner and superuser (default: 'postgres')
  -g <gid>      group for data files (default: primary group of owner)
  -d <dir>      data directory (default:
                /var/lib/postgresql/<version>/<cluster name>)
  -s <dir>      socket directory (default: /var/run/postgresql for clusters
                owned by 'postgres', /tmp for other clusters)
  -l <dir>      path to desired log file (default:
                /var/log/postgresql/postgresql-<version>-<cluster>.log)
  --locale <encoding>
                set cluster locale (default: inherit from environment)
  --lc-collate/ctype/messages/monetary/numeric/time <locale>
                like --locale, but only set for a particular category
  -e <encoding> Default encoding (default: derived from locale)
  -p <port>     port number (default: next free port starting from 5432)
  --start       start the cluster after creating it
  --start-conf auto|manual|disabled
                Set automatic startup behaviour in start.conf (default: 'auto')
  --createclusterconf=file alternative createcluster.conf to use
  --environment=file alternative environment file to use
  <initdb options> other options to pass to initdb

Just use -u and -d to put it where you want, -l to set up logging and
-p to set the port you want.
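
For example (version, path and port are made up):

$ pg_createcluster -d /srv/pg/customer1 \
    -l /var/log/postgresql/customer1.log -p 5433 --start 9.4 customer1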

--
To understand recursion, one must first understand recursion.


Re: Postgres architecture for multiple instances

From
Tom Lane
Date:
"Daniel Verite" <daniel@manitou-mail.org> writes:
>     Giuseppe Sacco wrote:
>> Another important fact concerns large objects, if you happen to use
>> them: their OID is not just unique to the database, but to the whole
>> cluster. This means that when you move a database from a production
>> cluster to a test cluster, you may get errors when the same OID already
>> exists in the target cluster (even if it is used in a different
>> database).

> Well, I'm doing this frequently and it doesn't cause any error.

That's because the above claim is nonsense.  pg_largeobject is not shared
across databases of a cluster.

You could well have collisions against large objects in the same database,
though, if you're adding more large objects to an existing database and
expecting to preserve their OIDs.
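
For example, reusing the OID from the demo upthread, a second lo_create()
in the same database should fail with something like:

dbtest1=# SELECT lo_create(2497765);
ERROR:  large object 2497765 already exists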

            regards, tom lane


Re: Postgres architecture for multiple instances

From
Samuel Smith
Date:
On 02/21/2015 05:25 PM, David Steele wrote:
> On 2/21/15 6:08 PM, Adrian Klaver wrote:
>>
>> Currently the built-in replication solutions work at the cluster level,
>> not at the database level. There are third-party tools (Slony and
>> Bucardo come to mind) that can work at a more focused level.
>
> Again, it depends on what kind of access your customers have to the
> database.  Logical backups can be done on a per-database basis (and are
> very practical for small datasets), but physical backups (which are more
> efficient) must back up the entire cluster (except for the third-party
> solutions noted above that have their own challenges).
>
> You said you don't want to give a lot of information, but the optimal
> solution depends on how your customers connect to their database:
>
> 1) Entirely through an application or API
> 2) Directly to the db via psql, PgAdmin, JDBC, etc.
>
> Without that information it's hard to give really solid advice.
>

Sorry, to be more specific: our application is built on top of a Java
framework, similar to Ruby on Rails. Customers don't interface with the
database at all, only through the website which provides our services.

--Sam


Re: Postgres architecture for multiple instances

From
Gavin Flower
Date:
On 23/02/15 15:53, Samuel Smith wrote:
> On 02/21/2015 05:25 PM, David Steele wrote:
>> On 2/21/15 6:08 PM, Adrian Klaver wrote:
>>>
>>> Currently the built-in replication solutions work at the cluster level,
>>> not at the database level. There are third-party tools (Slony and
>>> Bucardo come to mind) that can work at a more focused level.
>>
>> Again, it depends on what kind of access your customers have to the
>> database.  Logical backups can be done on a per-database basis (and are
>> very practical for small datasets), but physical backups (which are more
>> efficient) must back up the entire cluster (except for the third-party
>> solutions noted above that have their own challenges).
>>
>> You said you don't want to give a lot of information, but the optimal
>> solution depends on how your customers connect to their database:
>>
>> 1) Entirely through an application or API
>> 2) Directly to the db via psql, PgAdmin, JDBC, etc.
>>
>> Without that information it's hard to give really solid advice.
>>
>
> Sorry, to be more specific: our application is built on top of a Java
> framework, similar to Ruby on Rails. Customers don't interface with
> the database at all, only through the website which provides our
> services.
>
> --Sam
>
>
In that case I'd go for one instance of pg.

With either

 1. different DBs, or
 2. the same db, but with fields identifying which customer is allowed access.

Depending on what factors are important in your situation.

The second option might be best if there is a lot in common between
customers (tables of data that are not customer-specific, and/or common
pg configuration options).
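
A rough sketch of option 2 (table and column names are made up):

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id integer NOT NULL,  -- identifies which customer owns the row
    placed_at   timestamptz NOT NULL DEFAULT now()
);

-- the application adds the customer filter to every query:
SELECT * FROM orders WHERE customer_id = 42;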



Cheers,
Gavin




Re: Postgres architecture for multiple instances

From
Samuel Smith
Date:
On 02/22/2015 01:53 PM, Scott Marlowe wrote:
 > I'd run a Debian-based distro (Ubuntu or Debian work well) and use the
 > pg_* commands to create the clusters the same way. That gives you the
 > maximum separation for clients.
 >
 > pg_createcluster
 >
 > Usage: /usr/bin/pg_createcluster [options] <version> <cluster name>
 > [-- <initdb options>]
 >
 > Options:
 >    -u <uid>      cluster owner and superuser (default: 'postgres')
 >    -g <gid>      group for data files (default: primary group of owner)
 >    -d <dir>      data directory (default:
 >                  /var/lib/postgresql/<version>/<cluster name>)
 >    -s <dir>      socket directory (default: /var/run/postgresql for clusters
 >                  owned by 'postgres', /tmp for other clusters)
 >    -l <dir>      path to desired log file (default:
 >                  /var/log/postgresql/postgresql-<version>-<cluster>.log)
 >    --locale <encoding>
 >                  set cluster locale (default: inherit from environment)
 >    --lc-collate/ctype/messages/monetary/numeric/time <locale>
 >                  like --locale, but only set for a particular category
 >    -e <encoding> Default encoding (default: derived from locale)
 >    -p <port>     port number (default: next free port starting from 5432)
 >    --start       start the cluster after creating it
 >    --start-conf auto|manual|disabled
 >                  Set automatic startup behaviour in start.conf (default: 'auto')
 >    --createclusterconf=file alternative createcluster.conf to use
 >    --environment=file alternative environment file to use
 >    <initdb options> other options to pass to initdb
 >
 > Just use -u and -d to put it where you want, -l to set up logging and
 > -p to set the port you want.
 >

I am stuck with Red Hat as the OS so I'll only have initdb. But this is a
good point. I assume there is nothing wrong with having multiple
postgres instances (clusters) all running under a single postgres user
on different ports on the same machine?
But then what is the best way to handle connecting to an individual
cluster (local only for admin purposes) with psql? It doesn't look like I
can go by the cluster name alone; would I have to manage (memorize)
connections by port number?

Thanks,
Sam


Re: Postgres architecture for multiple instances

From
Gavin Flower
Date:
On 23/02/15 17:21, Samuel Smith wrote:
> On 02/22/2015 01:53 PM, Scott Marlowe wrote:
> > I'd run a Debian-based distro (Ubuntu or Debian work well) and use the
> > pg_* commands to create the clusters the same way. That gives you the
> > maximum separation for clients.
> >
> > pg_createcluster
> >
> > Usage: /usr/bin/pg_createcluster [options] <version> <cluster name>
> > [-- <initdb options>]
> >
> > Options:
> >    -u <uid>      cluster owner and superuser (default: 'postgres')
> >    -g <gid>      group for data files (default: primary group of owner)
> >    -d <dir>      data directory (default:
> >                  /var/lib/postgresql/<version>/<cluster name>)
> >    -s <dir>      socket directory (default: /var/run/postgresql for clusters
> >                  owned by 'postgres', /tmp for other clusters)
> >    -l <dir>      path to desired log file (default:
> >                  /var/log/postgresql/postgresql-<version>-<cluster>.log)
> >    --locale <encoding>
> >                  set cluster locale (default: inherit from environment)
> >    --lc-collate/ctype/messages/monetary/numeric/time <locale>
> >                  like --locale, but only set for a particular category
> >    -e <encoding> Default encoding (default: derived from locale)
> >    -p <port>     port number (default: next free port starting from 5432)
> >    --start       start the cluster after creating it
> >    --start-conf auto|manual|disabled
> >                  Set automatic startup behaviour in start.conf (default: 'auto')
> >    --createclusterconf=file alternative createcluster.conf to use
> >    --environment=file alternative environment file to use
> >    <initdb options> other options to pass to initdb
> >
> > Just use -u and -d to put it where you want, -l to set up logging and
> > -p to set the port you want.
> >
>
> I am stuck with Red Hat as the OS so I'll only have initdb. But this is
> a good point. I assume there is nothing wrong with having multiple
> postgres instances (clusters) all running under a single postgres user
> on different ports on the same machine?
> But then what is the best way to handle connecting to an individual
> cluster (local only for admin purposes) with psql? It doesn't look like
> I can go by the cluster name alone; would I have to manage
> (memorize) connections by port number?
>
> Thanks,
> Sam
>
>
pg_archivecleanup
pg_basebackup
pg_config
pg_controldata
pg_ctl
pg_dump
pg_dumpall
pg_isready
pg_receivexlog
pg_recvlogical
pg_resetxlog
pg_restore
pg_standby
pg_test_fsync
pg_test_timing
pg_upgrade
pg_xlogdump

Those are the pg_* commands I found on my Fedora 21 box - if it's of any help!


Cheers,
Gavin


Re: Postgres architecture for multiple instances

From
Novák, Petr
Date:


On Feb 23, 2015 5:29 AM, "Samuel Smith" <pgsql@net153.net> wrote:
>
> On 02/22/2015 01:53 PM, Scott Marlowe wrote:
> > I'd run a Debian-based distro (Ubuntu or Debian work well) and use the
> > pg_* commands to create the clusters the same way. That gives you the
> > maximum separation for clients.
> >
> > pg_createcluster
> >
> > Usage: /usr/bin/pg_createcluster [options] <version> <cluster name>
> > [-- <initdb options>]
> >
> > Options:
> >    -u <uid>      cluster owner and superuser (default: 'postgres')
> >    -g <gid>      group for data files (default: primary group of owner)
> >    -d <dir>      data directory (default:
> >                  /var/lib/postgresql/<version>/<cluster name>)
> >    -s <dir>      socket directory (default: /var/run/postgresql for clusters
> >                  owned by 'postgres', /tmp for other clusters)
> >    -l <dir>      path to desired log file (default:
> >                  /var/log/postgresql/postgresql-<version>-<cluster>.log)
> >    --locale <encoding>
> >                  set cluster locale (default: inherit from environment)
> >    --lc-collate/ctype/messages/monetary/numeric/time <locale>
> >                  like --locale, but only set for a particular category
> >    -e <encoding> Default encoding (default: derived from locale)
> >    -p <port>     port number (default: next free port starting from 5432)
> >    --start       start the cluster after creating it
> >    --start-conf auto|manual|disabled
> >                  Set automatic startup behaviour in start.conf (default: 'auto')
> >    --createclusterconf=file alternative createcluster.conf to use
> >    --environment=file alternative environment file to use
> >    <initdb options> other options to pass to initdb
> >
> > Just use -u and -d to put it where you want, -l to set up logging and
> > -p to set the port you want.
> >
>
> I am stuck with Red Hat as the OS so I'll only have initdb. But this is a good point. I assume there is nothing wrong with having multiple postgres instances (clusters) all running under a single postgres user on different ports on the same machine?
> But then what is the best way to handle connecting to an individual cluster (local only for admin purposes) with psql? It doesn't look like I can go by the cluster name alone; would I have to manage (memorize) connections by port number?
>

You can use shell aliases and name them as you please.
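
For example (ports are made up):

alias psql_customer1='psql -p 5433'
alias psql_customer2='psql -p 5434'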

Petr
> Thanks,
> Sam
>
>
>

Re: Postgres architecture for multiple instances

From
Thomas Kellerer
Date:
Samuel Smith wrote on 23.02.2015 at 05:21:
> I am stuck with Red Hat as the OS so I'll only have initdb. But this is a good point.
> I assume there is nothing wrong with having multiple postgres instances (clusters) all running under a
> single postgres user on different ports on the same machine?

No, there isn't. One important advantage of that approach is that you can do PITR for each customer individually.
If you keep all your databases in a single instance/cluster, you can only do PITR for all of them at once.
For a multi-tenant system I don't think that would be acceptable.

Another advantage is that you can control superuser access on a per-cluster basis.

> But then what is the best way to handle connecting to an individual cluster (local only for admin purposes) with
> psql?

psql uses several environment variables for the default connection information:

  http://www.postgresql.org/docs/current/static/libpq-envars.html

You can either create one shell script for each instance/cluster, or use different OS users that have different default
environment variables - that would be closer to what you have now.
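
For example, a tiny wrapper script per cluster (port and socket directory
are made up):

#!/bin/sh
# psql-customer1: connect to the customer1 cluster on port 5433
export PGPORT=5433
export PGHOST=/var/run/postgresql
exec psql "$@"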

Thomas

Re: Postgres architecture for multiple instances

From
Giuseppe Sacco
Date:
On Sun, 22/02/2015 at 14.53 -0500, Tom Lane wrote:
[...]
> That's because the above claim is nonsense.  pg_largeobject is not shared
> across databases of a cluster.
>
> You could well have collisions against large objects in the same database,
> though, if you're adding more large objects to an existing database and
> expecting to preserve their OIDs.

The problem is that when you use large objects, you have a table that
contains the OIDs of the large objects, and you need to keep the same
table->LOB link when moving the database. So, when you export the
database using pg_dump, it creates an SQL script that restores the db
using the same OIDs. If you run that script on an arbitrary cluster, the
OIDs may already be in use, and the import process does not work.

Basically, you cannot use pg_dump for moving databases (that use large
objects), because there is no guarantee that your import will succeed.

I normally import such dumps into new clusters, and there it works.

Bye,
Giuseppe