Thread: How PostgreSQL handles multiple DDBB instances?

How PostgreSQL handles multiple DDBB instances?

From: Arnau
Hi all,

   I have a question about how PostgreSQL handles multiple DDBB
instances running on the same server, and about how I should design the
architecture of an application.

   I have an application that works with multiple customers. With
scalability in mind, we are considering the following approach:

   - Create a separate database instance for each customer.
   - We expect each customer's DB to be quite small, about 200 MB on
average.
   - The number of clients, and therefore of DDBB, could be significant
(thousands).
   - Pack as many customers as possible onto the same server, so a single
server could host more than 300 DDBB instances.


   Do you think this makes sense? Or, given the expected DDBB size,
would it be better to combine several customers' DDBB into just one
instance? What I'm worried about is whether PostgreSQL's performance
would suffer with so many DDBB instances.

  I have been following the list, and one of the pieces of advice that
appears most often is to keep your DB in memory. So if I have just one
instance instead of "hundreds", will performance be better?

Thank you very much
--
Arnau

Re: How PostgreSQL handles multiple DDBB instances?

From: Tom Lane
Arnau <arnaulist@andromeiberica.com> writes:
>    I have an application that works with multiple customers. Thinking in
> scalability we are thinking in applying the following approaches:

>    - Create a separate database instance for each customer.
>    - We think that customer's DB will be quite small, about 200MB as
> average.
>    - The number of clients, then DDBB, can be significant(thousands).
>    - Have as many customers as possible on the same server, so a single
> server could have more than 300 DDBB instances.

This is probably a bad idea, unless each customer's performance demands
are so low that you can afford to use very small shared-memory settings
for each instance.  But even small settings will probably eat ~10MB per
instance --- can you afford to build these machines with multiple GB of
RAM?

Can you instead run things with one postmaster per machine and one
database per customer within that instance?  From a performance
perspective this is likely to work much better.

If you desire to give the customers database-superuser capability then
this probably won't do, but if they are restricted users it might be OK.
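For restricted users, that per-customer setup might be sketched like this
(all names here are hypothetical, one pair of statements per customer,
all against the same single postmaster):

```sql
-- One cluster (one postmaster); one database and one restricted
-- login role per customer.  Names are hypothetical.
CREATE ROLE customer_42 LOGIN PASSWORD 'changeme'
    NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE DATABASE customer_42_db OWNER customer_42;

-- Keep other customers out of this database.
REVOKE CONNECT ON DATABASE customer_42_db FROM PUBLIC;
GRANT CONNECT ON DATABASE customer_42_db TO customer_42;
```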

            regards, tom lane

Re: How PostgreSQL handles multiple DDBB instances?

From: Arnau
Hi Tom,

> Arnau <arnaulist@andromeiberica.com> writes:
>>    I have an application that works with multiple customers. Thinking in
>> scalability we are thinking in applying the following approaches:
>
>>    - Create a separate database instance for each customer.
>>    - We think that customer's DB will be quite small, about 200MB as
>> average.
>>    - The number of clients, then DDBB, can be significant(thousands).
>>    - Have as many customers as possible on the same server, so a single
>> server could have more than 300 DDBB instances.
>
> This is probably a bad idea, unless each customer's performance demands
> are so low that you can afford to use very small shared-memory settings
> for each instance.  But even small settings will probably eat ~10MB per
> instance --- can you afford to build these machines with multiple GB of
> RAM?
>
> Can you instead run things with one postmaster per machine and one
> database per customer within that instance?  From a performance
> perspective this is likely to work much better.

   What I meant was to have only one postmaster per server, with a lot of
databases running in it. Something like this:

   template1=# \l
             List of databases
        Name        |   Owner   | Encoding
-------------------+-----------+----------
  alertwdv2         | gguridi   | LATIN1
  postgres          | postgres  | LATIN1
  template0         | postgres  | LATIN1
  template1         | postgres  | LATIN1
  voicexml          | root      | LATIN1
  wikidb            | root      | LATIN1
(6 rows)

   Here I have just 6 databases; my question is how performance would be
affected if, instead of 6 databases, I had 300-600 running on the same
postmaster, e.g.

   template1=# \l
             List of databases
        Name        |   Owner   | Encoding
-------------------+-----------+----------
  template0         | postgres  | LATIN1
  template1         | postgres  | LATIN1
  customers_group_1 | root      | LATIN1
(3 rows)

Instead of:

   template1=# \l
             List of databases
        Name        |   Owner   | Encoding
-------------------+-----------+----------
  template0         | postgres  | LATIN1
  template1         | postgres  | LATIN1
  customers_1       | root      | LATIN1
  customers_2       | root      | LATIN1
  customers_3       | root      | LATIN1
  ...
  customers_500     | root      | LATIN1
(502 rows)


> If you desire to give the customers database-superuser capability then
> this probably won't do, but if they are restricted users it might be OK.

   The users won't have superuser access; they will just execute plain queries.

Thank you very much
--
Arnau

Re: How PostgreSQL handles multiple DDBB instances?

From: Tom Lane
Arnau <arnaulist@andromeiberica.com> writes:
>> Can you instead run things with one postmaster per machine and one
>> database per customer within that instance?  From a performance
>> perspective this is likely to work much better.

>    What I meant is just have only one postmaster per server and a lot of
> databases running in it.

OK, we are on the same page then.  Should work fine.  I think I've heard
of people running installations with thousands of DBs in them.  You'll
want to test it a bit of course ...

            regards, tom lane

Re: How PostgreSQL handles multiple DDBB instances?

From: Arnau
Tom Lane wrote:
> Arnau <arnaulist@andromeiberica.com> writes:
>>> Can you instead run things with one postmaster per machine and one
>>> database per customer within that instance?  From a performance
>>> perspective this is likely to work much better.
>
>>    What I meant is just have only one postmaster per server and a lot of
>> databases running in it.
>
> OK, we are on the same page then.  Should work fine.  I think I've heard
> of people running installations with thousands of DBs in them.  You'll
> want to test it a bit of course ...

I have done some tests, and I have more than 400 DBs on one server, so
it is possible to run that many DBs in a single postmaster.

   The point I'm worried about is performance. Do you think performance
would be better executing exactly the same queries after adding an extra
column (e.g. customer_id) to all the tables, or opening a connection to
the single customer's DB and executing the query there?

   I don't know whether PostgreSQL's caching mechanism works as well
when queries can hit any of 400 DBs as when they all hit a single DB.
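   To make the comparison concrete, the shared-database alternative would
look something like this (hypothetical table and values), with every query
carrying the extra discriminator predicate:

```sql
-- Shared-database approach (hypothetical schema): one table holds all
-- customers' rows, discriminated by a customer_id column.
CREATE TABLE orders (
    order_id    serial PRIMARY KEY,
    customer_id integer NOT NULL,
    total       numeric
);
CREATE INDEX orders_customer_idx ON orders (customer_id);

-- Every query must add the extra filter:
SELECT order_id, total
FROM orders
WHERE customer_id = 42;
```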

Thank you very much for your help :)
--
Arnau

Re: How PostgreSQL handles multiple DDBB instances?

From: Tom Lane
Arnau <arnaulist@andromeiberica.com> writes:
>    The point I'm worried is performance. Do you think the performance
> would be better executing exactly the same queries only adding an extra
> column to all the tables e.g. customer_id, than open a connection to the
> only one customers DB and execute the query there?

[ shrug... ]  That's going to depend on enough factors that I don't
think anyone could give you a generic answer.  You'd have to test it for
yourself under your own application conditions.

However: doing it that way seems to me to create severe risks that the
customers might be able to look at each others' data.  You probably want
to go with separate databases just as a security matter.

            regards, tom lane

Re: How PostgreSQL handles multiple DDBB instances?

From: Jeff Davis
On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote:
>    The point I'm worried is performance. Do you think the performance
> would be better executing exactly the same queries only adding an extra
> column to all the tables e.g. customer_id, than open a connection to the
> only one customers DB and execute the query there?

Have you already considered using views with specific privileges to
separate your customers?
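
A sketch of that idea, with hypothetical names: a per-customer view over a
shared table, with privileges granted only on the view, never on the base
table:

```sql
-- Hypothetical sketch: expose only one customer's rows through a view.
CREATE VIEW orders_customer_42 AS
    SELECT order_id, total
    FROM orders
    WHERE customer_id = 42;

REVOKE ALL ON orders FROM customer_42;
GRANT SELECT ON orders_customer_42 TO customer_42;
```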

Regards,
    Jeff Davis


Re: How PostgreSQL handles multiple DDBB instances?

From: Craig James
On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote:
>    The point I'm worried is performance. Do you think the performance
> would be better executing exactly the same queries only adding an extra
> column to all the tables e.g. customer_id, than open a connection to the
> only one customers DB and execute the query there?

There is no simple answer to this question; it depends too much on your data.  In many cases, adding a customer_id to
every table, and perhaps also per-customer views (per Jeff's suggestion), can work really well.

However, performance is not the only consideration, or even the main consideration.  We operate with about 150 separate
databases.  In our case, administration issues and software design outweighed performance issues.

For example, with separate databases, security is simpler, *and* it's easy to convince the customer that their data is
protected.  Creating views only helps for read-only access.  When the customer wants to modify their data, how will you
keep them from accessing and overwriting one another's data?  Even with views, can you convince the customer you've done
it right?  With separate databases, you use the built-in security of Postgres, and don't have to duplicate it in your
schema and apps.

With separate databases, it's really easy to discard a customer.  This can be particularly important for a big customer
with millions of linked records.  In a database-for-everyone design, you'll have lots of foreign keys, indexes, etc.
that make deleting a whole customer a REALLY big job.  Contrast that with just discarding a whole database, which
typically takes a couple of seconds.
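
With one database per customer, discarding a customer is a single statement
(hypothetical database name), with no cascading deletes to run:

```sql
-- Drops the customer's entire database in one shot; no per-row
-- foreign-key cleanup is needed.
DROP DATABASE customers_123;
```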

But even more important (to us) is the simplicity of the applications and management.  It's far more than just an extra
"... and customer = xyz" added to every query.  Throwing the customers together means every application has to
understand security, and many operations that would be simple become horribly tangled.  Want to back up a customer's
data?  You can't use pg_dump, you have to write your own dump app.  Want to restore a customer's data?  Same.  Want to
do a big update?  Your whole database is affected and probably needs to be vacuumed/analyzed.  On and on, at every turn,
management and applications are more complex.

If you have hundreds of separate databases, it's also easy to scale: Just buy more servers, and move some of the
databases. With a single monster database, as load increases, you may hit the wall sooner or later. 

Postgres is really good at maintaining many separate databases.  Why do it yourself?

There are indeed performance issues, but even that's not black and white.  Depending on the specifics of your queries
and the load on your servers, you may get better performance from a single monster database, or from hundreds of
separate databases.

So, your question has no simple answer.  You should indeed evaluate the performance, but other issues may dominate your
decision.

Craig