Thread: Which gives good performance? separate database vs separate schema

Which gives good performance? separate database vs separate schema

From

Divakar Singh

Date:

25 November 2010, 07:37:45

Hello Friends,
I have many instances of my software running on a server (Solaris SPARC). Each software instance requires some DB tables (same DDL for all instances' tables) to store data.
It essentially means that some processes from each instance of the software connect to these tables.
Now, should I put these tables in 1 Database's different schemas or in separate databases itself for good performance?
I am using libpq for connection.

Pictorial Representation:

Process1 -> DB1.schema1.table1

Process2 -> DB1.schema2.table1

Vs.

Process1 -> DB1.default.table1

Process2 -> DB2.default.table1

Which one is better?

thanks in advance

Re: Which gives good performance? separate database vs separate schema

From

tv@fuzzy.cz

Date:

25 November 2010, 08:02:19

Hello,

> Now, should I put these tables in 1 Database's different schemas or in
> separate
> databases itself for good performance?
> I am using libpq for connection.
>
> Pictorial Representation:
>
> Process1 -> DB1.schema1.table1
>
> Process2 -> DB1.schema2.table1
>
>   Vs.
>
> Process1 -> DB1.default.table1
>
> Process2 -> DB2.default.table1
>
> Which one is better?

Well, that depends on what you mean by "database." In many other products
each database is completely separate (with it's own cache, processes etc).
In PostgreSQL, there's a cluster of databases, and all of them share the
same cache (shared buffers) etc.

I don't think you'll get performance improvement from running two
PostgreSQL clusters (one for DB1, one for DB2). And when running two
databases within the same cluster, there's no measurable performance
difference AFAIK.

So the two options are exactly the same.

Tomas

Re: Which gives good performance? separate database vs separate schema

From

Thomas Kellerer

Date:

25 November 2010, 08:03:52

Divakar Singh, 25.11.2010 12:37:
> Hello Friends,
> I have many instances of my software running on a server (Solaris SPARC). Each software instance requires some DB
tables(same DDL for all instances' tables) to store data. 
> It essentially means that some processes from each instance of the software connect to these tables.
> Now, should I put these tables in 1 Database's different schemas or in separate databases itself for good
performance?
> I am using libpq for connection.
>

I don't think it will make a big difference in performance.

The real question is: do you need queries that "cross boundaries"? If that is the case you have to use schema, because
Postgresdoes not support cross-database queries. 

Regards
Thomas

Re: Which gives good performance? separate database vs separate schema

From

tv@fuzzy.cz

Date:

25 November 2010, 08:09:14

> I don't think it will make a big difference in performance.
>
> The real question is: do you need queries that "cross boundaries"? If that
> is the case you have to use schema, because Postgres does not support
> cross-database queries.

Well, there's dblink contrib module, but that won't improve performance.

Tomas

Re: Which gives good performance? separate database vs separate schema

From

Andres Freund

Date:

25 November 2010, 08:10:28

On Thursday 25 November 2010 13:02:08 tv@fuzzy.cz wrote:
> I don't think you'll get performance improvement from running two
> PostgreSQL clusters (one for DB1, one for DB2). And when running two
> databases within the same cluster, there's no measurable performance
> difference AFAIK.
That one is definitely not true in many circumstances. As soon as you start to
hit contention (shared memory, locks) you may very well be better of with two
separate clusters.

Andres

Re: Which gives good performance? separate database vs separate schema

From

tv@fuzzy.cz

Date:

25 November 2010, 08:25:41

> On Thursday 25 November 2010 13:02:08 tv@fuzzy.cz wrote:
>> I don't think you'll get performance improvement from running two
>> PostgreSQL clusters (one for DB1, one for DB2). And when running two
>> databases within the same cluster, there's no measurable performance
>> difference AFAIK.
> That one is definitely not true in many circumstances. As soon as you
> start to
> hit contention (shared memory, locks) you may very well be better of with
> two
> separate clusters.
>
> Andres
>
Good point, I forgot about that. Anyway it's hard to predict what kind of
performance issue he's facing and whether two clusters would fix it.

regards
Tomas

Re: Which gives good performance? separate database vs separate schema

From

Divakar Singh

Date:

25 November 2010, 10:53:49

I am not facing any issues, but yes I want to have optimal performance for SELECT and INSERT, especially when I am doing these ops repeatedly.
Actually I am porting from Oracle to PG. Oracle starts a lot of processes when it needs to run many schemas. I do not think PG would need much more resources (mem, cpu) if I go for different database for each process..? Also, is there any limit on number of databases I can start using a PG server?

Best Regards,
Divakar

From: "tv@fuzzy.cz" <tv@fuzzy.cz>
To: Andres Freund <andres@anarazel.de>
Cc: pgsql-performance@postgresql.org; tv@fuzzy.cz; Divakar Singh <dpsmails@yahoo.com>
Sent: Thu, November 25, 2010 5:55:33 PM
Subject: Re: [PERFORM] Which gives good performance? separate database vs separate schema

> On Thursday 25 November 2010 13:02:08 tv@fuzzy.cz wrote:
>> I don't think you'll get performance improvement from running two
>> PostgreSQL clusters (one for DB1, one for DB2). And when running two
>> databases within the same cluster, there's no measurable performance
>> difference AFAIK.
> That one is definitely not true in many circumstances. As soon as you
> start to
> hit contention (shared memory, locks) you may very well be better of with
> two
> separate clusters.
>
> Andres
>
Good point, I forgot about that. Anyway it's hard to predict what kind of
performance issue he's facing and whether two clusters would fix it.

regards
Tomas

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Which gives good performance? separate database vs separate schema

From

tv@fuzzy.cz

Date:

25 November 2010, 11:46:43

> I am not facing any issues, but yes I want to have optimal performance for
> SELECT and INSERT, especially when I am doing these ops repeatedly.
> Actually I am porting from Oracle to PG. Oracle starts a lot of processes
> when
> it needs to run many schemas. I do not think PG would need much more
> resources
> (mem, cpu) if I go for different database for each process..? Also, is
> there any
> limit on number of databases I can start using a PG server?

Hm, I would try to run that using single cluster, and only if that does
not perform well I'd try multiple clusters. Yes, Oracle starts a lot of
processes for an instance, and then some processes for each connection.

But again - in PostgreSQL, you do not start databases. You start a
cluster, containing databases and then there are connections. This is
similar to Oracle where you start instances (something like cluster in
PostgreSQL) containing schemas (something like databases in PostgreSQL).
And then you create connections, which is the object consuming processes
and memory.

PostgreSQL will create one process for each connection (roughly the same
as Oracle in case of dedicated server). And yes, the number of connections
is limited - see max_connections parameter in postgresql.conf.

Tomas

>
>
>  Best Regards,
> Divakar
>
>
>
>
> ________________________________
> From: "tv@fuzzy.cz" <tv@fuzzy.cz>
> To: Andres Freund <andres@anarazel.de>
> Cc: pgsql-performance@postgresql.org; tv@fuzzy.cz; Divakar Singh
> <dpsmails@yahoo.com>
> Sent: Thu, November 25, 2010 5:55:33 PM
> Subject: Re: [PERFORM] Which gives good performance? separate database vs
> separate schema
>
>> On Thursday 25 November 2010 13:02:08 tv@fuzzy.cz wrote:
>>> I don't think you'll get performance improvement from running two
>>> PostgreSQL clusters (one for DB1, one for DB2). And when running two
>>> databases within the same cluster, there's no measurable performance
>>> difference AFAIK.
>> That one is definitely not true in many circumstances. As soon as you
>> start to
>> hit contention (shared memory, locks) you may very well be better of
>> with
>> two
>> separate clusters.
>>
>> Andres
>>
> Good point, I forgot about that. Anyway it's hard to predict what kind of
> performance issue he's facing and whether two clusters would fix it.
>
> regards
> Tomas
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
>
>
>

Re: Which gives good performance? separate database vs separate schema

From

Robert Klemme

Date:

26 November 2010, 07:38:53

On Thu, Nov 25, 2010 at 4:46 PM,  <tv@fuzzy.cz> wrote:
>> I am not facing any issues, but yes I want to have optimal performance for
>> SELECT and INSERT, especially when I am doing these ops repeatedly.
>> Actually I am porting from Oracle to PG. Oracle starts a lot of processes
>> when
>> it needs to run many schemas. I do not think PG would need much more
>> resources
>> (mem, cpu) if I go for different database for each process..? Also, is
>> there any
>> limit on number of databases I can start using a PG server?
>
> Hm, I would try to run that using single cluster, and only if that does
> not perform well I'd try multiple clusters. Yes, Oracle starts a lot of
> processes for an instance, and then some processes for each connection.
>
> But again - in PostgreSQL, you do not start databases. You start a
> cluster, containing databases and then there are connections. This is
> similar to Oracle where you start instances (something like cluster in
> PostgreSQL) containing schemas (something like databases in PostgreSQL).
> And then you create connections, which is the object consuming processes
> and memory.
>
> PostgreSQL will create one process for each connection (roughly the same
> as Oracle in case of dedicated server). And yes, the number of connections
> is limited - see max_connections parameter in postgresql.conf.

I think this is a pretty common trade off that is frequently made:
basically the question is whether one wants to reserve resources or
share resources.  In this case resources would be memory and maybe
also disk IO.  With two separate clusters each one has its own memory.
 Which means that if one instance is idle and the other one has high
load then the idle instance's memory cannot be used by the other one.
With a single cluster all the memory is shared which has the downside
that high load of one instance can affect the other instance's memory.

It depends on the usage patterns (load) and the user's policy which
way to go.  Since the OP mentioned "many instances" the aspect of
overhead of many instances (even if idle) may come into play as well.
Plus, a single cluster is likely easier to administer than multiple.
But of course the more DB there are in a single cluster the higher the
likeliness of bottlenecks (see the other thread "Performance under
contention").

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/