Thread: pg_attribute, pg_class, pg_depend grow huge in count and size with multiple tenants.

Hi experts,

Our application serves multiple tenants. Each tenant has its own schema with a few hundred tables and a few functions.
We have 2000 clients, so we have to create 2000 schemas in a single database.

While doing this, I observed that the catalog tables pg_attribute, pg_class and pg_depend grow huge in row count and size.
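
For reference, this is roughly how I am measuring it (a sketch, assuming a role that can read the catalogs; reltuples is only an estimate maintained by VACUUM/ANALYZE):

    SELECT relname,
           reltuples::bigint AS approx_rows,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relname IN ('pg_attribute', 'pg_class', 'pg_depend');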

Do you think this will be a challenge during the execution of every query?

When Postgres parses an SQL statement to find the best execution plan, does it scan any of these catalogs, which could eventually take more time?

Any other challenges you have come across or foresee in such cases?

Thanks,
Sammy. 

 
On Thu, May 7, 2020 at 1:05 PM samhitha g <samhithagarudadri@gmail.com> wrote:
Our application serves multiple tenants. Each tenant has its own schema with a few hundred tables and a few functions.
We have 2000 clients, so we have to create 2000 schemas in a single database.

That is one option, but I wouldn't say you must.  If you cannot get individual tables to be multi-tenant, you are probably better off having one database per client on a shared cluster - at least given the size of the schema and the number of clients.

David J.

Hi,

On Thu, May 7, 2020 at 5:18 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, May 7, 2020 at 1:05 PM samhitha g <samhithagarudadri@gmail.com> wrote:
Our application serves multiple tenants. Each tenant has its own schema with a few hundred tables and a few functions.
We have 2000 clients, so we have to create 2000 schemas in a single database.

That is one option, but I wouldn't say you must.  If you cannot get individual tables to be multi-tenant, you are probably better off having one database per client on a shared cluster - at least given the size of the schema and the number of clients.
I am working on a similar problem.
One database per client may be a killer when you have a connection pooler that creates a pool for each unique combination of (user, database).
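
For example, something like this (just a sketch, assuming PostgreSQL 10+ for the backend_type column) shows how many distinct (user, database) pools a pooler would end up maintaining at peak:

    SELECT usename, datname, count(*) AS connections
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
    GROUP BY usename, datname
    ORDER BY connections DESC;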




--
Regards,
Avinash Vallarapu
+1-902-221-5976
On 07/05/20, Avinash Kumar (avinash.vallarapu@gmail.com) wrote:
> >> Our application serves multiple tenants. Each tenant has its own schema
> >> with a few hundred tables and a few functions.
> >> We have 2000 clients, so we have to create 2000 schemas in a single
> >> database.

> > That is one option, but I wouldn't say you must.  If you cannot get
> > individual tables to be multi-tenant, you are probably better off having one
> > database per client on a shared cluster - at least given the size of the
> > schema and the number of clients.
> >
> I am working on a similar problem.
> One database per client may be a killer when you have a connection
> pooler that creates a pool for each unique combination of (user, database).

One of our clusters has well over 500 databases fronted by pg_bouncer.

We get excellent connection "flattening" using pg_bouncer with
per-database connection spikes dealt with through a reserve pool.

The nice thing about separate databases is that it is easy to scale
horizontally.

Rory



Hi,

On Thu, May 7, 2020 at 6:08 PM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
On 07/05/20, Avinash Kumar (avinash.vallarapu@gmail.com) wrote:
> >> Our application serves multiple tenants. Each tenant has its own schema
> >> with a few hundred tables and a few functions.
> >> We have 2000 clients, so we have to create 2000 schemas in a single
> >> database.

> > That is one option, but I wouldn't say you must.  If you cannot get
> > individual tables to be multi-tenant, you are probably better off having one
> > database per client on a shared cluster - at least given the size of the
> > schema and the number of clients.
> >
> I am working on a similar problem.
> One database per client may be a killer when you have a connection
> pooler that creates a pool for each unique combination of (user, database).

One of our clusters has well over 500 databases fronted by pg_bouncer.

We get excellent connection "flattening" using pg_bouncer with
per-database connection spikes dealt with through a reserve pool.
What if you see at least 4 connections being established by each client during peak? And if you serve 4 or 2 connections per DB, then with 500 DBs in a cluster you are creating 1000 or more reserved connections.

The nice thing about separate databases is that it is easy to scale
horizontally.
Agreed. But how about autovacuum? Workers shift from DB to DB, and 500 databases in a cluster means you may have to have a lot of manual vacuuming in place as well.

Rory


--
Regards,
Avinash Vallarapu
+1-902-221-5976
On Thu, 2020-05-07 at 18:17 -0300, Avinash Kumar wrote:
> > The nice thing about separate databases is that it is easy to scale
> > horizontally.
> 
> Agreed. But how about autovacuum? Workers shift from DB to DB, and 500 databases
> in a cluster means you may have to have a lot of manual vacuuming in place as well.

Just set "autovacuum_max_workers" higher.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Hi,

On Fri, May 8, 2020 at 3:31 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2020-05-07 at 18:17 -0300, Avinash Kumar wrote:
> > The nice thing about separate databases is that it is easy to scale
> > horizontally.
>
> Agreed. But how about autovacuum? Workers shift from DB to DB, and 500 databases
> in a cluster means you may have to have a lot of manual vacuuming in place as well.

Just set "autovacuum_max_workers" higher.
No, that wouldn't help. If you just increase autovacuum_max_workers, the total cost limit of autovacuum_vacuum_cost_limit (or vacuum_cost_limit) is shared across that many workers, which further delays autovacuum for each worker. Instead, you need to increase autovacuum_vacuum_cost_limit as well when you increase the number of workers. But if you do that and also increase the workers, you would easily reach the limits of the disk. I am not sure it is advisable anywhere to have 20 autovacuum_max_workers unless I have a disk with lots of IOPS and very tiny tables across all the databases.
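
To illustrate (the numbers here are made up, not a recommendation; tune them to your hardware):

    -- autovacuum_max_workers needs a server restart to take effect
    ALTER SYSTEM SET autovacuum_max_workers = 6;
    -- raise the shared cost budget roughly in proportion to the worker count,
    -- otherwise each worker only gets a smaller slice of it and runs slower
    -- (autovacuum_vacuum_cost_limit defaults to -1, i.e. it falls back to vacuum_cost_limit = 200)
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1200;
    SELECT pg_reload_conf();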

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Regards,
Avinash Vallarapu
On Fri, 2020-05-08 at 03:47 -0300, Avinash Kumar wrote:
> > Just set "autovacuum_max_workers" higher.
> 
> > No, that wouldn't help. If you just increase autovacuum_max_workers, the total cost limit of
> > autovacuum_vacuum_cost_limit (or vacuum_cost_limit) is shared across that many workers, which
> > further delays autovacuum for each worker. Instead, you need to increase autovacuum_vacuum_cost_limit
> > as well when you increase the number of workers.

True, I should have mentioned that.

> But if you do that and also increase the workers, you would easily reach the limits
> of the disk. I am not sure it is advisable anywhere to have 20 autovacuum_max_workers unless
> I have a disk with lots of IOPS and very tiny tables across all the databases.

Sure, if you have a high database load, you will at some point exceed the limits of
the machine, which is not surprising.  What I am trying to say is that you have to ramp
up the resources for autovacuum together with increasing the overall workload.
You should consider autovacuum as part of that workload.

If your machine cannot cope with the workload any more, you have to scale, which
is easily done by adding more machines if you have many databases.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Hi,

On Fri, May 8, 2020 at 3:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2020-05-08 at 03:47 -0300, Avinash Kumar wrote:
> > Just set "autovacuum_max_workers" higher.
>
> No, that wouldn't help. If you just increase autovacuum_max_workers, the total cost limit of
> autovacuum_vacuum_cost_limit (or vacuum_cost_limit) is shared across that many workers, which
> further delays autovacuum for each worker. Instead, you need to increase autovacuum_vacuum_cost_limit
> as well when you increase the number of workers.

True, I should have mentioned that.

> But if you do that and also increase the workers, you would easily reach the limits
> of the disk. I am not sure it is advisable anywhere to have 20 autovacuum_max_workers unless
> I have a disk with lots of IOPS and very tiny tables across all the databases.

Sure, if you have a high database load, you will at some point exceed the limits of
the machine, which is not surprising.  What I am trying to say is that you have to ramp
up the resources for autovacuum together with increasing the overall workload.
You should consider autovacuum as part of that workload.

If your machine cannot cope with the workload any more, you have to scale, which
is easily done by adding more machines if you have many databases.
Agreed. Getting back to the original question asked by Sammy, I think it is still a bad idea to create 2000 databases for 2000 clients (schemas) in a multi-tenant setup.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Regards,
Avinash Vallarapu
Hi all,

Since we are talking about multi-tenant databases, the Citus extension fits in neatly here, using horizontal partitioning/sharding.
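
Something along these lines (a rough sketch, assuming the extension is installed and a hypothetical tenant-keyed table; see the Citus docs for details):

    CREATE EXTENSION citus;
    -- hypothetical table with an explicit tenant key
    CREATE TABLE events (tenant_id bigint, id bigint, payload jsonb);
    -- shard the table by tenant so each tenant's rows hash to the same shard
    SELECT create_distributed_table('events', 'tenant_id');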

Regards,
Michael Vitale

Avinash Kumar wrote on 5/8/2020 6:14 AM:
Hi,

On Fri, May 8, 2020 at 3:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2020-05-08 at 03:47 -0300, Avinash Kumar wrote:
> > Just set "autovacuum_max_workers" higher.
>
> No, that wouldn't help. If you just increase autovacuum_max_workers, the total cost limit of
> autovacuum_vacuum_cost_limit (or vacuum_cost_limit) is shared across that many workers, which
> further delays autovacuum for each worker. Instead, you need to increase autovacuum_vacuum_cost_limit
> as well when you increase the number of workers.

True, I should have mentioned that.

> But if you do that and also increase the workers, you would easily reach the limits
> of the disk. I am not sure it is advisable anywhere to have 20 autovacuum_max_workers unless
> I have a disk with lots of IOPS and very tiny tables across all the databases.

Sure, if you have a high database load, you will at some point exceed the limits of
the machine, which is not surprising.  What I am trying to say is that you have to ramp
up the resources for autovacuum together with increasing the overall workload.
You should consider autovacuum as part of that workload.

If your machine cannot cope with the workload any more, you have to scale, which
is easily done by adding more machines if you have many databases.
Agreed. Getting back to the original question asked by Sammy, I think it is still a bad idea to create 2000 databases for 2000 clients (schemas) in a multi-tenant setup.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Regards,
Avinash Vallarapu

On Thu, May 7, 2020 at 4:05 PM samhitha g <samhithagarudadri@gmail.com> wrote:
Hi experts,

Our application serves multiple tenants. Each tenant has its own schema with a few hundred tables and a few functions.
We have 2000 clients, so we have to create 2000 schemas in a single database.

While doing this, I observed that the catalog tables pg_attribute, pg_class and pg_depend grow huge in row count and size.

Please attach numbers to "huge".  We don't know what "huge" means to you.
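
For example, something along these lines (a sketch; adjust the filters as needed) would put per-schema numbers on it:

    -- relations per schema, largest schemas first
    SELECT n.nspname AS schema_name,
           count(*)  AS relations
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    GROUP BY n.nspname
    ORDER BY relations DESC
    LIMIT 10;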

"2000  * a few hundred" tables is certainly getting to the point where it makes sense to be concerned.  But my concern would be more about backup and recovery, version upgrades, pg_dump, etc. not about daily operations.

Cheers,

Jeff
On Thu, May 7, 2020 at 5:17 PM Avinash Kumar <avinash.vallarapu@gmail.com> wrote:
Hi,

On Thu, May 7, 2020 at 6:08 PM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
One of our clusters has well over 500 databases fronted by pg_bouncer.

We get excellent connection "flattening" using pg_bouncer with
per-database connection spikes dealt with through a reserve pool.
What if you see at least 4 connections being established by each client during peak? And if you serve 4 or 2 connections per DB, then with 500 DBs in a cluster you are creating 1000 or more reserved connections.

Does every database spike at the same time?
 

The nice thing about separate databases is that it is easy to scale
horizontally.
Agreed. But how about autovacuum? Workers shift from DB to DB, and 500 databases in a cluster means you may have to have a lot of manual vacuuming in place as well.

Why would having different schemas in different DBs change your manual vacuuming needs?  And if anything, having separate DBs will make autovacuuming more efficient, as it keeps the statistics collector's stats files smaller.
 
Cheers,

Jeff