Thousands databases or schemas - Mailing list pgsql-performance

From Denis
Subject Thousands databases or schemas
Date
Msg-id 1352367376704-5731189.post@n5.nabble.com
List pgsql-performance
We have a web application where we create a schema or a database with a
number of tables in it for each customer. Now we have about 2600 clients.
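For reference, provisioning a new client boils down to something like the following (the schema and table names below are simplified placeholders; a real client schema contains about 40 tables):

  -- simplified per-client provisioning; names are placeholders,
  -- the real schema contains ~40 tables
  CREATE SCHEMA client_1234;
  CREATE TABLE client_1234.orders (
      id         serial PRIMARY KEY,
      created_at timestamptz NOT NULL DEFAULT now(),
      payload    text
  );
  CREATE TABLE client_1234.invoices (
      id       serial PRIMARY KEY,
      order_id integer REFERENCES client_1234.orders(id),
      amount   numeric(12,2)
  );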

The problem we ran into using a separate DB for each client is that creating a
new DB can take up to 2 minutes, which is absolutely unacceptable. Using
schemas instead (one DB with a number of schemas containing similar tables)
solved this problem (schemas are created in a couple of seconds), but
created two other blocking points:
1. Sometimes creating a new table in a schema takes up to 5 seconds. Since we
have to create up to 40 tables per schema, this takes far too much time.
2. "pg_dump -n schema_name db_name" takes from 30 to 60 seconds, no matter
how much data the schema contains. Even a dump of the table structure only
takes at least 30 seconds. Based on this thread,
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-td5709766i60.html,
pg_dump always examines ALL the tables in the DB, i.e. in my case more than
100 000 tables.
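For scale, a quick way to confirm how many tables pg_dump has to walk, regardless of the -n filter, is to count them in the catalogs (just a sanity check, not a fix):

  -- count all user tables across every client schema
  SELECT count(*)
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
    AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast');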

I know you guys will ask why we selected this particular application
architecture.
This architecture was chosen to ease backing up and restoring data and to
isolate each client's data from the others. Sometimes clients ask us to
restore data for the last month or roll back to last week's state. This task
is easy to accomplish when the client's data is isolated in its own schema/DB.
If we put all the clients' data in one table, operations of this kind would be
much harder to perform: we would have to restore a huge DB with enormously
large tables in it just to find the requested data. Sometimes the client
doesn't even remember the exact date; he or she just says "I lost my data
somewhere between Tuesday and Friday last week", and I have to restore backups
for several days. If I had one huge table instead of small per-client tables
it would be a nightmare!
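To be concrete, a per-client backup/restore cycle looks roughly like this (client_1234 is a placeholder name; a custom-format dump lets pg_restore pull out a single schema):

  # per-client backup, custom format
  pg_dump -Fc -n client_1234 db_name > client_1234.dump
  # selective restore of that client only
  pg_restore -d db_name -n client_1234 client_1234.dump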

Different clients have different activity rates, and we can select different
backup strategies accordingly. This would be impossible if we kept all the
clients' data in one table.
Besides all of the above, with everything in one table the probability of
massive data corruption (if an error occurs in our web application) is much
higher.


P.S.
Not to start a holy war, but FYI: in a similar project where we use MySQL,
we now have about 6000 DBs and everything works like a charm.



--
View this message in context: http://postgresql.1045698.n5.nabble.com/Thousands-databases-or-schemas-tp5731189.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

