Re: [HACKERS] pg_dump and thousands of schemas - Mailing list pgsql-performance

From Denis
Subject Re: [HACKERS] pg_dump and thousands of schemas
Date
Msg-id 1352284972611-5730998.post@n5.nabble.com
Whole thread Raw
In response to Re: [HACKERS] pg_dump and thousands of schemas  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] pg_dump and thousands of schemas
List pgsql-performance
Tom Lane-2 wrote
> Denis <

> socsam@

> > writes:
>> Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it
>> takes from 10 to 15 seconds when I am doing backup.
>
>> Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual
>> time=5340.477..5405.604 rows=183924 loops=1)
>
> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
> that you've got 183924 tables.  That's going to take some time no matter
> what.
>
> It does seem like we could make some small changes to optimize that
> query a little bit, but they're not going to result in any amazing
> improvement overall, because pg_dump still has to deal with all the
> tables it's getting back.  Fundamentally, I would ask whether you really
> need so many tables.  It seems pretty likely that you have lots and lots
> of basically-identical tables.  Usually it would be better to redesign
> such a structure into fewer tables with more index columns.
>
>> Here is the output of "pg_dump -s"  test.dump
>> <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>
>
> This dump contains only 1 schema and 43 tables, so I don't think it's
> for the database you're having trouble with ...
>
>             regards, tom lane
>
>
> --
> Sent via pgsql-performance mailing list (

> pgsql-performance@

> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

I wonder why pg_dump has to have deal with all these 183924 tables, if I
specified to dump only one scheme: "pg_dump -n schema_name" or even like
this to dump just one table "pg_dump -t 'schema_name.comments' "  ?

We have a web application where we create a schema with a number of tables
in it for each customer. This architecture was chosen to ease the process of
backup/restoring data. Sometimes clients ask us to restore data for the last
month or roll back to last week's state. This task is easy to accomplish
then the client's data is isolated in a schema/DB. If we put all the clients
data in one table - operations of this kind will be much harder to perform.
We will have to restore a huge DB with an enormously large tables in it to
find the requested data.
Different clients have different activity rate and we can select different
backup strategies according to it. This would be impossible in case we keep
all the clients data in one table.
Besides all the above mentioned, the probability of massive data corruption
(if an error in our web application occurs) is much higher.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730998.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: HT on or off for E5-26xx ?
Next
From: Rodrigo Rosenfeld Rosas
Date:
Subject: Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2