pg_dump with 1100 schemas being a bit slow - Mailing list pgsql-general

From Loic d'Anterroches
Subject pg_dump with 1100 schemas being a bit slow
Date
Msg-id 8e2f2cb20910070351l6150c49eh9c5a44c6f74aedc3@mail.gmail.com
Whole thread Raw
Responses Re: pg_dump with 1100 schemas being a bit slow
Re: pg_dump with 1100 schemas being a bit slow
Re: pg_dump with 1100 schemas being a bit slow
List pgsql-general
Hello,

After a series of sessions to search the web for information, I am
asking the help of people having a bit more knowledge of the internals
of pg_dump to try to solve a performance problem I have. I am running
PostgreSQL version 8.3.8 both server and pg_dump,

The context is a farm hosting of a web application
(http://www.indefero.net) where each installation get its own schema.
Each table in the schema is a "real" table, not a view of the same
table in the public schema with "WHERE schema='currentschema'" clause.
This setup allows me to easily run the web application nearly
unmodified between the downloadable version and the "farm" version
(schemas rock!). Now you have the background.

Each night I am running:
pg_dump --blobs --schema=%s --no-acl -U postgres indefero | gzip >
/path/to/backups/%s/%s-%s.sql.gz
this for each installation, so 1100 times. Substitution strings are to
timestamp and get the right schema.

My problem is that the dump increased steadily with the number of
schemas (now about 20s from about 12s with 850 schemas) and pg_dump is
now ballooning at 120MB of memory usage when running the dump.

The thing is that my overage schema size is a bit more than 1MB, with
the schema just after installation being 850kB.
Max size: 2.8MB
Min size: 0.85MB
Avg: 1.0MB
Total size: ~1GB

To get the size I run the following command with sum off the "size":

SELECT relname, pg_total_relation_size(CAST(relname AS
TEXT)) AS size FROM pg_class AS pgc, pg_namespace AS pgn
     WHERE pg_table_is_visible(pgc.oid) IS TRUE AND relkind = 'r'
     AND pgc.relnamespace = pgn.oid
     AND pgn.nspname NOT IN ('information_schema', 'pg_catalog')';


I think that pg_dump, when looking at the objects to dump, also it is
limited to a given schema, is scanning the complete database in one
those calls:
http://doxygen.postgresql.org/pg__dump_8c-source.html#l00717

Is there an option: "I know what I am doing, do not look outside of
the schema" available which can help in my case? Because running
SELECT is snappy, vmstats shows no swapping and the complete system is
basically running very well with a load average below 1.

So, your help is very welcomed,
loïc

--
Loïc d'Anterroches - Céondo Ltd - http://www.ceondo.com

pgsql-general by date:

Previous
From: Karina Guardado
Date:
Subject: problems with encoding
Next
From: mezgani ali
Date:
Subject: current_query stat is idle