> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
> management in the server. What I fixed so far on the pg_dump side
> should be enough to let partial dumps run at reasonable speed even if
> the whole database contains many tables. But if psql is taking
> AccessShareLock on lots of tables, there's still a problem.
Yes, I saw lines like this:
29260 2012-05-30 09:39:19 JST LOG: statement: LOCK TABLE public.t10 IN ACCESS SHARE MODE
This does not seem very efficient, since LOCK TABLE can take multiple
tables as arguments and we could pass as many tables as possible in a
single LOCK TABLE statement. That way we could reduce the round trips
between pg_dump and the backend.
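For example, the per-table statements above could be collapsed into one
statement (just a sketch; the table names are placeholders):

  LOCK TABLE public.t1, public.t2, public.t3, public.t10
      IN ACCESS SHARE MODE;

pg_dump already knows all the qualified names it is going to dump, so it
would only need to batch them, keeping each statement at a reasonable
length.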
Also I noticed lots of queries like these:
29260 2012-05-30 09:39:19 JST LOG: statement: SELECT attname, attacl FROM pg_catalog.pg_attribute WHERE attrelid =
'516391' AND NOT attisdropped AND attacl IS NOT NULL ORDER BY attnum
I guess this query is issued once per table, and if there are tons of
tables these queries become a major bottleneck, just like the LOCK
statements. We could optimize this by issuing queries that extract the
info for multiple tables at once, rather than fetching one table's info
at a time as the current implementation does; see the sketch below.
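For example (a sketch only; the oid list is a placeholder), the
per-table query could be batched with an IN list and split by attrelid
on the client side:

  SELECT attrelid, attname, attacl
  FROM pg_catalog.pg_attribute
  WHERE attrelid IN ('516391', '516394', '516397')
    AND NOT attisdropped AND attacl IS NOT NULL
  ORDER BY attrelid, attnum;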
Or, even better, we could create a temp table containing the target
table oids and join it against the query above; a sketch follows.
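Something like this (again just a sketch; dump_target_oids is a
hypothetical name, and it would be populated with the oids of the tables
being dumped, e.g. via COPY):

  CREATE TEMP TABLE dump_target_oids (relid oid);

  SELECT a.attrelid, a.attname, a.attacl
  FROM pg_catalog.pg_attribute a
       JOIN dump_target_oids t ON a.attrelid = t.relid
  WHERE NOT a.attisdropped AND a.attacl IS NOT NULL
  ORDER BY a.attrelid, a.attnum;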
In my opinion, particular use cases such as multi-tenancy can create
tons of objects in a database cluster, so the performance of pg_dump
will likely be highlighted even more in the future.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp