Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers

Jeff Janes <jeff.janes@gmail.com> writes:
> Are you sure the server you are dumping out of is head?

I experimented a bit with dumping/restoring 16000 tables matching
Bruce's test case (ie, one serial column apiece).  The pg_dump profile
seems fairly flat, without any easy optimization targets.  But
restoring the dump script shows a rather interesting backend profile:

samples  %        image name               symbol name
30861    39.6289  postgres                 AtEOXact_RelationCache
9911     12.7268  postgres                 hash_seq_search
2682      3.4440  postgres                 init_sequence
2218      2.8482  postgres                 _bt_compare
2120      2.7223  postgres                 hash_search_with_hash_value
1976      2.5374  postgres                 XLogInsert
1429      1.8350  postgres                 CatalogCacheIdInvalidate
1282      1.6462  postgres                 LWLockAcquire
973       1.2494  postgres                 LWLockRelease
702       0.9014  postgres                 hash_any

The hash_seq_search time is probably mostly associated with
AtEOXact_RelationCache, which is run during transaction commit and scans
the relcache hashtable looking for tables created in the current
transaction.  So that's about 50% of the runtime going into that one
activity.

There are at least three ways we could whack that mole:

* Run the psql script in --single-transaction mode, as I was mumbling
about the other day.  If we were doing AtEOXact_RelationCache only once,
rather than once per CREATE TABLE statement, it wouldn't be a problem.
Easy but has only a narrow scope of applicability.

* Keep a separate list (or data structure of your choice) so that
relcache entries created in the current xact could be found directly
rather than having to scan the whole relcache.  That'd add complexity
though, and could perhaps be a net loss for cases where the relcache
isn't so bloated.

* Limit the size of the relcache (eg by aging out
not-recently-referenced entries) so that we aren't incurring O(N^2)
costs for scripts touching N tables.  Again, this adds complexity and
could be counterproductive in some scenarios.
        regards, tom lane


