Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Further pg_upgrade analysis for many tables |
Date | |
Msg-id | 722.1352440234@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Further pg_upgrade analysis for many tables (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: Further pg_upgrade analysis for many tables
Re: Further pg_upgrade analysis for many tables Re: Further pg_upgrade analysis for many tables Re: Further pg_upgrade analysis for many tables Re: Further pg_upgrade analysis for many tables |
List | pgsql-hackers |
Jeff Janes <jeff.janes@gmail.com> writes: > Are sure the server you are dumping out of is head? I experimented a bit with dumping/restoring 16000 tables matching Bruce's test case (ie, one serial column apiece). The pg_dump profile seems fairly flat, without any easy optimization targets. But restoring the dump script shows a rather interesting backend profile: samples % image name symbol name 30861 39.6289 postgres AtEOXact_RelationCache 9911 12.7268 postgres hash_seq_search 2682 3.4440 postgres init_sequence 2218 2.8482 postgres _bt_compare 2120 2.7223 postgres hash_search_with_hash_value 1976 2.5374 postgres XLogInsert 1429 1.8350 postgres CatalogCacheIdInvalidate 1282 1.6462 postgres LWLockAcquire 973 1.2494 postgres LWLockRelease 702 0.9014 postgres hash_any The hash_seq_search time is probably mostly associated with AtEOXact_RelationCache, which is run during transaction commit and scans the relcache hashtable looking for tables created in the current transaction. So that's about 50% of the runtime going into that one activity. There are at least three ways we could whack that mole: * Run the psql script in --single-transaction mode, as I was mumbling about the other day. If we were doing AtEOXact_RelationCache only once, rather than once per CREATE TABLE statement, it wouldn't be a problem. Easy but has only a narrow scope of applicability. * Keep a separate list (or data structure of your choice) so that relcache entries created in the current xact could be found directly rather than having to scan the whole relcache. That'd add complexity though, and could perhaps be a net loss for cases where the relcache isn't so bloated. * Limit the size of the relcache (eg by aging out not-recently-referenced entries) so that we aren't incurring O(N^2) costs for scripts touching N tables. Again, this adds complexity and could be counterproductive in some scenarios. regards, tom lane
pgsql-hackers by date: