Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Further pg_upgrade analysis for many tables
Date
Msg-id CA+TgmoZ383n+06T1Em1UGPSP276=uQM_Vbf6ZCHsA_7nYXpf3g@mail.gmail.com
Whole thread Raw
In response to Re: Further pg_upgrade analysis for many tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Further pg_upgrade analysis for many tables  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Nov 9, 2012 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> Are sure the server you are dumping out of is head?
>
> I experimented a bit with dumping/restoring 16000 tables matching
> Bruce's test case (ie, one serial column apiece).  The pg_dump profile
> seems fairly flat, without any easy optimization targets.  But
> restoring the dump script shows a rather interesting backend profile:
>
> samples  %        image name               symbol name
> 30861    39.6289  postgres                 AtEOXact_RelationCache
> 9911     12.7268  postgres                 hash_seq_search
> 2682      3.4440  postgres                 init_sequence
> 2218      2.8482  postgres                 _bt_compare
> 2120      2.7223  postgres                 hash_search_with_hash_value
> 1976      2.5374  postgres                 XLogInsert
> 1429      1.8350  postgres                 CatalogCacheIdInvalidate
> 1282      1.6462  postgres                 LWLockAcquire
> 973       1.2494  postgres                 LWLockRelease
> 702       0.9014  postgres                 hash_any
>
> The hash_seq_search time is probably mostly associated with
> AtEOXact_RelationCache, which is run during transaction commit and scans
> the relcache hashtable looking for tables created in the current
> transaction.  So that's about 50% of the runtime going into that one
> activity.
>
> There are at least three ways we could whack that mole:
>
> * Run the psql script in --single-transaction mode, as I was mumbling
> about the other day.  If we were doing AtEOXact_RelationCache only once,
> rather than once per CREATE TABLE statement, it wouldn't be a problem.
> Easy but has only a narrow scope of applicability.
>
> * Keep a separate list (or data structure of your choice) so that
> relcache entries created in the current xact could be found directly
> rather than having to scan the whole relcache.  That'd add complexity
> though, and could perhaps be a net loss for cases where the relcache
> isn't so bloated.
>
> * Limit the size of the relcache (eg by aging out
> not-recently-referenced entries) so that we aren't incurring O(N^2)
> costs for scripts touching N tables.  Again, this adds complexity and
> could be counterproductive in some scenarios.

Although there may be some workloads that access very large numbers of
tables repeatedly, I bet that's not typical.  Rather, I bet that a
session which accesses 10,000 tables is most likely to access them
just once each - and right now we don't handle that case very well;
this is not the first complaint about big relcaches causing problems.
On the flip side, we don't want workloads that exceed some baked-in
cache size to fall off a cliff.  So I think we should be looking for a
solution that doesn't put a hard limit on the size of the relcache,
but does provide at least some latitude to get rid of old entries.

So maybe something like this.  Add a flag to each relcache entry
indicating whether or not it has been used.  After adding 1024 entries
to the relcache, scan all the entries: clear the flag if it's set,
flush the entry if it's already clear.  This allows the size of the
relcache to grow without bound, but only if we're continuing to access
the old tables in between adding new ones to the mix.  As an
additional safeguard, we could count the number of toplevel SQL
commands that have been executed and require that a flush not be
performed more often than, say, every 64 toplevel SQL commands.  That
way, if a single operation on an inheritance parent with many children
sucks a lot of stuff into the relcache, we'll avoid cleaning it out
too quickly.

Maybe this is all too ad-hoc, but I feel like we don't need to
overengineer this.  The existing system is fine in 99% of the cases,
so we really only need to find a way to detect the really egregious
case where we are doing a neverending series of one-time table
accesses and apply a very light tap to avoid the pain point in that
case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Doc patch: Document names of automatically created constraints and indexes
Next
From: Bruce Momjian
Date:
Subject: Re: Further pg_upgrade analysis for many tables