Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Further pg_upgrade analysis for many tables
Date:
Msg-id: CA+TgmoZrCiZjM03H72rGjBOGbqRUtLf13AbqSqR5Ur5NASD45A@mail.gmail.com
In response to: Re: Further pg_upgrade analysis for many tables (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Fri, Nov 23, 2012 at 5:34 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Jeff Janes <jeff.janes@gmail.com> writes:
>>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> There are at least three ways we could whack that mole: ...
>>>>>
>>>>> * Keep a separate list (or data structure of your choice) so that
>>>>> relcache entries created in the current xact could be found directly
>>>>> rather than having to scan the whole relcache.  That'd add complexity
>>>>> though, and could perhaps be a net loss for cases where the relcache
>>>>> isn't so bloated.
>>>
>>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>>> table one recently added.  The overhead of that should be very low.
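As a rough illustration of the overflowable static list being suggested here (the names, the limit of 10, and the helper are hypothetical, not taken from any actual patch; in the backend, Oid and bool would come from the usual headers):

#include <stdbool.h>

typedef unsigned int Oid;       /* stand-in for the backend's Oid type */

#define MAX_EOXACT_LIST 10      /* illustrative limit */

static Oid  eoxact_list[MAX_EOXACT_LIST];
static int  eoxact_list_len = 0;
static bool eoxact_list_overflowed = false;

/*
 * Remember that this relation may need end-of-transaction work.  If the
 * list fills up, set a flag and fall back to the old behavior of scanning
 * the entire relcache at end of transaction.
 */
static void
EOXactListAdd(Oid relid)
{
    if (eoxact_list_len < MAX_EOXACT_LIST)
        eoxact_list[eoxact_list_len++] = relid;
    else
        eoxact_list_overflowed = true;
}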
>>>
>>>> Are the three places where "need_eoxact_work = true;" the only places
>>>> where things need to be added to the new structure?
>>>
>>> Yeah.  The problem is not so much the number of places that do that,
>>> as that places that flush entries from the relcache would need to know
>>> to remove them from the separate list, else you'd have dangling
>>> pointers.
>>
>> If the list is of hash-tags rather than pointers, all we would have to
>> do is ignore entries that are not still in the hash table, right?
>>
>
> I've attached a proof-of-concept patch to implement this.
>
> I got rid of need_eoxact_work entirely and replaced it with a short
> list that serves both to indicate that work is needed and to suggest
> which rels might need that work.  There is no attempt
> to prevent duplicates, nor to remove invalidated entries from the
> list.   Invalid entries are skipped when the hash entry is not found,
> and processing is idempotent so duplicates are not a problem.
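The end-of-transaction pass described above could look roughly like the following sketch, building on the list sketched earlier; lookup_relcache_entry(), do_eoxact_work(), and scan_entire_relcache() are hypothetical stand-ins for the relcache hash lookup, the per-relation cleanup, and the old full-scan path:

/* Hypothetical stand-ins for backend facilities, for illustration only */
extern void *lookup_relcache_entry(Oid relid);  /* hash lookup; NULL if the entry was flushed */
extern void  do_eoxact_work(void *rel);         /* per-relation end-of-xact cleanup */
extern void  scan_entire_relcache(void);        /* old behavior: visit every relcache entry */

static void
AtEOXact_cleanup_sketch(void)
{
    if (eoxact_list_overflowed)
    {
        /* Too much was remembered: behave exactly as the flag-only code did. */
        scan_entire_relcache();
    }
    else
    {
        int         i;

        for (i = 0; i < eoxact_list_len; i++)
        {
            void       *rel = lookup_relcache_entry(eoxact_list[i]);

            /* The entry may have been flushed since it was remembered. */
            if (rel == NULL)
                continue;

            /* Duplicates in the list are harmless: the work is idempotent. */
            do_eoxact_work(rel);
        }
    }

    /* Reset for the next transaction. */
    eoxact_list_len = 0;
    eoxact_list_overflowed = false;
}

With MAX_EOXACT_LIST set to 0, the list would overflow on first use and only the full-scan branch would ever run, which is the formal equivalence to the old flag-only behavior that Jeff points out just below.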
>
> Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
> overflowed the first time it was accessed, then it would be identical
> to the current behavior of having only a flag.  So formally all I did
> was increase the max from 0 to 10.
>
> I wasn't so sure about the idempotent nature of subtransaction
> processing, so I chickened out and left that part alone.  I know of no
> workflow for which that was a bottleneck.
>
> AtEOXact_release is oddly indented because that makes the patch
> smaller and easier to read.
>
> This makes non-"-1" restores of large dumps very much faster (and
> makes them faster than "-1" restores, as well).
>
> I added a "create temp table foo (x integer) on commit drop;" line to
> the default pgbench transaction and tested that.  I was hoping to see
> a performance improvement there as well (the transaction has ~110
> entries in the RelationIdCache at eoxact each time), but the
> performance was too variable (probably due to the intense IO it
> causes) to detect any changes.  At least it is not noticeably slower.
> If I hack pgbench to bloat the RelationIdCache by touching 20,000
> useless tables as part of the connection start-up process, then this
> patch does show a win.
>
> It is not obvious what value to set the MAX list size to.  Since this
> array is only allocated once per backend, and since it is not groveled
> through to invalidate relations at each invalidation, there is no
> particular reason it must be small.  But if the same table is assigned
> new filenodes (or forced index lists, whatever those are) repeatedly
> within a transaction, the list could become bloated with duplicate
> entries, potentially becoming even larger than the hash table whose
> scan it is intended to short-cut.
>
> In any event, 10 seems to be large enough to overcome the currently
> known bottleneck.  Maybe 100 would be a more principled number, as
> that is about where the list could start to become as big as the basal
> size of the RelationIdCache table.
>
> I don't think this patch replaces having some mechanism for
> restricting how large RelationIdCache can get, or how old its LRU
> entries can get, as Robert suggested.  But this approach seems
> easier to implement and agree upon, and it doesn't preclude doing
> other optimizations later.

I haven't reviewed this terribly closely, but I think this is likely
worth pursuing.  I see you already added it to the next CommitFest,
which is good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


