Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Further pg_upgrade analysis for many tables |
Date | |
Msg-id | CA+TgmoZrCiZjM03H72rGjBOGbqRUtLf13AbqSqR5Ur5NASD45A@mail.gmail.com |
In response to | Re: Further pg_upgrade analysis for many tables (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-hackers |
On Fri, Nov 23, 2012 at 5:34 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Jeff Janes <jeff.janes@gmail.com> writes:
>>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> There are at least three ways we could whack that mole: ...
>>>>>
>>>>> * Keep a separate list (or data structure of your choice) so that
>>>>> relcache entries created in the current xact could be found directly
>>>>> rather than having to scan the whole relcache.  That'd add complexity
>>>>> though, and could perhaps be a net loss for cases where the relcache
>>>>> isn't so bloated.
>>>
>>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>>> table one recently added.  The overhead of that should be very low.
>>>
>>>> Are the three places where "need_eoxact_work = true;" the only places
>>>> where things need to be added to the new structure?
>>>
>>> Yeah.  The problem is not so much the number of places that do that,
>>> as that places that flush entries from the relcache would need to know
>>> to remove them from the separate list, else you'd have dangling
>>> pointers.
>>
>> If the list is of hash-tags rather than pointers, all we would have to
>> do is ignore entries that are not still in the hash table, right?
>>
>
> I've attached a proof-of-concept patch to implement this.
>
> I got rid of need_eoxact_work entirely and replaced it with a short
> list that fulfills the functions of indicating that work is needed,
> and suggesting which rels might need that work.  There is no attempt
> to prevent duplicates, nor to remove invalidated entries from the
> list.  Invalid entries are skipped when the hash entry is not found,
> and processing is idempotent, so duplicates are not a problem.
>
> Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
> overflowed the first time it was accessed, then it would be identical
> to the current behavior of having only a flag.  So formally all I did
> was increase the max from 0 to 10.
>
> I wasn't so sure about the idempotent nature of subtransaction
> processing, so I chickened out and left that part alone.  I know of no
> workflow for which that was a bottleneck.
>
> AtEOXact_release is oddly indented because that makes the patch
> smaller and easier to read.
>
> This makes the non-"-1" restore of large dumps very much faster (and
> makes them faster than "-1" restores as well).
>
> I added a "create temp table foo (x integer) on commit drop;" line to
> the default pgbench transaction and tested that.  I was hoping to see
> a performance improvement there as well (the transaction has ~110
> entries in the RelationIdCache at eoxact each time), but the
> performance was too variable (probably due to the intense IO it
> causes) to detect any changes.  At least it is not noticeably slower.
> If I hack pgbench to bloat the RelationIdCache by touching 20,000
> useless tables as part of the connection start-up process, then this
> patch does show a win.
>
> It is not obvious what value to set the MAX list size to.  Since this
> array is only allocated once per back-end, and since it is not groveled
> through to invalidate relations at each invalidation, there is no
> particular reason it must be small.  But if the same table is assigned
> new filenodes (or forced index lists, whatever those are) repeatedly
> within a transaction, the list could become bloated with duplicate
> entries, potentially becoming even larger than the hash table whose
> scan it is intended to short-cut.
>
> In any event, 10 seems to be large enough to overcome the currently
> known bottleneck.  Maybe 100 would be a more principled number, as
> that is about where the list could start to become as big as the basal
> size of the RelationIdCache table.
>
> I don't think this patch replaces having some mechanism for
> restricting how large RelationIdCache can get, or how LRU entries in
> it can get, as Robert suggested.  But this approach seems like it is
> easier to implement and agree upon, and it doesn't preclude doing
> other optimizations later.
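For anyone following the thread without the patch handy, the shape of the
mechanism described above is roughly this.  It is a minimal, self-contained
sketch rather than the actual patch: Oid is redefined locally, and the hash
lookup and per-relation cleanup routines are reduced to stand-ins.

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int Oid;          /* local stand-in for PostgreSQL's Oid */

    #define MAX_EOXACT_LIST 10         /* small; the thread discusses 10 vs. 100 */

    static Oid  eoxact_list[MAX_EOXACT_LIST];
    static int  eoxact_list_len = 0;
    static bool eoxact_list_overflowed = false;

    /* Stand-ins for the relcache hash lookup and per-relation cleanup. */
    static bool relcache_contains(Oid relid) { return relid != 0; }
    static void cleanup_one_rel(Oid relid)   { printf("eoxact work for %u\n", relid); }
    static void scan_whole_relcache(void)    { printf("full relcache scan\n"); }

    /*
     * Called from the places that previously just set need_eoxact_work = true.
     * Duplicates are allowed; overflow simply degrades to the old behavior.
     */
    static void
    remember_eoxact_rel(Oid relid)
    {
        if (eoxact_list_len < MAX_EOXACT_LIST)
            eoxact_list[eoxact_list_len++] = relid;
        else
            eoxact_list_overflowed = true;
    }

    /*
     * End-of-transaction work: walk the short list unless it overflowed.
     * Entries whose relation is no longer in the hash table are skipped,
     * and the cleanup is idempotent, so stale or duplicate entries are
     * harmless.
     */
    static void
    do_eoxact_work(void)
    {
        int i;

        if (eoxact_list_overflowed)
            scan_whole_relcache();
        else
            for (i = 0; i < eoxact_list_len; i++)
                if (relcache_contains(eoxact_list[i]))
                    cleanup_one_rel(eoxact_list[i]);

        eoxact_list_len = 0;
        eoxact_list_overflowed = false;
    }

    int
    main(void)
    {
        remember_eoxact_rel(16384);
        remember_eoxact_rel(16384);    /* duplicate: harmless */
        remember_eoxact_rel(0);        /* pretend this one was already flushed */
        do_eoxact_work();
        return 0;
    }

Overflow just falls back to the existing whole-relcache scan, so the worst
case matches the current flag-only behavior.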
I haven't reviewed this terribly closely, but I think this is likely
worth pursuing.  I see you already added it to the next CommitFest,
which is good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company