Re: hint bit cache v5 - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: hint bit cache v5
Date
Msg-id BANLkTinR7UTuUmcZuc4SksDrKFU4zYEXwQ@mail.gmail.com
In response to Re: hint bit cache v5  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, May 11, 2011 at 12:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, May 11, 2011 at 4:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> Following are results that are fairly typical of the benefits you
>> might see when the optimization kicks in.  The attached benchmark just
>
>> [hbcache]
>> real    3m35.549s
>
>> [HEAD]
>> real    4m24.216s
>
> These numbers look very good. Thanks for responding to my request.
>
> What people have said historically at this point is "ah, but you've
> just deferred the pain from clog lookups".

Deferred, or eliminated. If any tuple on the page gets updated,
deleted, etc., or the table itself is dropped, then you've
'won'...the page with the hint-bit-only change was never booted out to
the heap before another substantive change happened.  This is exactly
what happens in certain common workloads -- you insert a bunch of
records, scan them with some batch process, then delete them.  Let's
say a million records were inserted under a single transaction and you
are getting bounced out of the transam.c cache: you just made a
million calls to TransactionIdIsCurrentTransactionId and (especially)
TransactionIdIsInProgress for the *exact same* transaction_id, over
and over.  That stuff adds up even before looking at the i/o incurred.

Put another way, the tuple hint bits have a lot of usefulness when the
tuples on the page are coming from all kinds of differently aged
transactions.  When all the tuples have the same or similar xid, the
information value is quite low, and the i/o price isn't worth it.  The
cache neatly haircuts the downside case.  If the cache isn't helping
(any tuple fetch on the page faults through it), the page is dirtied
and the next time it's fetched all the bits will be set.

> The best way to show this does what we hope is to run a normal-ish
> OLTP access to the table that would normally thrash the clog and show
> no ill effects there either. Run that immediately after the above
> tests so that the cache and hint bits are both primed.

yeah.  the only easy way I know of to do this is extremely long
pgbench runs, and getting good results is harder than it sounds...if
the tuple hint bits make it to disk (via vacuum or a cache fault),
they stay there and that tuple is no longer interesting from the cache
point of view.

If you make the scale really large the test will just take forever
just to get the tables primed (like a week).  Keep in mind, autovacuum
can roll around at any time and set the bits under you (you can of
course disable it, but who really does that on OLTP?).  Small scale
oltp tests are not real world realistic because anybody sane would
just let autovacuum loose on the table. clog thrashing systems are
typically mature, high load oltp databases...not fun to test on your
single 7200 rpm drive.
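For what it's worth, a test along the lines Simon suggests might look roughly like this. The scale factor, client count, and duration are illustrative guesses only, and the autovacuum toggle is shown just to make the caveat above explicit; this is a sketch, not a tested recipe:

```shell
# Prime a largish pgbench database; scale 1000 (~15GB) is an arbitrary
# example and can take a very long time to initialize on slow storage.
pgbench -i -s 1000 bench

# Optionally keep autovacuum from setting hint bits behind your back
# (a real OLTP deployment would normally leave it on).
psql -d bench -c "ALTER TABLE pgbench_accounts SET (autovacuum_enabled = off);"

# Then a long OLTP-style run immediately afterward, while the cache and
# hint bits are both primed: 8 clients for an hour.
pgbench -c 8 -T 3600 bench
```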

I'm going to boldly predict that with all the i/o flying around in
cases like that, the paltry cpu cycles spent dealing with the cache
are the least of your problems.  Not discounting the need to verify
that though.

merlin

