Re: hint bit cache v5 - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: hint bit cache v5 |
Date | |
Msg-id | BANLkTinR7UTuUmcZuc4SksDrKFU4zYEXwQ@mail.gmail.com Whole thread Raw |
In response to | Re: hint bit cache v5 (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
On Wed, May 11, 2011 at 12:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, May 11, 2011 at 4:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > >> Following are results that are fairly typical of the benefits you >> might see when the optimization kicks in. The attached benchmark just > >> [hbcache] >> real 3m35.549s > >> [HEAD] >> real 4m24.216s > > These numbers look very good. Thanks for responding to my request. > > What people have said historically at this point is "ah, but you've > just deferred the pain from clog lookups". Deferred, or eliminated. If any tuple on the page gets updated, deleted, etc or the the table itself is dropped then you've 'won'...the page with rhw hint bit only change was never booted out to the heap before another substantive change happened. This is exactly what happens in certain common workloads -- you insert a bunch of records, scan them with some batch process, then delete them. Let's say a million records were inserted under a single transaction and you are getting bounced out of the transam.c cache, you just made a million calls to TransactionIdIsCurrentTransactionId and (especially) TransactionIdIsInProgress for the *exact same* transaction_id, over and over. That stuff adds up even before looking at the i/o incurred. Put another way, the tuple hint bits have a lot of usefulness when the tuples on the page are coming from all kinds of differently aged transactions. When all the tuples have the same or similar xid, the information value is quite low, and the i/o price isn't worth it. The cache neatly haircuts the downside case. If the cache isn't helping (any tuple fetch on the page faults through it), the page is dirtied and the next time it's fetched all the bits will be set. > The best way to show this does what we hope is to run a normal-ish > OLTP access to the table that would normally thrash the clog and show > no ill effects there either. Run that immediately after the above > tests so that the cache and hint bits are both primed. yeah. the only easy way I know of to do this extremely long pgbench runs, and getting good results is harder than it sounds...if the tuple hint bits make it to disk (via vacuum or a cache fault), they stay there and that tuple is no longer interesting from the cache point of view. If you make the scale really large the test will just take forever just to get the tables primed (like a week). Keep in mind, autovacuum can roll around at any time and set the bits under you (you can of course disable it, but who really does than on OLTP?). Small scale oltp tests are not real world realistic because anybody sane would just let autovacuum loose on the table. clog thrashing systems are typically mature, high load oltp databases...not fun to test on your single 7200 rpm drive. I'm going to boldly predict that with all the i/o flying around in cases like that, the paltry cpu cycles spent dealing with the cache are the least of your problems. Not discounting the need to verify that though. merlin
pgsql-hackers by date: