Re: Do we need so many hint bits? - Mailing list pgsql-hackers
From: Jeff Davis
Subject: Re: Do we need so many hint bits?
Msg-id: 1353113930.14335.185.camel@sussancws0025
In response to: Re: Do we need so many hint bits? (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, 2012-11-16 at 11:58 -0500, Robert Haas wrote:
> > Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
> > in the visibility map patch, apparently as a way to know when to clear
> > the VM bit when doing an update. It was then also used for scans, which
> > showed a significant speedup. But I wonder: why not just use the
> > visibilitymap directly from those places?
>
> Well, you'd have to look up, lock and pin the page to do that. I
> suspect that overhead is pretty significant. The benefit of noticing
> that the flag is set is that you need not call HeapTupleSatisfiesMVCC
> for each tuple on the page: checking one bit in the page header is a
> lot cheaper than calling that function for every tuple. However, if
> you had to lock and pin a second page in order to check whether the
> page is all-visible, I suspect it wouldn't be a win; you'd probably be
> better off just doing the HeapTupleSatisfiesMVCC checks for each
> tuple.

That's pretty easy to test. Here's what I got on a 10M-record table
(some runs produced strangely high numbers around 1700ms, which I
assume means it's hard to keep the data in shared buffers, so I took
the lower numbers):

    PD_ALL_VISIBLE: 661ms
    VM lookup:      667ms
    Neither:        740ms

Even if pinning the VM buffer were slow, we could keep the pin longer
during a scan (the VM API seems designed for exactly that kind of use
case), so I don't think scans are a problem at all, even if there is a
lot of concurrency.

> One of the main advantages of PD_ALL_VISIBLE is that if you do an
> insert, update, or delete on a page where that bit isn't set, you need
> not lock and pin the visibility map page, because you already know
> that the bit will be clear in the visibility map. If the data is
> being rapidly modified, you'll get the benefit of this optimization
> most of the time, only losing it when vacuum has visited recently. I
> hope that's not premature optimization because I sure sweat a lot of
> blood last release cycle to keep it working like that. I had a few
> doubts at the time about how much we were winning there, but I don't
> actually have any hard data either way, so I would be reluctant to
> assume it doesn't matter.

This is a more plausible concern, because it's per-tuple rather than
per-page. However, I think we might be able to save the current VM
buffer across multiple calls of heap_insert|update|delete in
nodeModifyTable; see the sketch below. At first glance, it doesn't
look very difficult. It's unfortunate that I didn't think much about
this while you were working on it before; but difficult-to-write code
is also difficult to maintain, so simplifying it would be nice.
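To show what I have in mind, here's a minimal sketch of the
pin-caching idea. ModifyVMState and the notion of the executor owning
the pin are made up for illustration (the real field would presumably
live in ModifyTableState, and heap_insert/update/delete would have to
accept the caller's buffer); visibilitymap_pin(), BufferIsValid(), and
ReleaseBuffer() are the existing APIs:

/*
 * Sketch only: cache one pinned VM buffer across consecutive heap
 * modifications instead of pinning/unpinning inside each call.
 */
#include "postgres.h"
#include "access/visibilitymap.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

typedef struct ModifyVMState
{
	Buffer		vmbuffer;	/* cached pin on a VM page, or InvalidBuffer */
} ModifyVMState;

static void
modify_one_tuple(ModifyVMState *state, Relation rel, BlockNumber heapBlk)
{
	/*
	 * visibilitymap_pin() is a no-op when heapBlk maps to the VM page
	 * we already hold, so a run of modifications against nearby heap
	 * blocks pays the pin/unpin cost once rather than once per tuple.
	 */
	visibilitymap_pin(rel, heapBlk, &state->vmbuffer);

	/*
	 * ... do the heap_insert/update/delete here, clearing the VM bit
	 * through the already-pinned buffer if it was set ...
	 */
}

static void
modify_finish(ModifyVMState *state)
{
	/* release whatever pin is left at end of statement */
	if (BufferIsValid(state->vmbuffer))
	{
		ReleaseBuffer(state->vmbuffer);
		state->vmbuffer = InvalidBuffer;
	}
}

The scan side could use the same trick: keep a vmbuffer in the scan
descriptor and hand it to visibilitymap_test(), so the pin only gets
swapped when the scan crosses a VM page boundary (a single VM page
covers on the order of 64K heap pages).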
> Even if it doesn't, the sequential-scan optimization definitely
> matters a LOT, as you can easily verify.

I am in no way suggesting that we remove that optimization; I am
suggesting that we can get the exact same benefit by looking at the VM
directly.

> One approach that I've been hoping to pursue is to find a way to make
> CLOG lookups cheaper and more concurrent. I started to work on some
> concurrent hash table code, which you can find here:
>
> http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/chash
>
> The concurrency properties of this code are vastly better than what we
> have now, but there are cases where it loses vs. dynahash when there's
> no concurrency. That might be fixable or just not a big deal, though.
> A bigger problem is that I got sucked off into other things before I
> was able to get as far with it as I wanted to; in particular, I have
> only unit test results for it, and haven't tried to integrate it into
> the SLRU code yet.

Great work!

> But I'm not sure any of this is going to fundamentally chip away at
> the need for hint bits all that much. Making CLOG lookups cheaper or
> less frequent is all to the good, but the prognosis for improving
> things enough that we can dump some or all of the hint bits completely
> seems uncertain at best. Even if we COULD dump everything but
> heap-xmin-committed, how much would that really help with the
> disk-write problem? I bet heap-xmin-committed gets set far more
> often than the other three put together.

Right now I'm setting my sights on PD_ALL_VISIBLE. I know it causes a
lot of extra page dirties, and it doesn't seem to serve much purpose
now that the VM is crash-safe. Removing it would also simplify the
code. If we can't get rid of the other hint bits, then so be it.

> > But the other tuple hint bits seem to be there just for symmetry,
> > because they shouldn't last long. If HEAP_XMIN_INVALID or
> > HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
> > soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
> > should just be changed to InvalidTransactionId.
>
> However, that solution only works for invalid-xmin. For
> committed-xmax, there could actually be quite a long time before the
> page can be pruned, because there can be some other backend holding an
> old snapshot open.

I think I'll set aside this part of the idea for now, because it
doesn't seem like a huge win after I thought about it, anyway. If
HEAP_XMIN_INVALID or HEAP_XMAX_COMMITTED is set, the page is likely to
be dirty already, unless it's a bulk delete or something.

Regards,
	Jeff Davis