Thread: Visibility map and hint bits
There has been a lot of recent discussion about the visibility map (for
index-only scans) and hint bits (trying to avoid double-writing a
table).  I wonder if we could fix both of these at the same time.  Once
the visibility map is reliable, can we use that to avoid updating the
hint bits on all rows on a page?

For bulk loads, all the pages are going to have the same xid and all be
visible, so instead of writing the entire table, we just write the
visibility map.

I think the problem is that we have the PD_ALL_VISIBLE page flag, which
requires a write of the page as well.  Could we get by with only the
visibility bits and remove PD_ALL_VISIBLE?

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
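[For readers who want the mechanics: the visibility map stores one bit
per heap page, so locating the bit for a given heap block is pure
shift/mask arithmetic.  A minimal self-contained sketch, with names
modeled on (but not copied from) src/backend/access/heap/visibilitymap.c;
the header-size constant is an assumption:]

    /*
     * Sketch of visibility-map addressing: one bit per heap page.
     * Names are illustrative, modeled on the visibilitymap.c macros.
     */
    #include <stdint.h>

    #define BLCKSZ              8192              /* bytes per page */
    #define MAPSIZE             (BLCKSZ - 24)     /* usable VM bytes; 24-byte
                                                     page header is assumed */
    #define HEAPBLOCKS_PER_BYTE 8                 /* one bit per heap block */
    #define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

    /* Which VM page holds the bit for a given heap block? */
    static inline uint32_t heapblk_to_mapblock(uint32_t heapBlk)
    {
        return heapBlk / HEAPBLOCKS_PER_PAGE;
    }

    /* Which byte within that VM page? */
    static inline uint32_t heapblk_to_mapbyte(uint32_t heapBlk)
    {
        return (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    }

    /* Which bit within that byte? */
    static inline uint8_t heapblk_to_mapbit(uint32_t heapBlk)
    {
        return heapBlk % HEAPBLOCKS_PER_BYTE;
    }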
On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
> There has been a lot of recent discussion about the visibility map (for
> index-only scans) and hint bits (trying to avoid double-writing a
> table).

I still think a small tqual.c maintained cache of hint bits will
effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
fired a shot across the bow regarding the general worthiness of that
technique though (see:
http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
:(.  I can rig up a cleaned up version of the patch pretty
easily...it's a local change and fairly simple.

I don't think there is any way to remove the hint bits without
suffering some other problem.

merlin
Merlin Moncure wrote:
> On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > There has been a lot of recent discussion about the visibility map (for
> > index-only scans) and hint bits (trying to avoid double-writing a
> > table).
>
> I still think a small tqual.c maintained cache of hint bits will
> effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
> fired a shot across the bow regarding the general worthiness of that
> technique though (see:
> http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
> :(.  I can rig up a cleaned up version of the patch pretty
> easily...it's a local change and fairly simple.
>
> I don't think there is any way to remove the hint bits without
> suffering some other problem.

Was that the idea that the pages had to fit in the cache and be updated
with hint bits before being written to disk?  Restricting that to the
size of the buffer cache seemed very limiting.

One 8k visibility map page can hold bits for 1/2 gig of heap pages, so
I thought that would be a better all-visible indicator and avoid many
all-visible page writes in bulk-load cases.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
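[The arithmetic behind the "1/2 gig" figure checks out, assuming 8k
pages and one visibility bit per heap page, and ignoring the small VM
page header; a quick self-contained check:]

    /* Quick check of the "1/2 gig per VM page" figure (8k pages assumed). */
    #include <stdio.h>

    int main(void)
    {
        const long blcksz = 8192;                 /* bytes per page */
        const long bits   = blcksz * 8;           /* 65,536 heap pages covered
                                                     per VM page (header ignored) */
        const long bytes_covered = bits * blcksz; /* heap bytes covered */

        /* Prints: 65536 heap pages = 512 MB of heap */
        printf("%ld heap pages = %ld MB of heap\n",
               bits, bytes_covered / (1024 * 1024));
        return 0;
    }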
On Thu, May 5, 2011 at 12:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> I wonder if we could fix both of these at the same time.  Once the
> visibility map is reliable, can we use that to avoid updating the hint
> bits on all rows on a page?

I don't think so.  There are two problems:

1. If there is a long-running transaction on the system, it will not be
possible to set PD_ALL_VISIBLE, but hint bits can still be set.  So
there could be a significant performance regression if we don't set
hint bits in that case.

2. Making the visibility map crash-safe will mean that setting hint
bits has to emit XLOG records, so it can't be done on Hot Standby
servers at all, and it's much more expensive than just setting a hint
bit on the master.

> For bulk loads, all the pages are going to have the same xid and all be
> visible, so instead of writing the entire table, we just write the
> visibility map.
>
> I think the problem is that we have the PD_ALL_VISIBLE page flag, which
> requires a write of the page as well.  Could we get by with only the
> visibility bits and remove PD_ALL_VISIBLE?

In some ways, that would make things much simpler.  But to make that
work, every insert/update/delete to a page would have to pin the
visibility map page and clear PD_ALL_VISIBLE if appropriate, so it
might not be good from a performance standpoint, especially in
high-concurrency workloads.  Right now, if PD_ALL_VISIBLE isn't set, we
don't bother touching the visibility map page, which seems like a
possibly important optimization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
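[The optimization Robert describes looks roughly like this in the
heap-modification paths.  A simplified sketch, not the actual backend
code: the page macros are real, but the visibilitymap_clear() signature
is abbreviated and the function name is illustrative:]

    /*
     * Simplified sketch: the page-level PD_ALL_VISIBLE flag lets heap
     * modifications skip the visibility map entirely for pages already
     * known not-all-visible -- the common case under heavy updates.
     */
    static void
    clear_all_visible_if_needed(Relation relation, Buffer buffer,
                                BlockNumber blkno)
    {
        Page page = BufferGetPage(buffer);

        if (PageIsAllVisible(page))
        {
            /* Rare case: pay for the VM page access and clear both bits. */
            PageClearAllVisible(page);
            visibilitymap_clear(relation, blkno);   /* signature simplified */
        }
        /* else: no VM page pin, no extra contention */
    }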
On Thu, May 5, 2011 at 1:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Merlin Moncure wrote:
>> On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > There has been a lot of recent discussion about the visibility map (for
>> > index-only scans) and hint bits (trying to avoid double-writing a
>> > table).
>>
>> I still think a small tqual.c maintained cache of hint bits will
>> effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
>> fired a shot across the bow regarding the general worthiness of that
>> technique though (see:
>> http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
>> :(.  I can rig up a cleaned up version of the patch pretty
>> easily...it's a local change and fairly simple.
>>
>> I don't think there is any way to remove the hint bits without
>> suffering some other problem.
>
> Was that the idea that the pages had to fit in the cache and be updated
> with hint bits before being written to disk?  Restricting that to the
> size of the buffer cache seemed very limiting.
>
> One 8k visibility map page can hold bits for 1/2 gig of heap pages, so
> I thought that would be a better all-visible indicator and avoid many
> all-visible page writes in bulk-load cases.

No, that was my first idea -- check visibility when you evict.  That
helps a different problem, but not bulk loads.  One way it could help
is for marking PD_ALL_VISIBLE.  This might also be a winner, but there
is some valid skepticism that adding more work for the bgwriter is
really a good idea.

The tqual cache idea is such that there is a small cache that remembers
the commit/cancel status of recently seen transactions.  If you scan a
tuple and its status is known via the cache, you set the bit but don't
mark the page dirty.  That way, if you are scanning a lot of unhinted
tuples with similar xids, you don't need to jam out i/o.

I think the general concept is clean, but it might need some buy-in
from Tom and some performance testing for justification.  The alternate
'cleaner' approach of maintaining a larger transam.c cache had some
downsides I saw no simple workaround for.

merlin
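[A minimal sketch of the kind of backend-local cache Merlin is
describing; all names and sizes here are illustrative, not taken from
his patch:]

    /*
     * Backend-local cache of recently seen transaction commit status.
     * Illustrative only: a tiny direct-mapped array, probed without any
     * lock, cheap enough to sit next to the hint bit check itself.
     */
    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t TransactionId;     /* real xids are also 32 bits */

    #define XID_CACHE_SIZE 64           /* small: must stay cheap to probe */

    typedef struct XidStatusEntry
    {
        TransactionId xid;
        bool          committed;        /* known commit/cancel status */
    } XidStatusEntry;

    static XidStatusEntry xid_cache[XID_CACHE_SIZE];

    /* Probe: on a hit, set the hint bit but skip dirtying the page. */
    static inline bool
    xid_status_cached(TransactionId xid, bool *committed)
    {
        XidStatusEntry *e = &xid_cache[xid % XID_CACHE_SIZE];

        if (e->xid == xid)
        {
            *committed = e->committed;
            return true;
        }
        return false;
    }

    /* Remember a status once it is known good. */
    static inline void
    xid_status_remember(TransactionId xid, bool committed)
    {
        XidStatusEntry *e = &xid_cache[xid % XID_CACHE_SIZE];

        e->xid = xid;
        e->committed = committed;
    }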
Merlin Moncure <mmoncure@gmail.com> wrote:

> a small cache that remembers the commit/cancel status of recently
> seen transactions.

How is that different from the head of the clog SLRU?

-Kevin
On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> a small cache that remembers the commit/cancel status of recently
>> seen transactions.
>
> How is that different from the head of the clog SLRU?

Several things:

*) Any slru access requires a lock (besides the lock itself, you are
spending cycles in the critical path).

*) Cache access happens at a different stage of processing in
HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
TransactionIdIsInProgress have to be checked first.  Logically, it's an
extension of the hint bit check itself, not an expansion of the lower
levels of caching.

*) In tqual.c you can sneak in some small optimizations, like only
caching the bit if it's known good in the WAL (XLogNeedsFlush).  That
way you don't need to keep checking it over and over for the same
transaction.

*) slru-level accesses happen too late to give much benefit:

I can't stress enough how tight HeapTupleSatisfiesMVCC is.  On my
workstation VM, each non-inline function call shows up measurably in
profiling.  I think anything you do here has to be inline, hand-rolled,
and very tight (you can forget anything around dynahash).  Delegating
the cache management to the transam or (even worse) slru level
penalizes some workloads non-trivially.

merlin
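[To make that ordering concrete: a rough sketch of where such a probe
would sit in the xmin-visibility check, simplified from the real
HeapTupleSatisfiesMVCC and reusing the hypothetical cache helpers
sketched above.  Only the cache calls and the helper at the end are the
proposed additions; the other functions and flags are real:]

    /*
     * Simplified xmin check.  The cache probe runs only after the
     * current-transaction and in-progress checks, as an extension of
     * the hint bit test itself -- not a layer under transam/slru.
     */
    static bool
    tuple_xmin_committed(HeapTupleHeader tuple, TransactionId xmin)
    {
        bool committed;

        if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
            return true;                    /* hint bit already set */

        if (TransactionIdIsCurrentTransactionId(xmin))
            return true;                    /* our own xact: no cache involved */

        if (TransactionIdIsInProgress(xmin))
            return false;                   /* still running: cannot cache */

        /* Hypothetical: probe the local cache before going to clog. */
        if (xid_status_cached(xmin, &committed))
        {
            if (committed)
                set_hint_bit_no_dirty(tuple);   /* illustrative helper */
            return committed;
        }

        /* Fall through to the real (locked, slru-backed) clog lookup. */
        committed = TransactionIdDidCommit(xmin);
        if (committed)
            xid_status_remember(xmin, true);
        return committed;
    }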
On Thu, May 5, 2011 at 2:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>>> a small cache that remembers the commit/cancel status of recently
>>> seen transactions.
>>
>> How is that different from the head of the clog SLRU?
>
> Several things:
>
> *) Any slru access requires a lock (besides the lock itself, you are
> spending cycles in the critical path).
>
> *) Cache access happens at a different stage of processing in
> HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
> TransactionIdIsInProgress have to be checked first.  Logically, it's an
> extension of the hint bit check itself, not an expansion of the lower
> levels of caching.
>
> *) In tqual.c you can sneak in some small optimizations, like only
> caching the bit if it's known good in the WAL (XLogNeedsFlush).  That
> way you don't need to keep checking it over and over for the same
> transaction.
>
> *) slru-level accesses happen too late to give much benefit.
>
> I can't stress enough how tight HeapTupleSatisfiesMVCC is.  On my
> workstation VM, each non-inline function call shows up measurably in
> profiling.  I think anything you do here has to be inline, hand-rolled,
> and very tight (you can forget anything around dynahash).  Delegating
> the cache management to the transam or (even worse) slru level
> penalizes some workloads non-trivially.

An updated patch is attached.  It's still WIP, but I need a little
guidance before going further.

What I did:

*) Added a lot of source-level comments that should explain better
what's happening and why.

*) Fixed a significant number of goofs in the earlier patch.

*) Reorganized the interaction with HeapTupleSatisfiesMVCC.  In
particular, SetHintBits() now returns whether it actually set the bit,
because I can use that information.

What's not done:

*) Only commit bits are cached, and caching action is only happening in
HeapTupleSatisfiesMVCC.  I'm not sure yet if it's better to store
invalid bits in the same cache or in a separate one.  I'm not sure if
the other satisfies routines should also be engaging the cache.
Translated from nerd speak, that means I haven't yet done the research
to see when they are fired and whether they are bottlenecks :-).

*) I'd like to reach some sort of consensus with Tom on whether there
is any point in going further in this direction -- not so much on the
mechanics of how the cache works, but on it being at the tqual.c level,
and on the changes to HeapTupleSatisfiesMVCC.  In particular, I think
caching at the transam.c level is a dead end on performance grounds,
regardless of how you implement the cache.

Some points of note:

*) Is it acceptable to use a static definition of memory like that?  If
not, should there be a more standard allocation under
CacheMemoryContext?

*) Testing for the benefit is simple: just create a bunch of records
and seqscan the table (select count(*)).  Without the patch the first
scan is slower and does a bunch of i/o.  With it, it does not.

*) The cache overhead is *almost* not measurable.  As best I can tell,
we are looking at maybe 1%-ish overhead in synthetic scan-heavy
workloads (I think this is a fair price to pay for all the i/o
savings).  The degenerate case of repeated 'rollups' is really
difficult to generate, even synthetically -- if the cache is performing
lousily, the regular hint bit action tends to protect it.  Performance
testing under real workloads is going to give better info here.

merlin
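[The SetHintBits() reorganization he describes might look roughly like
this.  A hedged sketch modeled loosely on the circa-9.1 tqual.c code,
not the actual patch; it reuses the hypothetical xid_status_remember()
from the earlier sketch:]

    /*
     * Sketch: SetHintBits() reports whether it actually set the bit, so
     * the caller knows whether the status is safe to cache.  Hinting is
     * skipped when the commit record might not yet be flushed to WAL.
     */
    static bool
    SetHintBits(HeapTupleHeader tuple, Buffer buffer,
                uint16 infomask, TransactionId xid)
    {
        if (TransactionIdIsValid(xid))
        {
            /* Don't hint unless the commit is safely on disk. */
            XLogRecPtr commitLSN = TransactionIdGetCommitLSN(xid);

            if (XLogNeedsFlush(commitLSN))
                return false;           /* caller must not cache either */
        }

        tuple->t_infomask |= infomask;
        SetBufferCommitInfoNeedsSave(buffer);
        return true;
    }

    /* Caller, inside HeapTupleSatisfiesMVCC: */
    if (SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, xmin))
        xid_status_remember(xmin, true);    /* known-good: safe to cache */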