Do we need so many hint bits? - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Do we need so many hint bits? |
Date | |
Msg-id | 1353026577.14335.91.camel@sussancws0025 |
List | pgsql-hackers |
Related to discussion here:

http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com

It occurred to me recently that many of the hint bits aren't terribly important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly has a purpose, and we'd expect it to be used many times following the initial CLOG lookup.

But the other tuple hint bits seem to be there just for symmetry, because they shouldn't last long. If HEAP_XMIN_INVALID or HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it should just be changed to InvalidTransactionId.

Removing those 3 hints would give us 3 more flag bits (eventually, after we are sure they aren't just leftover), and it would also reduce the chance that a page is dirtied for no other reason than to set them. It might even take a few cycles out of the tqual.c routines, or at least reduce the code size. Not a huge win, but I don't see much downside either.

Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced in the visibility map patch, apparently as a way to know when to clear the VM bit when doing an update. It was then also used for scans, which showed a significant speedup. But I wonder: why not just use the visibilitymap directly from those places? It can be used for the scan because it is crash safe now (not possible before). And since it's only one lookup per scanned page, then I don't think it would be a measurable performance loss there. Inserts/updates/deletes also do a significant amount of work, so again, I doubt it's a big drop in performance there -- maybe under a lot of concurrency or something.

The benefit of removing PD_ALL_VISIBLE would be significantly higher. It's quite common to load a lot of data, and then do some reads for a while (setting hint bits and flushing them to disk), and then do a VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the pages again. Also, if I remember correctly, Robert went to significant effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM bits consistent. Maybe this was all discussed before?

All of these hint bits will have a bit more of a performance impact after checksums are introduced (for those that use them in conjunction with large data loads), so I'm looking for some simple ways to mitigate those effects. What kind of worst-case tests could I construct to see if there are worrying performance effects to removing these hint bits?

Regards,
	Jeff Davis
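
For readers less familiar with the mechanism discussed in the message, here is a minimal toy sketch of how a hint bit trades one CLOG lookup for a dirtied page. The four flag values match PostgreSQL's heap tuple header definitions, but everything prefixed `Toy`/`toy_` is hypothetical and invented for illustration: it is not the tqual.c code, and it ignores in-progress transactions, xmax handling, and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tuple hint bits named in the message (values as in PostgreSQL's headers). */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMAX_COMMITTED 0x0400
#define HEAP_XMAX_INVALID   0x0800

/* Hypothetical stand-ins for a tuple header and its containing page. */
typedef struct { uint32_t xmin; uint16_t infomask; } ToyTuple;
typedef struct { bool dirty; } ToyPage;

/* Hypothetical stand-in for the CLOG: one committed/aborted flag per xid. */
#define TOY_CLOG_SIZE 16
static bool toy_clog_committed[TOY_CLOG_SIZE];
static int  toy_clog_lookups = 0;

static bool toy_clog_lookup(uint32_t xid)
{
    assert(xid < TOY_CLOG_SIZE);
    toy_clog_lookups++;            /* count the "expensive" lookups */
    return toy_clog_committed[xid];
}

/*
 * Was the inserting transaction committed?  If a hint bit already answers
 * the question, use it.  Otherwise consult the toy CLOG, cache the answer
 * in the tuple's infomask, and mark the page dirty: that write is the
 * cost the message is concerned about.
 */
static bool toy_xmin_committed(ToyTuple *tup, ToyPage *page)
{
    if (tup->infomask & HEAP_XMIN_COMMITTED)
        return true;
    if (tup->infomask & HEAP_XMIN_INVALID)
        return false;

    bool committed = toy_clog_lookup(tup->xmin);
    tup->infomask |= committed ? HEAP_XMIN_COMMITTED : HEAP_XMIN_INVALID;
    page->dirty = true;            /* page dirtied only to store the hint */
    return committed;
}
```

In this model, the first call for a tuple performs one lookup and dirties the page; later calls are answered from the infomask alone. Dropping a hint such as HEAP_XMIN_INVALID would correspond to removing its branch and its assignment, so that an aborted inserter costs a lookup on every check but never dirties the page for that reason.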