Re: Do we need so many hint bits? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Do we need so many hint bits? |
Date | |
Msg-id | 20121116150942.GE6505@awork2.anarazel.de |
In response to | Do we need so many hint bits? (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Do we need so many hint bits? |
List | pgsql-hackers |
On 2012-11-15 16:42:57 -0800, Jeff Davis wrote:
> Related to discussion here:
> http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com
>
> It occurred to me recently that many of the hint bits aren't terribly
> important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
> has a purpose, and we'd expect it to be used many times following the
> initial CLOG lookup.
> But the other tuple hint bits seem to be there just for symmetry,
> because they shouldn't last long. If HEAP_XMIN_INVALID or
> HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
> soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
> should just be changed to InvalidTransactionId.

Wrt HEAP_XMAX_COMMITTED: It can take an *awfully* long time until
autovacuum crosses its thresholds again for a big table. I also think we
cannot dismiss the case of long-running transactions, because vacuum won't
be able to clean up those rows in that case.

Wrt HEAP_(XMIN|XMAX)_INVALID: yes, if we are in need of new flag bits,
those sound like a good target to me.

> Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
> in the visibility map patch, apparently as a way to know when to clear
> the VM bit when doing an update. It was then also used for scans, which
> showed a significant speedup. But I wonder: why not just use the
> visibilitymap directly from those places? It can be used for the scan
> because it is crash safe now (not possible before). And since it's only
> one lookup per scanned page, then I don't think it would be a measurable
> performance loss there. Inserts/updates/deletes also do a significant
> amount of work, so again, I doubt it's a big drop in performance there
> -- maybe under a lot of concurrency or something.
>
> The benefit of removing PD_ALL_VISIBLE would be significantly higher.
> It's quite common to load a lot of data, and then do some reads for a
> while (setting hint bits and flushing them to disk), and then do a
> VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
> pages again. Also, if I remember correctly, Robert went to significant
> effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
> bits consistent. Maybe this was all discussed before?

As far as I understand the code, the crash-safety aspects of the
visibilitymap currently rely on knowing that ALL_VISIBLE has been cleared
during a heap_(insert|update|delete). That allows the visibilitymap to be
managed without being xlogged itself, which seems pretty important to me.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services