Re: VACUUM in SP-GiST - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: VACUUM in SP-GiST |
Date | |
Msg-id | 26043.1331523168@sss.pgh.pa.us Whole thread Raw |
In response to | Re: VACUUM in SP-GiST (yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)) |
List | pgsql-hackers |
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes: >> [ tgl's redesign of spgist vacuum algorithm ] > with a concurrent split moving leaf tuples between pages, can't it make > bulkdelete fail reporting some TIDs Yeah, you're right :-(. I've been chewing on this problem for awhile and I think I see a solution. We only need to worry about cases where leaf tuples have been moved from one page to a lower-numbered page, not about rearrangements of inner pages. If leaf tuples were moved off a leaf page, the mover should have left a REDIRECT tuple behind. So we can fix things by having VACUUM act like a search scan when it comes across a REDIRECT that was made since since the VACUUM scan started. In detail: 1. When we are scanning a leaf page and we find a REDIRECT whose TID shows it was made since VACUUM started, add that TID to a pending-list of tuple chains we need to recheck. (We can't go visit the TID immediately, because we don't want VACUUM holding lock on more than one buffer at a time. Instead, we'll process the pending-list after finishing with the current page.) 2. Between pages of the normal sequential scan of the index, process pending-list entries until they're all gone. 3. To process a pending-list entry, visit and lock the indicated index page. If it's a leaf page, run the normal VACUUM processing on that page, then mark the pending-list entry done. (Note: we could just process the single tuple chain pointed to by the TID, but it seems better to process the whole page, in hopes that we'll only have to change it once and generate one WAL record, rather than possibly process several chains one-at-a-time. However, REDIRECTs should not be added to the pending-list unless they are right at the TID we intended to visit. A redirect in another chain has or will be processed by the regular scan.) If it's an inner page, examine the inner tuple pointed to by the pending-list TID, and add all the downlink TIDs in that inner tuple to the pending list. The possibility that a leaf REDIRECT leads to an inner tuple (as a consequence of a PickSplit operation) means that we can't apply some apparently-obvious optimizations; for example, if a REDIRECT TID points to a page beyond the current sequential scan point, we cannot simply ignore it because it might be pointing at an inner rather than leaf page. (We could possibly drop such a page without processing it once we'd locked it and verified it's a leaf page, but I tend to think we might as well process it if we've gotten that far.) I complained that the original spgist vacuum algorithm might not terminate in the face of continuous concurrent updates, and this one seems to have some risk of similar misbehavior. However, we can fix that with suitable management of the pending list. Specifically, I have in mind that we'll not remove pending-list entries, only mark them "done", until the entire list is "done" and we're ready to resume the main sequential scan; and that we will check for new TIDs being already in the pending-list and not make duplicate entries. This should be enough to prevent any endless loops. This all sounds pretty messy, but in practice the pending list should almost always be short, so there won't be a lot of performance lost. Comments? regards, tom lane
pgsql-hackers by date: