Re: partial heap only tuples - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: partial heap only tuples |
Date | |
Msg-id | 962E672B-E874-4EF3-AEE0-85E54AA083A9@amazon.com Whole thread Raw |
In response to | Re: partial heap only tuples (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On 2/13/21, 8:26 AM, "Andres Freund" <andres@anarazel.de> wrote: > On 2021-02-09 18:48:21 +0000, Bossart, Nathan wrote: >> In order to be eligible for cleanup, the final tuple in the >> corresponding PHOT/HOT chain must also be eligible for cleanup, or all >> indexes must have been updated later in the chain before any visible >> tuples. > > This sounds like it might be prohibitively painful. Adding effectively > unremovable bloat to remove other bloat is not an uncomplicated > premise. I think you'd really need a way to fully remove this as part of > vacuum for this to be viable. Yeah, this is something I'm concerned about. I think adding a bitmap of modified columns to the header of PHOT-updated tuples improves matters quite a bit, even for single-page vacuuming. Following is a strategy I've been developing (there may still be some gaps). Here's a basic PHOT chain where all tuples are visible and the last one has not been deleted or updated: idx1 0 1 2 3 idx2 0 1 2 idx3 0 2 3 lp 1 2 3 4 5 tuple (0,0,0) (0,1,1) (2,2,1) (2,2,2) (3,2,3) bitmap -xx xx- --x x-x For single-page vacuum, we take the following actions: 1. Starting at the root of the PHOT chain, create an OR'd bitmap of the chain. 2. Walk backwards, OR-ing the bitmaps. Stop when the bitmap matches the one from step 1. As we walk backwards, identify "key" tuples, which are tuples where the OR'd bitmap changes as you walk backwards. If the OR'd bitmap does not include all columns for the table, also include the root of the PHOT chain as a key tuple. 3. Redirect each key tuple to the next key tuple. 4. For all but the first key tuple, OR the bitmaps of all pruned tuples from each key tuple (exclusive) to the next key tuple (inclusive) and store the result in the bitmap of the next key tuple. 5. Mark all line pointers for all non-key tuples as dead. Storage can be removed for all tuples except the last one, but we must leave around the bitmap for all key tuples except for the first one. After this, our basic PHOT chain looks like this: idx1 0 1 2 3 idx2 0 1 2 idx3 0 2 3 lp X X 3->5 X 5 tuple (3,2,3) bitmap x-x Without PHOT, this intermediate state would have 15 index tuples, 5 line pointers, and 1 heap tuples. With PHOT, we have 10 index tuples, 5 line pointers, 1 heap tuple, and 1 bitmap. When we vacuum the indexes, we can reclaim the dead line pointers and remove the associated index tuples: idx1 3 idx2 2 idx3 2 3 lp 3->5 5 tuple (3,2,3) bitmap x-x Without PHOT, this final state would have 3 index tuples, 1 line pointer, and 1 heap tuple. With PHOT, we have 4 index tuples, 2 line pointers, 1 heap tuple, and 1 bitmap. Overall, we still end up keeping around more line pointers and tuple headers (for the bitmaps), but maybe that is good enough. I think the next step here would be to find a way to remove some of the unnecessary index tuples and adjust the remaining ones to point to the last line pointer in the PHOT chain. Nathan
pgsql-hackers by date: