Re: Combine Prune and Freeze records emitted by vacuum - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Combine Prune and Freeze records emitted by vacuum |
Date | |
Msg-id | 4d1bd0d9-ee1d-4a52-9723-ca35ed40c1af@iki.fi Whole thread Raw |
In response to | Re: Combine Prune and Freeze records emitted by vacuum (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Combine Prune and Freeze records emitted by vacuum
|
List | pgsql-hackers |
On 29/03/2024 07:04, Melanie Plageman wrote: > On Thu, Mar 28, 2024 at 11:07:10AM -0400, Melanie Plageman wrote: >> These comments could use another pass. I had added some extra >> (probably redundant) content when I thought I was refactoring it a >> certain way and then changed my mind. >> >> Attached is a diff with some ideas I had for a bit of code simplification. >> >> Are you working on cleaning this patch up or should I pick it up? > > Attached v9 is rebased over master. But, more importantly, I took > another pass at heap_prune_chain() and am pretty happy with what I came > up with. See 0021. I simplified the traversal logic and then grouped the > chain processing into three branches at the end. I find it much easier > to understand what we are doing for different types of HOT chains. Ah yes, agreed, that's nicer. The 'survivor' variable is a little confusing, especially here: if (!survivor) { int i; /* * If no DEAD tuple was found, and the root is redirected, mark it as * such. */ if ((i = ItemIdIsRedirected(rootlp))) heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum); /* the rest of tuples in the chain are normal, unchanged tuples */ for (; i < nchain; i++) heap_prune_record_unchanged(dp, prstate, chainitems[i]); } You would think that "if(!survivor)", it means that there is no live tuples on the chain, i.e. no survivors. But in fact it's the opposite; it means that the are all live. Maybe call it 'ndeadchain' instead, meaning the number of dead items in the chain. > I got rid of revisited. We can put it back, but I was thinking: we stash > all HOT tuples and then loop over them later, calling record_unchanged() > on the ones that aren't marked. But, if we have a lot of HOT tuples, is > this really that much better than just looping through all the offsets > and calling record_unchanged() on just the ones that aren't marked? Well, it requires looping through all the offsets one more time, and unless you have a lot of HOT tuples, most items would be marked already. But maybe the overhead is negligible anyway. > I've done that in my version. While testing this, I found that only > on-access pruning needed this final loop calling record_unchanged() on > items not yet marked. I know we can't skip this final loop entirely in > the ON ACCESS case because it calls record_prunable(), but we could > consider moving that back out into heap_prune_chain()? Or what do you > think? Hmm, why is that different with on-access pruning? Here's another idea: In the first loop through the offsets, where we gather the HTSV status of each item, also collect the offsets of all HOT and non-HOT items to two separate arrays. Call heap_prune_chain() for all the non-HOT items first, and then process any remaining HOT tuples that haven't been marked yet. > I haven't finished updating all the comments, but I am really interested > to know what you think about heap_prune_chain() now. Looks much better now, thanks! -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: