Thread: Unintended interaction between bottom-up deletion and deduplication's single value strategy
Unintended interaction between bottom-up deletion and deduplication's single value strategy
From
Peter Geoghegan
Date:
The return value of _bt_bottomupdel_pass() is advisory; it reports "failure" for a deletion pass that was just suboptimal (it rarely reports failure because it couldn't delete anything at all). Bottom-up deletion preemptively falls back on a version-orientated deduplication pass when it didn't quite delete as many items as it hoped to delete. This policy avoids thrashing, particularly with low cardinality indexes, where the number of distinct TIDs per tableam/heapam block tends to drive when and how TIDs get deleted. I have noticed an unintended and undesirable interaction between this new behavior and an older deduplication behavior: it's possible for _bt_bottomupdel_pass() to return false to trigger a preemptive version-orientated deduplication pass that ends up using deduplication's "single value" strategy. This is just contradictory on its face: a version-orientated deduplication pass tries to prevent a version-driven page split altogether, whereas a single value strategy deduplication pass is specifically supposed to set things up for an imminent page split (a split that uses nbtsplitloc.c's single value strategy). Clearly we shouldn't prepare for a page split and try to avoid a page split at the same time! The practical consequence of this oversight is that leaf pages full of duplicates (all duplicates of the same single value) are currently much more likely to have a version-driven page split (from non-HOT updates) than similar pages that have two or three distinct key values. Another undesirable consequence is that we'll waste cycles in affected cases; any future bottom-up index deletion passes will waste time on the tuples that the intervening deduplication pass deliberately declined to merge together (as any single value dedup pass will). In other words, the heuristics described in comments above _bt_bottomupdel_finish_pending() can become confused by this misbehavior (after an initial round of deletion + deduplication, in a later round of deletion). This interaction is clearly a bug. It's easy to avoid. Attached patch fixes the bug by slightly narrowing the conditions under which we'll consider if we should apply deduplication's single value strategy. We were already not even considering it with a unique index, where it was always clear that this is only a version-orientated deduplication pass. It seems natural to also check whether or not we just had a "failed" call to _bt_bottomupdel_pass() -- this is a logical extension of what we do already. Barring objections, I will apply this patch (and backpatch to Postgres 14) in a few days. -- Peter Geoghegan
Attachment
Re: Unintended interaction between bottom-up deletion and deduplication's single value strategy
From
Peter Geoghegan
Date:
On Sun, Sep 19, 2021 at 7:47 PM Peter Geoghegan <pg@bowt.ie> wrote: > Attached patch fixes the bug by slightly narrowing the conditions > under which we'll consider if we should apply deduplication's single > value strategy. Pushed this fix a moment ago. -- Peter Geoghegan