Re: [WiP] B-tree page merge during vacuum to reduce index bloat - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: [WiP] B-tree page merge during vacuum to reduce index bloat
Date
Msg-id CCD000DB-67CB-4D64-A912-B7514D546058@yandex-team.ru
In response to Re: [WiP] B-tree page merge during vacuum to reduce index bloat  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers

> On 29 Aug 2025, at 13:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> I think to establish baseline for locking correctness we are going to start from writing index scan tests, that fail
> with proposed merge patch and pass on current HEAD. I want to observe that forward scan is showing duplicates and
> backward scan misses tuples.

Well, that was unexpectedly easy. See patch 0001. It adds a test that creates a sparse tree and an injection point
that waits on a scan stepping into some middle leaf page.
Then the test invokes vacuum. There are ~35 leaf pages, and most of them get merged into just a few pages.
As expected, both scans produce incorrect results:
t/008_btree_merge_scan_correctness.pl .. 1/?
#   Failed test 'Forward scan returns correct count'
#   at t/008_btree_merge_scan_correctness.pl line 132.
#          got: '364'
#     expected: '250'

#   Failed test 'Backward scan returns correct count'
#   at t/008_btree_merge_scan_correctness.pl line 133.
#          got: '142'
#     expected: '250'
# Looks like you failed 2 tests of 2.
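
To illustrate why the forward count goes up while the backward count goes down, here is a toy model. It is not the
test from patch 0001: the page layout, the capacity, and the helper names (make_pages, merge_right, scan, run) are all
made up for illustration, and it assumes that the merge moves tuples from a page into its right sibling.

```python
def make_pages(n_keys, per_page):
    """Build a sparse 'leaf level': a list of pages, each a list of keys."""
    keys = list(range(n_keys))
    return [keys[i:i + per_page] for i in range(0, n_keys, per_page)]


def merge_right(pages, capacity):
    """Toy vacuum: move a page's tuples into its right sibling when they fit.

    Assumption: tuples move rightwards. Emptied pages stay in the list so
    that page positions remain stable for a scan that is already running.
    """
    for i in range(len(pages) - 1):
        if pages[i] and len(pages[i]) + len(pages[i + 1]) <= capacity:
            pages[i + 1] = pages[i] + pages[i + 1]
            pages[i] = []


def scan(pages, direction, pause_at, on_pause):
    """Read pages one at a time; run on_pause() right before page pause_at."""
    if direction == "forward":
        order = range(len(pages))
    else:
        order = range(len(pages) - 1, -1, -1)
    seen = []
    for idx in order:
        if idx == pause_at:
            on_pause()  # the concurrent vacuum runs while the scan waits here
        seen.extend(pages[idx])
    return seen


def run(direction):
    pages = make_pages(n_keys=50, per_page=5)  # 10 sparse pages
    return scan(pages, direction, pause_at=5,
                on_pause=lambda: merge_right(pages, capacity=20))


forward = run("forward")
backward = run("backward")
print(len(forward), len(backward))  # 55 40 (correct answer is 50)
```

In this model the forward scan has already read keys 20..24 when they are moved to a page further right, so it reads
them again. The backward scan has already passed the pages that receive keys 20..29, so it never sees them.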


> From that we will try to design locking that does not affect performance significantly, but allows to merge pages.
> Perhaps, we can design a way to switch new index scans to "safe mode" during index vacuum and waiting for existing scans
> to complete.

What if we just abort a scan that stepped on a page from which tuples were moved out?
I've prototyped this approach, please see patch 0002. Maybe in the future we will improve the locking protocol if we
observe high error rates.
Unfortunately, this approach leads to a default mergefactor of 0 instead of 5%.
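
A rough sketch of the "abort the scan" idea in a toy model follows. This is not the mechanism of patch 0002: the
Page/Index classes, the epoch counter, the moved_out_at field, and the ScanAborted error are all hypothetical names
invented here, and the merge is again assumed to move tuples into the right sibling.

```python
class ScanAborted(Exception):
    """Raised when a scan meets a page that lost tuples after the scan began."""


class Page:
    def __init__(self, keys):
        self.keys = list(keys)
        self.moved_out_at = 0  # merge epoch when tuples last left (0 = never)


class Index:
    def __init__(self, n_keys, per_page):
        self.pages = [Page(range(i, min(i + per_page, n_keys)))
                      for i in range(0, n_keys, per_page)]
        self.epoch = 0  # hypothetical counter, bumped by every vacuum pass

    def vacuum_merge(self, capacity):
        """Toy merge: move tuples into the right sibling and stamp the source."""
        self.epoch += 1
        for left, right in zip(self.pages, self.pages[1:]):
            if left.keys and len(left.keys) + len(right.keys) <= capacity:
                right.keys = left.keys + right.keys
                left.keys = []
                left.moved_out_at = self.epoch


def forward_scan(index, pause_at=None, on_pause=None):
    start_epoch = index.epoch
    seen = []
    for i, page in enumerate(index.pages):
        if i == pause_at:
            on_pause()  # a concurrent vacuum runs while the scan waits here
        # Conservative rule: give up if the page being left or the page being
        # entered had tuples moved out after this scan started.
        for p in index.pages[max(i - 1, 0):i + 1]:
            if p.moved_out_at > start_epoch:
                raise ScanAborted(f"tuples were moved out near page {i}")
        seen.extend(page.keys)
    return seen


index = Index(n_keys=50, per_page=5)
try:
    forward_scan(index, pause_at=5,
                 on_pause=lambda: index.vacuum_merge(capacity=20))
    aborted = False
except ScanAborted:
    aborted = True

# A scan that starts after the merge is not affected and sees every key once.
retry = forward_scan(index)
```

The point of the sketch is the trade-off: the running scan gets an error instead of a wrong answer, and a retry after
the merge returns the correct result.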

What do you think? Should we add this to the CF, or is the idea too wild for a review?


Best regards, Andrey Borodin.

