[GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions - Mailing list pgsql-hackers

From Salma El-Sayed
Subject [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions
Date
Msg-id CANBEAPFq3YAOydjUS3xwcUG9L6e3WE5Z4nGPk_Q3RsjSFWTJNA@mail.gmail.com
Whole thread
List pgsql-hackers
Hello Hackers,

My name is Salma El-Sayed, and I am working on B-tree Index Bloat Reduction as part of GSoC 2026. This is my introduction email[1]. I would like to share my current approach for page merging and get your feedback.

*The Core Idea*
Two sibling B-tree pages L (left) and R (right) are candidates for merging when both pages are below a size threshold (less than 10% full for example) and their combined size fits within the fill factor.

The merge copies all data from L into R, then:
1- R is marked BTP_MERGED --> it now contains L's data.
2- L is marked BTP_MERGED_AWAY --> it is logically gone but kept around temporarily for any active scan that still holds a reference to it.
3- L is unlinked from its parent. VACUUM will handle updating the right-link of L's left sibling as part of normal cleanup.

Once all transactions that were active at the time of the merge have committed, L can transition to HALF_DEAD and be reclaimed by VACUUM.

*Handling Concurrent Scans*
The tricky part is ensuring correctness for scans that were already in progress when the merge happened.

-Forward Scans-
A forward scanner that arrives at L after the merge sees BTP_MERGED_AWAY and follows through to R.

If the scanner had already read L before the merge, it would have saved L's high key. It can then binary-search R to skip the tuples it already saw and continue from where it left off.

-Backward Scans-
A backward scanner that arrives at R after the merge sees BTP_MERGED, reads R (which now contains L's data), and skips L entirely.

If the scanner had already read R before the merge, when it later arrives at L it sees BTP_MERGED_AWAY. Here we need a small piece of state on the scanner:
- If the scanner has not previously seen a BTP_MERGED flag, it knows it visited R before the merge and has not yet seen L's data, so it must read L.
- If the scanner has previously seen BTP_MERGED, it knows it read R after the merge, meaning R already contained L's data, so it should skip L and continue left.

*Questions for the Community*

1- Triggering the merge: how aggressive should we be?

How should the merge process be triggered? I see two broad options:
a- Full-index scan: aggressively find all merge candidates in one pass.
b- Bounded scan: a separate dedicated scan that stops after a controlled number of pages, giving the user control over how much of the index to process at a time.

A possible interface for the candidate scanner might look something like:
bt_merge_pages(index regclass, min_fillfactor float8, max_fillfactor float8, max_pages int4)

** This would allow the process to run over a controlled subset of the index rather than always scanning the whole thing, which seems important for large tables where a full scan could run for a very long time. What is the community's preferred approach here?

2- Flag naming and free bits

I am proposing two new flags in btpo_flags:
- BTP_MERGED: set on R to indicate it absorbed L's data.
- BTP_MERGED_AWAY: set on L to indicate it has been merged away.

 ** Are there free bits available in btpo_flags suitable for this purpose? And does the community see any objection to consuming two bits in this field for the merge mechanism?

Thank you for your time. I look forward to your feedback!

[1] https://www.postgresql.org/message-id/CANBEAPFW64QUGmMYoPekV6jGrn5Yx8kTNfeEtcTG9CCBUUBoLQ%40mail.gmail.com

Best regards,
Salma El-Sayed

pgsql-hackers by date:

Previous
From: Ayush Tiwari
Date:
Subject: Re: Disallow whole-row index references with virtual generated columns?
Next
From: Matthias van de Meent
Date:
Subject: Re: Disallow whole-row index references with virtual generated columns?