Hello Hackers,
My name is Salma El-Sayed, and I am working on B-tree Index Bloat Reduction as part of GSoC 2026. This is my introduction email[1]. I would like to share my current approach for page merging and get your feedback.
*The Core Idea*
Two sibling B-tree pages L (left) and R (right) are candidates for merging when both pages are below a size threshold (less than 10% full for example) and their combined size fits within the fill factor.
The merge copies all data from L into R, then:
1- R is marked BTP_MERGED --> it now contains L's data.
2- L is marked BTP_MERGED_AWAY --> it is logically gone but kept around temporarily for any active scan that still holds a reference to it.
3- L is unlinked from its parent. VACUUM will handle updating the right-link of L's left sibling as part of normal cleanup.
Once all transactions that were active at the time of the merge have committed, L can transition to HALF_DEAD and be reclaimed by VACUUM.
*Handling Concurrent Scans*
The tricky part is ensuring correctness for scans that were already in progress when the merge happened.
-Forward Scans-
A forward scanner that arrives at L after the merge sees BTP_MERGED_AWAY and follows through to R.
If the scanner had already read L before the merge, it would have saved L's high key. It can then binary-search R to skip the tuples it already saw and continue from where it left off.
-Backward Scans-
A backward scanner that arrives at R after the merge sees BTP_MERGED, reads R (which now contains L's data), and skips L entirely.
If the scanner had already read R before the merge, when it later arrives at L it sees BTP_MERGED_AWAY. Here we need a small piece of state on the scanner:
- If the scanner has not previously seen a BTP_MERGED flag, it knows it visited R before the merge and has not yet seen L's data, so it must read L.
- If the scanner has previously seen BTP_MERGED, it knows it read R after the merge, meaning R already contained L's data, so it should skip L and continue left.
*Questions for the Community*
1- Triggering the merge: how aggressive should we be?
How should the merge process be triggered? I see two broad options:
a- Full-index scan: aggressively find all merge candidates in one pass.
b- Bounded scan: a separate dedicated scan that stops after a controlled number of pages, giving the user control over how much of the index to process at a time.
A possible interface for the candidate scanner might look something like:
bt_merge_pages(index regclass, min_fillfactor float8, max_fillfactor float8, max_pages int4)
** This would allow the process to run over a controlled subset of the index rather than always scanning the whole thing, which seems important for large tables where a full scan could run for a very long time. What is the community's preferred approach here?
2- Flag naming and free bits
I am proposing two new flags in btpo_flags:
- BTP_MERGED: set on R to indicate it absorbed L's data.
- BTP_MERGED_AWAY: set on L to indicate it has been merged away.
** Are there free bits available in btpo_flags suitable for this purpose? And does the community see any objection to consuming two bits in this field for the merge mechanism?
Thank you for your time. I look forward to your feedback!
[1]
https://www.postgresql.org/message-id/CANBEAPFW64QUGmMYoPekV6jGrn5Yx8kTNfeEtcTG9CCBUUBoLQ%40mail.gmail.comBest regards,
Salma El-Sayed