Re: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path |
| Date | |
| Msg-id | 753FFCD4-D37F-4451-9942-A8B590090CA1@gmail.com |
| In response to | RE: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path (<paul.bunn@icloud.com>) |
| List | pgsql-hackers |
> On Mar 5, 2026, at 07:33, <paul.bunn@icloud.com> wrote:
>
> Attached is a new patch to demonstrate the bug in make_new_segment.
> It’s not meant as a permanent new test (it uses some hard-coded assumptions about the x64 platform) -- it is just useful in validating the bug and fix.
> From: paul.bunn@icloud.com <paul.bunn@icloud.com>
> Sent: Wednesday, March 4, 2026 12:01 AM
> To: pgsql-hackers@postgresql.org
> Subject: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path
> Hi hackers,
> Sorry for the previously poorly-formatted/threaded email.
> We've identified a bug in the DSA (Dynamic Shared Memory Area) allocator
> that causes memory corruption and crashes during parallel hash joins with
> large data sets. The bug has been present since the DSA implementation
> was introduced and affects all supported branches.
> Attached is a minimal fix (5 lines added, 0 changed).
> == Bug Summary ==
> In make_new_segment() (src/backend/utils/mmgr/dsa.c), there are two paths
> for computing segment layout:
> Path 1 (geometric): knows total_size upfront, computes
> total_pages = total_size / FPM_PAGE_SIZE
> pagemap entries = total_pages <-- correct
> Path 2 (odd-sized, when requested > geometric): works forward from
> usable_pages = requested_pages
> pagemap entries = usable_pages <-- BUG
> The pagemap is indexed by absolute page number. The FreePageManager hands
> out pages with indices from metadata_pages to (metadata_pages +
> usable_pages - 1). Since metadata_pages >= 1, page indices at the high
> end of the range exceed usable_pages, making the pagemap accesses
> out-of-bounds.
> == How It Was Found on Postgres 15 ==
> Multiple parallel worker backends crashed simultaneously with SIGSEGV in
> dsa_get_address(), called from dsa_free() in ExecHashTableDetachBatch()
> during parallel hash join batch cleanup.
> Stack trace:
> #0 dsa_get_address (area, dp) at dsa.c:955
> #1 dsa_free (area, dp) at dsa.c:839
> #2 ExecHashTableDetachBatch (hashtable) at nodeHash.c:3189
> #3 ExecParallelHashJoinNewBatch (hjstate) at nodeHashjoin.c:1157
> All crashing workers computed the same pageno (196993), which was the last
> valid FPM page but beyond the pagemap array (usable_pages = 196609).
> == Root Cause (from core dump analysis) ==
> The crashing segment had:
> usable_pages = 196,609 (pagemap has this many entries)
> metadata_pages = 385
> total_pages = 196,994 (= metadata_pages + usable_pages)
> last FPM page = 196,993 (= metadata_pages + usable_pages - 1)
> pagemap valid indices: 0 .. 196,608
> FPM page indices: 385 .. 196,993
> Pages 196,609 through 196,993 (385 pages) have no valid pagemap entry.
> The pagemap array ends 3,072 bytes before the data area starts (padding
> zone). Most out-of-bounds entries fall in this padding and cause silent
> corruption. The last 17 entries fall in the data area itself, causing
> bidirectional corruption: pagemap writes destroy allocated object data,
> and subsequent pagemap reads return garbage, crashing dsa_free().
While reviewing this patch, I took the opportunity to read through the DSA code again.
I think the reason the bug is hard to trigger is that, even when an out-of-bounds pagemap access happens, in most cases the access lands in the padding area, which is effectively harmless. Only when the padding area is not big enough to hold metadata_bytes / FPM_PAGE_SIZE dsa_pointer entries does it hit a segmentation fault.
> == The Fix ==
> After computing metadata_bytes with usable_pages pagemap entries, add
> entries for the metadata pages themselves:
> metadata_bytes +=
> ((metadata_bytes / (FPM_PAGE_SIZE - sizeof(dsa_pointer))) + 1) *
> sizeof(dsa_pointer);
> The divisor (FPM_PAGE_SIZE - sizeof(dsa_pointer)) = 4088 accounts for
> the circular dependency: each metadata page costs one pagemap entry
> (8 bytes), so only 4088 of each 4096-byte metadata page is net-free for
> other pagemap entries. The +1 absorbs the ceiling.
I think this fix is very clever. The problem I see is that it adds extra bytes to metadata_bytes and only then computes the padding size, but I believe the padding area is already safe to hold pagemap entries. So if the padding area is big enough, we don’t need to add extra bytes to metadata_bytes at all. Based on that, I made an alternative fix:
```
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index ce9ede4c196..b7a089bc288 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -2131,6 +2131,7 @@ static dsa_segment_map *
 make_new_segment(dsa_area *area, size_t requested_pages)
 {
 	dsa_segment_index new_index;
+	size_t		metadata_fixed_bytes;
 	size_t		metadata_bytes;
 	size_t		total_size;
 	size_t		total_pages;
@@ -2180,9 +2181,11 @@ make_new_segment(dsa_area *area, size_t requested_pages)
 					 area->control->total_segment_size);
 	total_pages = total_size / FPM_PAGE_SIZE;
-	metadata_bytes =
+	metadata_fixed_bytes =
 		MAXALIGN(sizeof(dsa_segment_header)) +
-		MAXALIGN(sizeof(FreePageManager)) +
+		MAXALIGN(sizeof(FreePageManager));
+	metadata_bytes =
+		metadata_fixed_bytes +
 		sizeof(dsa_pointer) * total_pages;
 
 	/* Add padding up to next page boundary. */
@@ -2196,19 +2199,36 @@ make_new_segment(dsa_area *area, size_t requested_pages)
 	/* See if that is enough... */
 	if (requested_pages > usable_pages)
 	{
+		size_t		pagemap_entries;
+
 		/*
 		 * We'll make an odd-sized segment, working forward from the requested
 		 * number of pages.
 		 */
 		usable_pages = requested_pages;
-		metadata_bytes =
-			MAXALIGN(sizeof(dsa_segment_header)) +
-			MAXALIGN(sizeof(FreePageManager)) +
-			usable_pages * sizeof(dsa_pointer);
-
-		/* Add padding up to next page boundary. */
-		if (metadata_bytes % FPM_PAGE_SIZE != 0)
-			metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+		pagemap_entries = usable_pages;
+
+		for (;;)
+		{
+			size_t		actual_pages;
+			size_t		metadata_pages;
+
+			metadata_bytes =
+				metadata_fixed_bytes +
+				pagemap_entries * sizeof(dsa_pointer);
+
+			/* Add padding up to next page boundary. */
+			if (metadata_bytes % FPM_PAGE_SIZE != 0)
+				metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+
+			metadata_pages = metadata_bytes / FPM_PAGE_SIZE;
+			actual_pages = metadata_pages + usable_pages;
+			if (pagemap_entries >= actual_pages)
+				break;
+
+			pagemap_entries = actual_pages;
+		}
+
 		total_size = metadata_bytes + usable_pages * FPM_PAGE_SIZE;
 
 		/* Is that too large for dsa_pointer's addressing scheme? */
```
I tested my version of the fix with the test script you attached in your next email, and the test passed. I also feel my version is easier to understand.
But I haven’t thought more deeply about whether your version really wastes the padding area. If it doesn’t, then I agree your version is shorter and neater; my only concern is that it is a bit hard to understand, but that wouldn’t be a problem IMO if a proper comment is added.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/