Re: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path - Mailing list pgsql-hackers

From Chao Li
Subject Re: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path
Date
Msg-id 753FFCD4-D37F-4451-9942-A8B590090CA1@gmail.com
In response to RE: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path  (<paul.bunn@icloud.com>)
List pgsql-hackers

> On Mar 5, 2026, at 07:33, <paul.bunn@icloud.com> wrote:
>
> Attached is a new patch to demonstrate the bug in make_new_segment.
> It’s not meant as a permanent new test (it’s using some hard-coded assumptions about the x64 platform) -- just useful
> in validating the bug and fix.
>   From: paul.bunn@icloud.com <paul.bunn@icloud.com>
> Sent: Wednesday, March 4, 2026 12:01 AM
> To: pgsql-hackers@postgresql.org
> Subject: [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path
>  Hi hackers,
>  Sorry for the previously poorly-formatted/threaded email.
>  We've identified a bug in the DSA (Dynamic Shared Memory Area) allocator
> that causes memory corruption and crashes during parallel hash joins with
> large data sets.  The bug has been present since the DSA implementation
> was introduced and affects all supported branches.
>  Attached is a minimal fix (5 lines added, 0 changed).
>  == Bug Summary ==
>  In make_new_segment() (src/backend/utils/mmgr/dsa.c), there are two paths
> for computing segment layout:
>    Path 1 (geometric): knows total_size upfront, computes
>     total_pages = total_size / FPM_PAGE_SIZE
>     pagemap entries = total_pages                  <-- correct
>    Path 2 (odd-sized, when requested > geometric): works forward from
>     usable_pages = requested_pages
>     pagemap entries = usable_pages                 <-- BUG
>  The pagemap is indexed by absolute page number.  The FreePageManager hands
> out pages with indices from metadata_pages to (metadata_pages +
> usable_pages - 1).  Since metadata_pages >= 1, page indices at the high
> end of the range exceed usable_pages, making the pagemap accesses
> out-of-bounds.
>  == How It Was Found on Postgres 15 ==
>  Multiple parallel worker backends crashed simultaneously with SIGSEGV in
> dsa_get_address(), called from dsa_free() in ExecHashTableDetachBatch()
> during parallel hash join batch cleanup.
>  Stack trace:
>   #0 dsa_get_address (area, dp) at dsa.c:955
>   #1 dsa_free (area, dp) at dsa.c:839
>   #2 ExecHashTableDetachBatch (hashtable) at nodeHash.c:3189
>   #3 ExecParallelHashJoinNewBatch (hjstate) at nodeHashjoin.c:1157
>  All crashing workers computed the same pageno (196993), which was the last
> valid FPM page but beyond the pagemap array (usable_pages = 196609).
>  == Root Cause (from core dump analysis) ==
>  The crashing segment had:
>   usable_pages   = 196,609   (pagemap has this many entries)
>   metadata_pages = 385
>   total_pages    = 196,994   (= metadata_pages + usable_pages)
>   last FPM page  = 196,993   (= metadata_pages + usable_pages - 1)
>    pagemap valid indices: 0 .. 196,608
>   FPM page indices:      385 .. 196,993
>  Pages 196,609 through 196,993 (385 pages) have no valid pagemap entry.
>  The pagemap array ends 3,072 bytes before the data area starts (padding
> zone).  Most out-of-bounds entries fall in this padding and cause silent
> corruption.  The last 17 entries fall in the data area itself, causing
> bidirectional corruption: pagemap writes destroy allocated object data,
> and subsequent pagemap reads return garbage, crashing dsa_free().

While reviewing this patch, I took the opportunity to read through the DSA code again.

I think the reason the bug is hard to trigger is that, even when the out-of-bounds pagemap access happens, in most
cases the OOB access falls into the padding area, which is actually harmless. Only when the padding area is not big
enough to hold metadata_bytes / FPM_PAGE_SIZE dsa_pointer entries does it hit a segmentation fault.

>  == The Fix ==
>  After computing metadata_bytes with usable_pages pagemap entries, add
> entries for the metadata pages themselves:
>    metadata_bytes +=
>       ((metadata_bytes / (FPM_PAGE_SIZE - sizeof(dsa_pointer))) + 1) *
>       sizeof(dsa_pointer);
>  The divisor (FPM_PAGE_SIZE - sizeof(dsa_pointer)) = 4088 accounts for
> the circular dependency: each metadata page costs one pagemap entry
> (8 bytes), so only 4088 of each 4096-byte metadata page is net-free for
> other pagemap entries.  The +1 absorbs the ceiling.

I think this fix is very clever. But the problem I see is that it adds extra bytes to metadata_bytes before computing
the padding size, while I believe the padding area itself is safe to hold pagemap entries. So if the padding area is
big enough, we don't need to add extra bytes to metadata_bytes at all. Based on that, I made an alternative fix:
```
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index ce9ede4c196..b7a089bc288 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -2131,6 +2131,7 @@ static dsa_segment_map *
 make_new_segment(dsa_area *area, size_t requested_pages)
 {
        dsa_segment_index new_index;
+       size_t          metadata_fixed_bytes;
        size_t          metadata_bytes;
        size_t          total_size;
        size_t          total_pages;
@@ -2180,9 +2181,11 @@ make_new_segment(dsa_area *area, size_t requested_pages)
                                         area->control->total_segment_size);

        total_pages = total_size / FPM_PAGE_SIZE;
-       metadata_bytes =
+       metadata_fixed_bytes =
                MAXALIGN(sizeof(dsa_segment_header)) +
-               MAXALIGN(sizeof(FreePageManager)) +
+               MAXALIGN(sizeof(FreePageManager));
+       metadata_bytes =
+               metadata_fixed_bytes +
                sizeof(dsa_pointer) * total_pages;

        /* Add padding up to next page boundary. */
@@ -2196,19 +2199,36 @@ make_new_segment(dsa_area *area, size_t requested_pages)
        /* See if that is enough... */
        if (requested_pages > usable_pages)
        {
+               size_t pagemap_entries;
+
                /*
                 * We'll make an odd-sized segment, working forward from the requested
                 * number of pages.
                 */
                usable_pages = requested_pages;
-               metadata_bytes =
-                       MAXALIGN(sizeof(dsa_segment_header)) +
-                       MAXALIGN(sizeof(FreePageManager)) +
-                       usable_pages * sizeof(dsa_pointer);
-
-               /* Add padding up to next page boundary. */
-               if (metadata_bytes % FPM_PAGE_SIZE != 0)
-                       metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+               pagemap_entries = usable_pages;
+
+               for (;;)
+               {
+                       size_t actual_pages;
+                       size_t metadata_pages;
+
+                       metadata_bytes =
+                               metadata_fixed_bytes +
+                               pagemap_entries * sizeof(dsa_pointer);
+
+                       /* Add padding up to next page boundary. */
+                       if (metadata_bytes % FPM_PAGE_SIZE != 0)
+                               metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+
+                       metadata_pages = metadata_bytes / FPM_PAGE_SIZE;
+                       actual_pages = metadata_pages + usable_pages;
+                       if (pagemap_entries >= actual_pages)
+                               break;
+
+                       pagemap_entries = actual_pages;
+               }
+
                total_size = metadata_bytes + usable_pages * FPM_PAGE_SIZE;

                /* Is that too large for dsa_pointer's addressing scheme? */
```

I tested my version of the fix with the test script you attached in your next email, and the test passed. I also feel
my version is easier to understand.

But I haven't thought more deeply about whether your version really wastes the padding area; if it doesn't, then I
agree your version is shorter and neater. My only concern is that it's a bit hard to understand, but that wouldn't be
a problem IMO if a proper comment is added.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
