BTMaxItemSize seems to be subtly incorrect - Mailing list pgsql-hackers

From Robert Haas
Subject BTMaxItemSize seems to be subtly incorrect
Date
Msg-id CA+Tgmoa7UBxivM7f6Ocx_qbq4=ky3uXc+WZNOBcVX+kvJvWOEA@mail.gmail.com
Whole thread Raw
Responses Re: BTMaxItemSize seems to be subtly incorrect
List pgsql-hackers
We have these two definitions in the source code:

#define BTMaxItemSize(page) \
     MAXALIGN_DOWN((PageGetPageSize(page) - \
                    MAXALIGN(SizeOfPageHeaderData + \
                             3*sizeof(ItemIdData)  + \
                             3*sizeof(ItemPointerData)) - \
                    MAXALIGN(sizeof(BTPageOpaqueData))) / 3)
#define BTMaxItemSizeNoHeapTid(page) \
     MAXALIGN_DOWN((PageGetPageSize(page) - \
                    MAXALIGN(SizeOfPageHeaderData + 3*sizeof(ItemIdData)) - \
                    MAXALIGN(sizeof(BTPageOpaqueData))) / 3)

In my tests, PageGetPageSize(page) = 8192, SizeOfPageHeaderData = 24,
sizeof(ItemIdData) = 4, sizeof(ItemPointerData) = 6, and
sizeof(BTPageOpaqueData) = 16. Assuming MAXIMUM_ALIGNOF == 8, I
believe that makes BTMaxItemSize come out to 2704 and
BTMaxItemSizeNoHeapTid come out to 2712. I have no quibble with the
formula for BTMaxItemSizeNoHeapTid. It's just saying that if you
subtract out the page header data, the special space, and enough space
for 3 line pointers, you have a certain amount of space left (8152
bytes) and so a single item shouldn't use more than a third of that
(2717 bytes) but since items use up space in increments of MAXALIGN,
you have to round down to the next such multiple (2712 bytes).

But what's up with BTMaxItemSize? Here, the idea as I understand it is
that we might need to add a TID to the item, so it has to be small
enough to squeeze one in while still fitting under the limit. And at
first glance everything seems to be OK, because BTMaxItemSize comes
out to be 8 bytes less than BTMaxItemSizeNoHeapTid and that's enough
space to fit a heap TID for sure. However, it seems to me that the
formula calculates this as if those additional 6 bytes were being
separately added to the page header or the line pointer array, whereas
in reality they will be part of the tuple itself. I think that we
should be subtracting sizeof(ItemPointerData) at the very end, rather
than subtracting 3*sizeof(ItemPointerData) from the space available to
be divided by three.

To see why, suppose that sizeof(BTPageOpaqueData) were 24 rather than
16. Then we'd have:

BTMaxItemSize = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4 + 3 * 6) -
MAXALIGN(24)) / 3) = MAXALIGN_DOWN((8192 - MAXALIGN(54)  - 24) / 3) =
MAXALIGN_DOWN(2704) = 2704
BTMaxItemSizeNoHeapTid = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4) -
MAXALIGN(24)) / 3 = MAXALIGN_DOWN((8192 - MAXALIGN(36)  - 24) / 3) =
MAXALIGN_DOWN(2709) = 2704

That's a problem, because if in that scenario you allow three 2704
byte items that don't need a heap TID and later you find you need to
add a heap TID to one of those items, the result will be bigger than
2704 bytes, and then you can't fit three of them into a page.

Applying the attached patch and running 'make check' suffices to
demonstrate the problem for me:

diff -U3 /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out
/Users/rhaas/pgsql/src/test/regress/results/vacuum.out
--- /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out
2022-06-06 14:46:17.000000000 -0400
+++ /Users/rhaas/pgsql/src/test/regress/results/vacuum.out
2022-06-08 17:20:58.000000000 -0400
@@ -137,7 +137,9 @@
     repeat('1234567890',269));
 -- index cleanup option is ignored if VACUUM FULL
 VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup;
+ERROR:  cannot insert oversized tuple of size 2712 on internal page
of index "no_index_cleanup_idx"
 VACUUM (FULL TRUE) no_index_cleanup;
+ERROR:  cannot insert oversized tuple of size 2712 on internal page
of index "no_index_cleanup_idx"
 -- Toast inherits the value from its parent table.
 ALTER TABLE no_index_cleanup SET (vacuum_index_cleanup = false);
 DELETE FROM no_index_cleanup WHERE i < 15;

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch
Next
From: Roberto C. Sánchez
Date:
Subject: Re: Request for assistance to backport CVE-2022-1552 fixes to 9.6 and 9.4