BTMaxItemSize seems to be subtly incorrect - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | BTMaxItemSize seems to be subtly incorrect |
Date | |
Msg-id | CA+Tgmoa7UBxivM7f6Ocx_qbq4=ky3uXc+WZNOBcVX+kvJvWOEA@mail.gmail.com Whole thread Raw |
Responses |
Re: BTMaxItemSize seems to be subtly incorrect
|
List | pgsql-hackers |
We have these two definitions in the source code: #define BTMaxItemSize(page) \ MAXALIGN_DOWN((PageGetPageSize(page) - \ MAXALIGN(SizeOfPageHeaderData + \ 3*sizeof(ItemIdData) + \ 3*sizeof(ItemPointerData)) - \ MAXALIGN(sizeof(BTPageOpaqueData))) / 3) #define BTMaxItemSizeNoHeapTid(page) \ MAXALIGN_DOWN((PageGetPageSize(page) - \ MAXALIGN(SizeOfPageHeaderData + 3*sizeof(ItemIdData)) - \ MAXALIGN(sizeof(BTPageOpaqueData))) / 3) In my tests, PageGetPageSize(page) = 8192, SizeOfPageHeaderData = 24, sizeof(ItemIdData) = 4, sizeof(ItemPointerData) = 6, and sizeof(BTPageOpaqueData) = 16. Assuming MAXIMUM_ALIGNOF == 8, I believe that makes BTMaxItemSize come out to 2704 and BTMaxItemSizeNoHeapTid come out to 2712. I have no quibble with the formula for BTMaxItemSizeNoHeapTid. It's just saying that if you subtract out the page header data, the special space, and enough space for 3 line pointers, you have a certain amount of space left (8152 bytes) and so a single item shouldn't use more than a third of that (2717 bytes) but since items use up space in increments of MAXALIGN, you have to round down to the next such multiple (2712 bytes). But what's up with BTMaxItemSize? Here, the idea as I understand it is that we might need to add a TID to the item, so it has to be small enough to squeeze one in while still fitting under the limit. And at first glance everything seems to be OK, because BTMaxItemSize comes out to be 8 bytes less than BTMaxItemSizeNoHeapTid and that's enough space to fit a heap TID for sure. However, it seems to me that the formula calculates this as if those additional 6 bytes were being separately added to the page header or the line pointer array, whereas in reality they will be part of the tuple itself. I think that we should be subtracting sizeof(ItemPointerData) at the very end, rather than subtracting 3*sizeof(ItemPointerData) from the space available to be divided by three. To see why, suppose that sizeof(BTPageOpaqueData) were 24 rather than 16. Then we'd have: BTMaxItemSize = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4 + 3 * 6) - MAXALIGN(24)) / 3) = MAXALIGN_DOWN((8192 - MAXALIGN(54) - 24) / 3) = MAXALIGN_DOWN(2704) = 2704 BTMaxItemSizeNoHeapTid = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4) - MAXALIGN(24)) / 3 = MAXALIGN_DOWN((8192 - MAXALIGN(36) - 24) / 3) = MAXALIGN_DOWN(2709) = 2704 That's a problem, because if in that scenario you allow three 2704 byte items that don't need a heap TID and later you find you need to add a heap TID to one of those items, the result will be bigger than 2704 bytes, and then you can't fit three of them into a page. Applying the attached patch and running 'make check' suffices to demonstrate the problem for me: diff -U3 /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out /Users/rhaas/pgsql/src/test/regress/results/vacuum.out --- /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out 2022-06-06 14:46:17.000000000 -0400 +++ /Users/rhaas/pgsql/src/test/regress/results/vacuum.out 2022-06-08 17:20:58.000000000 -0400 @@ -137,7 +137,9 @@ repeat('1234567890',269)); -- index cleanup option is ignored if VACUUM FULL VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup; +ERROR: cannot insert oversized tuple of size 2712 on internal page of index "no_index_cleanup_idx" VACUUM (FULL TRUE) no_index_cleanup; +ERROR: cannot insert oversized tuple of size 2712 on internal page of index "no_index_cleanup_idx" -- Toast inherits the value from its parent table. ALTER TABLE no_index_cleanup SET (vacuum_index_cleanup = false); DELETE FROM no_index_cleanup WHERE i < 15; -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: