[Patch][Bug] Incorrect usage of PageGetFreeSpace instead of PageGetHeapFreeSpace in heap_xlog_visible - Mailing list pgsql-hackers

From Alexey Makhmutov
Subject [Patch][Bug] Incorrect usage of PageGetFreeSpace instead of PageGetHeapFreeSpace in heap_xlog_visible
Date
Msg-id 66c011a7-2f26-4ab9-a5b0-1afcb8523367@postgrespro.ru
Whole thread
List pgsql-hackers
This is a spin-off from the thread 
https://www.postgresql.org/message-id/flat/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru 
to cover the second issue discussed in it. This issue is not more 
relevant for PG19, but is still present in previous versions.

In issues seems to be just a small overlook in the ab7dbd6, which used 
'PageGetFreeSpace' in 'heap_xlog_visible' function to calculate 
available space in page block instead of 'PageGetHeapFreeSpace'. The 
difference between these functions is that ‘PageGetHeapFreeSpace’ 
additionally checks that there is a space in line pointers section (i.e. 
checks that MaxHeapTuplesPerPage is not exceeded). If 
MaxHeapTuplesPerfPage items are already used on the page, then free 
space for this block need to be reported as 0, as no new tuple could be 
added to the page. Such case could be observed if page has large number 
of redirect slots created as result of HOT cleanups. Basically, every 
code working with heap pages should use ‘PageGetHeapFreeSpace’ rather 
than ‘PageGetFreeSpace’ and function ‘heap_xlog_visible’ is the only 
place in the code where ‘PageGetFreeSpace’ is used for heap pages. Usage 
of incorrect function could cause incorrect data being written to the 
FSM on replica: if block still have free space, but has already reached 
the MaxHeapTuplesPerPage limit, then it should be marked as unavailable 
for new rows in FSM, otherwise inserter will need to check and update 
its FSM data as well. This issue was already fixed in PG 19 as result of 
a881cc9, but is still present in previous versions.

I think that situation with difference between results of these two 
functions is rarely observed, but it seems logical to just call the 
correct function in all cases.

A patch is attached which could be applied on PG 14-17 to fix the issue 
(just replacing PageGetFreeSpace with PageGetHeapFreeSpace). For PG 18 
the function was moved to the heapam_xlog.c instead of heapam.c, so 
patch may need a small adjustment (version attached to the previous 
thread could be used in this case).

A small synthetic test case (test_case2.zip) is also attached. It could 
simulate a situation leading to the incorrect FSM data on the replica 
side. The archive contains script ‘test_prepare.sh’, which could be 
edited to specify path to PG installation and port numbers. Invocation 
of this script under the 'postgres' OS user creates two databases 
(master and replica), performs inserts and updates using the 
'insert_hot1.sql' script to create block with large number of HOT 
updates chains and then checks state of the FSM data for the table on 
both the master and replica server. Without the fix script shows full 
block (0 free space) on master and partially filled block (~1150 bytes 
of free space) on replica. With the fix both master and replica 
correctly shows 0 free space in the block. To stop and cleanup created 
servers './test_prepare.sh --cleanup' command could be used.

Thanks,
Alexey
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: oauth integer overflow
Next
From: Daniel Gustafsson
Date:
Subject: Re: oauth integer overflow