Re: cannot freeze committed xmax - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: cannot freeze committed xmax
Date
Msg-id 617882E6-22E2-4A39-9BF1-02375F4C20A0@enterprisedb.com
Whole thread Raw
In response to Re: cannot freeze committed xmax  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers

> On Nov 20, 2024, at 6:39 AM, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
>> On 20 Nov 2024, at 15:58, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>
>> PFA the patch doing so.
>
> Ugh. The patch is simply dysfunctional, sorry. xmax_status is being checked uninitiated.
> But, well, it highlights the idea: make verify_heapam() aware of such corruptions.
> What do you think?

I like the idea of increasing the corruption checking coverage.  The worry with these patches is that we'll overlook
somelegitimate use case of the status bits and call it corruption when it isn't.  Indeed, that appears to be the case
withthis patch, assuming I initialized the xmax_status field in the way you had in mind, and that applying it to
REL_17_STABLEis ok.  (Maybe this would work differently on HEAD?) 

+               get_xid_status(xmax, ctx, &xmax_status);
+               if (xmax_status == XID_COMMITTED && (tuphdr->t_infomask & HEAP_UPDATED))
+               {
+                       report_corruption(ctx,
+                                       psprintf("committed xmax %u while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED
flags",
+                                       xmax));
+               }

That results in TAP test failures on a uncorrupted but frozen table:

# +++ tap check in contrib/amcheck +++
t/001_verify_heapam.pl ......... 74/?
#   Failed test 'all-frozen not corrupted table'
#   at t/001_verify_heapam.pl line 53.
#          got: '30|117||committed xmax 2 while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags'
#     expected: ''
t/001_verify_heapam.pl ......... 257/? # Looks like you failed 1 test of 272.
t/001_verify_heapam.pl ......... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/272 subtests

The first part of your patch which checks the xmin_status seems ok at first glance.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Next
From: Tom Lane
Date:
Subject: Re: Accessing other session's temp table