Re: Add a new pg_walinspect function to extract FPIs from WAL records - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Add a new pg_walinspect function to extract FPIs from WAL records
Date
Msg-id dced46f2-1f10-7ac7-2f5c-f11be5d49990@gmail.com
Whole thread Raw
In response to Re: Add a new pg_walinspect function to extract FPIs from WAL records  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Add a new pg_walinspect function to extract FPIs from WAL records
List pgsql-hackers
Hi,

On 1/6/23 6:41 PM, Bharath Rupireddy wrote:
> On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>
>>>> I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that
both
>>>> extracted images are the same and matches the one modified right after the checkpoint.
>>>>
>>>> What do you think? (could be done later in another patch though).
>>>
>>> I think pageinspect can be used here. We can fetch the raw page from
>>> the table after the checkpoint and raw FPI from the WAL record logged
>>> as part of the update. I've tried to do so [1], but I see a slight
>>> difference in the raw output. The expectation is that they both be the
>>> same. It might be that the update operation logs the FPI with some
>>> more info set (prune_xid). I'll try to see why it is so.
>>>
>>> I'm attaching the v2 patch for further review.
>>>
>>> [1]
>>> SELECT * FROM page_header(:'page_from_table');
>>>      lsn    | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>>   0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
>>>      4 |         0
>>> (1 row)
>>>
>>> SELECT * FROM page_header(:'page_from_wal');
>>>      lsn    | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>>   0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
>>>      4 |       735
>>> (1 row)
>>
>> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
>> further review. Sorry for the noise.
> 
> I took a stab at how and what gets logged as FPI in WAL records:
> 
> Option 1:
> WAL record with FPI contains both the unmodified table page from the
> disk after checkpoint and new tuple (not applied to the unmodified
> page) and the recovery (redo) applies the new tuple to the unmodified
> page as part of recovery. A bit more WAL is needed to store both
> unmodified page and new tuple data in the WAL record and recovery can
> get slower a bit too as it needs to stitch the modified page.
> 
> Option 2:
> WAL record with FPI contains only the modified page (new tuple applied
> to the unmodified page from the disk after checkpoint) and the
> recovery (redo)  just returns the applied block as BLK_RESTORED.
> Recovery can get faster with this approach and less WAL is needed to
> store just the modified page.
> 
> My earlier understanding was that postgres does option (1), however, I
> was wrong, option (2) is what actually postgres has implemented for
> the obvious advantages specified.
> 
> I now made the tests a bit stricter in checking the FPI contents
> (tuple values) pulled from the WAL record with raw page contents
> pulled from the table using the pageinspect extension. Please see the
> attached v4 patch.
> 

Thanks for updating the patch!

+-- Compare FPI from WAL record and page from table, they must be same

I think "must be the same" or "must be identical" sounds better (but not 100% sure).

Except this nit, V4 looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: typos
Next
From: Ankit Kumar Pandey
Date:
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order