Re: Do away with zero-padding assumption before WALRead() - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Do away with zero-padding assumption before WALRead()
Date
Msg-id 20240216.104013.1220145785391011056.horikyota.ntt@gmail.com
Whole thread Raw
In response to Do away with zero-padding assumption before WALRead()  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Do away with zero-padding assumption before WALRead()
List pgsql-hackers
At Tue, 13 Feb 2024 11:47:06 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Hi,
> 
> I noticed an assumption [1] at WALRead() call sites expecting the
> flushed WAL page to be zero-padded after the flush LSN. I think this
> can't always be true as the WAL can get flushed after determining the
> flush LSN before reading it from the WAL file using WALRead(). I've
> hacked the code up a bit to check if that's true -

Good catch! The comment seems wrong also to me. The subsequent bytes
can be written simultaneously, and it's very normal that there are
unflushed bytes are in OS's page buffer. The objective of the comment
seems to be to declare that there's no need to clear out the remaining
bytes, here. I agree that it's not a problem for now. However, I think
we need two fixes here.

1. It's useless to copy the whole page regardless of the 'count'. It's
  enough to copy only up to the 'count'. The patch looks good in this
  regard.

2. Maybe we need a comment that states the page_read callback
  functions leave garbage bytes beyond the returned count, due to
  partial copying without clearing the unused portion.

> I'm wondering why the WALRead() callers are always reading XLOG_BLCKSZ
> despite knowing exactly how much to read. Is it to tell the OS to
> explicitly fetch the whole page from the disk? If yes, the OS will do
> that anyway because the page transfers from disk to OS page cache are
> always in terms of disk block sizes, no?

If I understand your question correctly, I guess that the whole-page
copy was expected to clear out the remaining bytes, as I mentioned
earlier.

> Although, there's no immediate problem with it right now, the
> assumption is going to be incorrect when reading WAL from WAL buffers
> using WALReadFromBuffers -
> https://www.postgresql.org/message-id/CALj2ACV=C1GZT9XQRm4iN1NV1T=hLA_hsGWNx2Y5-G+mSwdhNg@mail.gmail.com.
>
> If we have no reason, can the WALRead() callers just read how much
> they want like walsender for physical replication? Attached a patch
> for the change.
> 
> Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: shihao zhong
Date:
Subject: Re: Fix incorrect PG_GETARG in pgcrypto
Next
From: David Rowley
Date:
Subject: Re: make add_paths_to_append_rel aware of startup cost