Re: Lowering the default wal_blocksize to 4K - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Lowering the default wal_blocksize to 4K
Date
Msg-id CA+TgmoZUP6kc9-FwE=BnsZHH17LE4-G3XjvMA2QFOQfYWVOXiQ@mail.gmail.com
In response to Re: Lowering the default wal_blocksize to 4K  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
On Thu, Oct 12, 2023 at 9:57 AM Ants Aasma <ants@cybertec.at> wrote:
> This reminds me that xlp_tli is not being used to its full potential right
> now either. We only check that it's not going backwards, but there is at
> least one not very hard to hit way to get postgres to silently replay on the
> wrong timeline.[1]
>
> [1] https://www.postgresql.org/message-id/CANwKhkMN3QwAcvuDZHb6wsvLRtkweBiYso-KLFykkQVWuQLcOw@mail.gmail.com

Maybe I'm missing something, but that seems mostly unrelated. What
you're discussing there is the server's ability to figure out when it
ought to perform a timeline switch. In other words, the server settles
on the wrong TLI and therefore opens and reads from the wrong
filename. But here, we're talking about the case where the server is
correct about the TLI and LSN and hence opens exactly the right file
on disk, but the contents of the file on disk aren't what they're
supposed to be due to a procedural error.

Said differently, I don't see how anything we could do with xlp_tli
would actually fix the problem discussed in that thread. xlp_tli can
detect a situation where the TLI of the file doesn't match the TLI of
the pages inside the file, but it doesn't help with the case where the
server decided to read the wrong file in the first place.
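
To make the distinction concrete, here is a minimal sketch of the kind of
check a per-page TLI makes possible: comparing the TLI encoded in a WAL
segment's file name with the xlp_tli stored in a page header. The field
layout is assumed from XLogPageHeaderData on a little-endian machine, the
function names are hypothetical, and the strict equality test is a
simplification rather than the server's actual validation logic.

```python
import struct

# Assumed layout, following XLogPageHeaderData on a little-endian machine:
# xlp_magic (uint16), xlp_info (uint16), xlp_tli (uint32),
# xlp_pageaddr (uint64), xlp_rem_len (uint32).
PAGE_HEADER = struct.Struct("<HHIQI")


def tli_from_wal_filename(name: str) -> int:
    """Return the TLI from a 24-hex-digit WAL segment name (first 8 digits)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    return int(name[:8], 16)


def page_tli_matches_file(name: str, page: bytes) -> bool:
    """Hypothetical check: does the page's xlp_tli equal the file name's TLI?"""
    _magic, _info, xlp_tli, _pageaddr, _rem_len = PAGE_HEADER.unpack_from(page)
    return xlp_tli == tli_from_wal_filename(name)
```

A check like this only ever sees the file the server already chose to open.
If the server picked the wrong timeline and the file on that timeline is
internally consistent, the check passes.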

But this does make me wonder whether storing xlp_tli and xlp_pageaddr
in every page is really worth the bit-space. That takes 12 bytes plus
any padding it forces us to incur, but the actual entropy content of
those 12 bytes must be quite low. In normal cases probably 7 or so of
those bytes are going to consist entirely of zero bits (TLI < 256,
LSN % 8k == 0, LSN < 2^40). We could probably find a way of jumbling
the LSN, TLI, and maybe some other stuff into an 8-byte quantity or
even perhaps a 4-byte quantity that would do about as good a job
catching problems as what we have now (e.g.
LSN_HIGH32^LSN_LOW32^BITREVERSE(TLI)). In the event of a mismatch, the
value actually stored in the page header would be harder for humans to
understand, but I'm not sure that really matters here. Users should
mostly be concerned with whether a WAL file matches the cluster where
they're trying to replay it; forensics on misplaced or corrupted WAL
files should be comparatively rare.
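
As a rough sketch of the 4-byte variant mentioned above, the following
computes LSN_HIGH32 ^ LSN_LOW32 ^ BITREVERSE(TLI). The function names are
hypothetical and this is only an illustration of the arithmetic, not a
proposed on-disk format.

```python
MASK32 = 0xFFFFFFFF


def bitreverse32(x: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(f"{x & MASK32:032b}"[::-1], 2)


def page_fingerprint(lsn: int, tli: int) -> int:
    """Fold a 64-bit LSN and a 32-bit TLI into one 32-bit value."""
    lsn_high32 = (lsn >> 32) & MASK32
    lsn_low32 = lsn & MASK32
    return lsn_high32 ^ lsn_low32 ^ bitreverse32(tli)


# Example: page address 0x1_00002000 on timelines 1 and 2.
fp_tli1 = page_fingerprint(0x0000000100002000, 1)  # 0x80002001
fp_tli2 = page_fingerprint(0x0000000100002000, 2)  # 0x40002001
```

Reversing the TLI moves its low-order bits, which are the ones that normally
vary, to the top of the word. That keeps a small TLI from cancelling against
the small values typically found in the high half of the LSN.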

--
Robert Haas
EDB: http://www.enterprisedb.com


