Re: prevent immature WAL streaming - Mailing list pgsql-hackers

From Andres Freund
Subject Re: prevent immature WAL streaming
Msg-id 20210831155333.2vv2zdf7v4nhh2m2@alap3.anarazel.de
In response to Re: prevent immature WAL streaming  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: prevent immature WAL streaming  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
Hi,

On 2021-08-31 09:56:30 -0400, Alvaro Herrera wrote:
> On 2021-Aug-30, Andres Freund wrote:
> > I think a better approach might be to handle this on the WAL layout
> > level. What if we never overwrite partial records but instead just
> > skipped over them during decoding?
>
> Maybe this is a workable approach, let's work it out fully.
>
> Let me see if I understand what you mean:
> * We would remove the logic to inhibit archiving and streaming-
>   replicating the tail end of a split WAL record; that logic deals with
>   bytes only, so doesn't have to be aware of record boundaries.
> * On WAL replay, we ignore records that are split across a segment
>   boundary and whose checksum does not match.
> * On WAL write ... ?

I was thinking that on a normal WAL write we'd do nothing. Instead we would
have dedicated code at the end of recovery that, if the WAL ends in a partial
record, changes the page following the "valid" portion of the WAL to indicate
that an incomplete record is to be skipped.

Of course, we need to be careful to not weaken WAL validity checking too
much. How about the following:

If we're "aborting" a continued record, we set XLP_FIRST_IS_ABORTED_PARTIAL on
the page at which we do so (i.e. the page after the valid end of the WAL).

On a page with XLP_FIRST_IS_ABORTED_PARTIAL we expect a special type of record
to start just after the page header. That record contains sufficient
information for us to verify the validity of the partial record (since its
checksum and length aren't valid, and it may not even be fully readable if the
record header itself was split). I think it would make sense to include the
LSN of the aborted record, and a checksum of the partial data.
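
To make the idea a bit more concrete, here is a minimal sketch in C of what
such a marker record and its check could look like. Everything in it is an
assumption made for illustration: the flag value 0x0008, the struct
AbortedPartialRecord, and the helpers sketch_crc32c(), make_abort_marker()
and can_skip_partial() are hypothetical names, not existing PostgreSQL code.
CRC-32C is used only because WAL records are already checksummed with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* 64-bit WAL position */

/* Assumption: flag bit value picked only for this sketch. */
#define XLP_FIRST_IS_ABORTED_PARTIAL 0x0008

/* Hypothetical marker record placed right after the page header. */
typedef struct AbortedPartialRecord
{
    XLogRecPtr  aborted_lsn;    /* where the never-completed record started */
    uint32_t    partial_crc;    /* checksum of the bytes that were written */
} AbortedPartialRecord;

/* Plain bitwise CRC-32C (Castagnoli); slow, but has no dependencies. */
static uint32_t
sketch_crc32c(const uint8_t *buf, size_t len)
{
    uint32_t    crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* End of recovery: describe the partial record we are about to abandon. */
static AbortedPartialRecord
make_abort_marker(XLogRecPtr lsn, const uint8_t *partial, size_t len)
{
    AbortedPartialRecord m;

    m.aborted_lsn = lsn;
    m.partial_crc = sketch_crc32c(partial, len);
    return m;
}

/*
 * Decoding: the partial record may be skipped only if the page carries the
 * flag, the marker names the same start LSN, and the partial bytes we read
 * match the checksum stored in the marker.
 */
static bool
can_skip_partial(uint16_t xlp_info, const AbortedPartialRecord *m,
                 XLogRecPtr partial_start, const uint8_t *partial, size_t len)
{
    if ((xlp_info & XLP_FIRST_IS_ABORTED_PARTIAL) == 0)
        return false;
    if (m->aborted_lsn != partial_start)
        return false;
    return m->partial_crc == sketch_crc32c(partial, len);
}
```

The point of requiring all three conditions is that skipping stays an
explicit, verifiable decision: a torn or garbage tail without a matching
marker is still treated as an ordinary end-of-WAL or corruption case.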


> How do we detect after recovery that a record that was being written,
> and potentially was sent to the archive, needs to be "skipped"?

I think we can just read the WAL and see if it ends with a partial
record. It'd add a bit of complication to the error checking in xlogreader,
because we'd likely want to treat verification from page headers a bit
differently from verification due to record data. But that seems doable.
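
As a rough illustration of "see if it ends with a partial record", the
following C sketch classifies the tail of the WAL from two inputs: how many
bytes are available after the last complete record, and the total length
claimed by the record header. The names (WalTailState, classify_wal_tail)
and the 24-byte header size are assumptions for this sketch, not xlogreader
code, and page-header validation is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: fixed record header size used by this sketch. */
#define SKETCH_REC_HDR_SIZE 24

typedef enum WalTailState
{
    TAIL_AT_BOUNDARY,       /* WAL ends exactly between two records */
    TAIL_PARTIAL_HEADER,    /* even the record header was split */
    TAIL_PARTIAL_BODY,      /* header is complete, record data is not */
    TAIL_COMPLETE,          /* the whole record is present */
    TAIL_INVALID            /* claimed length is shorter than a header */
} WalTailState;

/*
 * bytes_available: bytes present after the last complete record.
 * tot_len: total length claimed by the record header; it is only looked at
 * once a full header is available, since it cannot be trusted before that.
 */
static WalTailState
classify_wal_tail(size_t bytes_available, uint32_t tot_len)
{
    if (bytes_available == 0)
        return TAIL_AT_BOUNDARY;
    if (bytes_available < SKETCH_REC_HDR_SIZE)
        return TAIL_PARTIAL_HEADER;
    if (tot_len < SKETCH_REC_HDR_SIZE)
        return TAIL_INVALID;
    if (bytes_available < tot_len)
        return TAIL_PARTIAL_BODY;
    return TAIL_COMPLETE;
}
```

Only the two partial states would be candidates for writing an abort marker
at the end of recovery; TAIL_INVALID stays a plain validation failure.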

Does this make sense?

Greetings,

Andres Freund