Re: prevent immature WAL streaming - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: prevent immature WAL streaming
Msg-id 202108311356.sl33wcpcz5x6@alvherre.pgsql
In response to Re: prevent immature WAL streaming  (Andres Freund <andres@anarazel.de>)
Responses Re: prevent immature WAL streaming  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2021-Aug-30, Andres Freund wrote:

> I'm doubtful that the approach of adding awareness of record boundaries
> is a good path to go down:

Honestly, I do not like it one bit, and if I can avoid relying on record
boundaries while making the whole thing work correctly, I am happy.
Clearly it wasn't a problem for the ancient recovery-only WAL design,
but as soon as we added replication on top, the whole issue of
continuation records became a bug.

I do think that the code should be first correct and second performant,
though.
 
> - There are very similar issues with promotions of replicas (consider
>   what happens if we need to promote with the end of local WAL spanning
>   a segment boundary, and what happens to cascading replicas). We have
>   some logic to try to deal with that, but it's pretty grotty and I
>   think incomplete.

Ouch, I hadn't thought of cascading replicas.

> - It seems to make some future optimizations harder - we should work
>   towards replicating data sooner, rather than the opposite. Right now
>   that's a major bottleneck around syncrep.

Absolutely.

> I think a better approach might be to handle this on the WAL layout
> level. What if we never overwrite partial records but instead just
> skipped over them during decoding?

Maybe this is a workable approach; let's work it out fully.

Let me see if I understand what you mean:
* We would remove the logic to inhibit archiving and streaming-
  replicating the tail end of a split WAL record; that logic deals with
  bytes only, so doesn't have to be aware of record boundaries.
* On WAL replay, we ignore records that are split across a segment
  boundary and whose checksum does not match.
* On WAL write ... ?
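
To make the replay-side bullet concrete, here is a toy sketch of the
rule in Python.  It is not PostgreSQL's WAL reader and does not use the
real record format: the 8-byte header (total length plus CRC-32 of the
payload), the 64-byte SEG_SIZE and the names make_record,
spans_segment_boundary and replay are all made up for illustration.  It
also assumes the length field of a torn record is still readable, which
a real design could not take for granted.

```python
import struct
import zlib

# Toy layout, assumed for illustration only (not the real WAL format).
SEG_SIZE = 64                    # toy segment size in bytes
HEADER = struct.Struct("<II")    # (total record length, CRC-32 of payload)


def make_record(payload: bytes) -> bytes:
    """Serialize one toy record: header followed by payload."""
    return HEADER.pack(HEADER.size + len(payload), zlib.crc32(payload)) + payload


def spans_segment_boundary(start: int, length: int) -> bool:
    """True if the record's first and last bytes lie in different segments."""
    return start // SEG_SIZE != (start + length - 1) // SEG_SIZE


def replay(wal: bytes) -> list[bytes]:
    """Return the payloads that replay would apply, in order.

    A record with a bad checksum is skipped if it is split across a
    segment boundary; a bad checksum anywhere else ends replay.
    """
    applied = []
    pos = 0
    while pos + HEADER.size <= len(wal):
        length, crc = HEADER.unpack_from(wal, pos)
        if length < HEADER.size or pos + length > len(wal):
            break  # implausible or truncated record: end of usable WAL
        payload = wal[pos + HEADER.size : pos + length]
        if zlib.crc32(payload) == crc:
            applied.append(payload)
        elif not spans_segment_boundary(pos, length):
            break  # corruption inside a single segment: stop replay
        # else: split record with a bad checksum, skip over it
        pos += length
    return applied
```

Note that the sketch decides purely from what is on disk, so it cannot
tell a record that was torn by a crash from one that is corrupt for some
other reason.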

How do we detect after recovery that a record that was being written,
and potentially was sent to the archive, needs to be "skipped"?

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/


