On 2021-Aug-30, Andres Freund wrote:
> I'm doubtful that the approach of adding awareness of record boundaries
> is a good path to go down:
Honestly, I do not like it one bit, and if I can avoid relying on record
boundaries while making the whole thing work correctly, I am happy.
Clearly this wasn't a problem for the ancient recovery-only WAL design,
but as soon as we added replication on top, the whole issue of
continuation records became a bug.
I do think that the code should be first correct and second performant,
though.
> - There are very similar issues with promotions of replicas (consider
> what happens if we need to promote with the end of local WAL spanning
> a segment boundary, and what happens to cascading replicas). We have
> some logic to try to deal with that, but it's pretty grotty and I
> think incomplete.
Ouch, I hadn't thought of cascading replicas.
> - It seems to make some future optimizations harder - we should work
> towards replicating data sooner, rather than the opposite. Right now
> that's a major bottleneck around syncrep.
Absolutely.
> I think a better approach might be to handle this on the WAL layout
> level. What if we never overwrite partial records but instead just
> skipped over them during decoding?
Maybe this is a workable approach; let's work it out fully.
Let me see if I understand what you mean:
* We would remove the logic to inhibit archiving and
streaming-replicating the tail end of a split WAL record; archiving and
streaming would then deal with bytes only, so they would not have to be
aware of record boundaries.
* On WAL replay, we ignore records that are split across a segment
boundary and whose checksum does not match.
* On WAL write ... ?
How do we detect after recovery that a record that was being written,
and potentially was sent to the archive, needs to be "skipped"?
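A toy sketch of the replay rule in the second bullet might look like the
following. Everything in it is an assumption made for illustration: the
record layout (payload length, CRC-32, payload), the 32-byte segment
size, and the names encode_record, crosses_segment and replay are
hypothetical and are not PostgreSQL's actual WAL format or reader code.
Note that it trusts the length field of the damaged record in order to
find the next record, which is part of what the question above leaves
open.

```python
import struct
import zlib

SEG_SIZE = 32                    # toy segment size in bytes (hypothetical)
HEADER = struct.Struct("<II")    # payload length, CRC-32 of payload (hypothetical layout)


def encode_record(payload: bytes) -> bytes:
    """Serialize one toy record: header followed by payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def crosses_segment(start: int, end: int) -> bool:
    """True if the bytes [start, end) live in more than one segment."""
    return start // SEG_SIZE != (end - 1) // SEG_SIZE


def replay(wal: bytes):
    """Return (applied payloads, start offsets of skipped records)."""
    applied, skipped = [], []
    pos = 0
    while pos + HEADER.size <= len(wal):
        length, crc = HEADER.unpack_from(wal, pos)
        end = pos + HEADER.size + length
        if length == 0 or end > len(wal):
            break                        # nothing more that looks like a record
        payload = wal[pos + HEADER.size:end]
        if zlib.crc32(payload) == crc:
            applied.append(payload)      # intact record: replay it
        elif crosses_segment(pos, end):
            skipped.append(pos)          # split record with bad checksum: skip it
        else:
            break                        # bad checksum within one segment: end of WAL
        pos = end                        # assumes the length field can be trusted
    return applied, skipped


rec1 = encode_record(b"first")           # bytes 0..12
rec2 = encode_record(b"x" * 20)          # bytes 13..40, straddles offset 32
rec3 = encode_record(b"third")           # bytes 41..53
wal = bytearray(rec1 + rec2 + rec3)
wal[35] ^= 0xFF                          # damage the part of rec2 in the second segment
applied, skipped = replay(bytes(wal))
```

With this input, replay applies b"first" and b"third" and reports the
record starting at offset 13 as skipped. The same damage inside a record
that fits in one segment stops replay instead, as an ordinary end of WAL.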
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/