Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: archive status ".ready" files may be created too early
Msg-id EFF40306-8E8A-4259-B181-C84F3F06636C@amazon.com
In response to Re: archive status ".ready" files may be created too early  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: archive status ".ready" files may be created too early
List pgsql-hackers
Apologies for the long delay.

I've spent a good amount of time thinking about this bug and trying
out a few different approaches for fixing it.  I've attached a
work-in-progress patch for my latest attempt.

On 10/13/20, 5:07 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
>           F0        F1
>         AAAAA  F  BBBBB
> |---------|---------|---------|
>    seg X    seg X+1   seg X+2
>
> Matsumura-san has a concern about the case where there are two (or
> more) partially-flushed segment-spanning records at the same time.
>
> This patch remembers only the last cross-segment record. If we were
> going to flush up to F0 after Record-B had been written, we would fail
> to hold off archiving seg-X. This patch is based on an assumption that
> that case cannot happen because we don't leave a pending page at the
> time of segment switch and no record spans three or more
> segments.

I wonder whether these are safe assumptions to make.  In your example,
if we've written record B to the WAL buffers, but neither record A nor
record B has been written to disk or flushed, aren't we still in
trouble?  Also, is there actually a limit on WAL record length that
makes it impossible for a record to span three or more segments?
Perhaps these assumptions hold, but it isn't obvious to me that they
do, and they seem fragile.
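
To make the second question concrete, here is a small arithmetic
sketch (Python, purely illustrative, not PostgreSQL code).  It assumes
the default 16 MB segment size, ignores page and segment headers, and
uses a hypothetical helper name.  It only shows that a sufficiently
long record would touch three segments; it says nothing about whether
such a record can actually be produced.

```python
# Assumption: default WAL segment size of 16 MB; headers are ignored.
SEG_SIZE = 16 * 1024 * 1024


def segments_touched(start_lsn, length, seg_size=SEG_SIZE):
    """Count the segments covered by a record of `length` bytes
    starting at byte position `start_lsn` (hypothetical helper)."""
    first_seg = start_lsn // seg_size
    last_seg = (start_lsn + length - 1) // seg_size
    return last_seg - first_seg + 1


# A record that starts 100 bytes before the end of segment 0 and is
# slightly longer than one full segment ends inside segment 2.
print(segments_touched(SEG_SIZE - 100, SEG_SIZE + 200))
```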

The attached patch doesn't make use of these assumptions.  Instead, we
track the positions of the records that cross segment boundaries in a
small hash map, and we use that to determine when it is safe to mark a
segment as ready for archival.  I think this approach resembles
Matsumura-san's patch from June.
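
As a rough model of the idea (this is not the attached patch; the
class and method names are hypothetical, and locking and shared
memory are ignored), the bookkeeping could look something like this
in Python.  For each segment whose end is crossed by a record, the map
stores the end position of that record, and the segment is only
reported as ready once the flush position has reached it.

```python
class SegmentBoundaryTracker:
    """Toy model: hold off marking a segment ready until any record
    that crosses its end boundary has been completely flushed."""

    def __init__(self, seg_size):
        self.seg_size = seg_size
        self.boundaries = {}   # segno -> end position of the crossing record
        self.last_ready = -1   # highest segment already reported ready

    def register_record(self, start_lsn, end_lsn):
        """Remember a record occupying [start_lsn, end_lsn)."""
        first_seg = start_lsn // self.seg_size
        last_seg = (end_lsn - 1) // self.seg_size
        # Every segment except the last one has its end crossed.
        for seg in range(first_seg, last_seg):
            self.boundaries[seg] = end_lsn

    def segments_ready(self, flushed_lsn):
        """Return newly ready segments, in order, given that all WAL
        before flushed_lsn is on disk."""
        ready = []
        seg = self.last_ready + 1
        while (seg + 1) * self.seg_size <= flushed_lsn:
            needed = self.boundaries.get(seg)
            if needed is not None and flushed_lsn < needed:
                break          # crossing record not fully flushed yet
            self.boundaries.pop(seg, None)
            ready.append(seg)
            seg += 1
        self.last_ready = seg - 1
        return ready


# The A/B scenario from the diagram, with 100-byte "segments".
t = SegmentBoundaryTracker(seg_size=100)
t.register_record(80, 120)     # record A crosses the end of segment 0
t.register_record(180, 220)    # record B crosses the end of segment 1
print(t.segments_ready(110))   # flushed to F0, middle of A: nothing ready
print(t.segments_ready(150))   # A fully flushed: segment 0 is ready
```

Because each crossed boundary gets its own entry, this model does not
depend on there being only one pending cross-segment record, or on
records spanning at most two segments.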

As before, I'm not yet handling replication, archive_timeout, or
persisting the latest-marked-ready segment through crashes.  For the
last of these, I was thinking of using a new file that stores the
segment number.
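
For illustration only, one way such a file could be updated safely is
the usual write-to-temporary-then-rename pattern.  The sketch below is
Python rather than backend code; the function names and the 8-byte
file layout are assumptions made for the example, not anything the
patch defines.

```python
import os
import struct
import tempfile


def save_last_ready(path, segno):
    """Atomically store a segment number (hypothetical 8-byte format)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(struct.pack("<Q", segno))
            f.flush()
            os.fsync(f.fileno())       # make the contents durable first
        os.replace(tmp_path, path)     # then swap the file in atomically
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Note: a real implementation would also fsync the directory.


def load_last_ready(path):
    """Return the stored segment number, or None if absent or malformed."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) != 8:
        return None
    return struct.unpack("<Q", data)[0]
```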

Nathan


