Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Dipesh Pandit
Subject Re: .ready and .done files considered harmful
Date
Msg-id CAN1g5_GCTCtDpVqXdvE0wn9_ZJqRdgMCApUp_pACfwvFzQ6k0Q@mail.gmail.com
In response to Re: .ready and .done files considered harmful  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: .ready and .done files considered harmful
Re: .ready and .done files considered harmful
List pgsql-hackers
Thanks for the feedback.

> The latest post on this thread contained a link to this one, and it
> made me want to rewind to this point in the discussion. Suppose we
> have the following alternative scenario:
>
> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
> yet.  The following directory scan ends up finding 12.ready.  Just
> before we update the PgArch state, XLogArchiveNotify() is called and
> creates 11.ready.  However, pg_readyXlog() has already decided to
> return WAL segment 12 and update the state to look for 13 next.
>
> Now, if I'm not mistaken, using <= doesn't help at all.
>
> In my opinion, the problem here is that the natural way to ask "is
> this file being archived out of order?" is to ask yourself "is the
> file that I'm marking as ready for archiving now the one that
> immediately follows the last one I marked as ready for archiving?" and
> then invert the result. That is, if I last marked 10 as ready, and now
> I'm marking 11 as ready, then it's in order, but if I'm now marking
> anything else whatsoever, then it's out of order. But that's not what
> this does. Instead of comparing what it's doing now to what it did
> last, it compares what it did now to what the archiver did last.

I agree that when we create a .ready file we should compare it with
the previously created .ready file to check whether it is being created
out of order. We can store the state of the last .ready file in shared
memory and compare it with the current one. I believe the
archiver-specific shared memory area can be used for this, unless I am
missing something and it needs to live in a separate shared memory
area.

With this change, we have the flexibility to move the current archiver
state out of shared memory and keep it local to the archiver. I have
incorporated these changes in an updated patch.
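
To make the comparison concrete, here is a minimal sketch of the check,
not the code from the patch. ArchNotifyState, its fields and
notify_ready() are hypothetical names used only for illustration, and
the sketch assumes segments can be identified by a plain segment number.
The real shared-memory state would need a lock around it, and timeline
history files are ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the archiver's shared memory area. */
typedef struct ArchNotifyState
{
    uint64_t last_ready_segno; /* segment of the last .ready file created */
    bool     have_last;        /* false until the first .ready is recorded */
    bool     force_dir_scan;   /* asks the archiver to do a full scan */
} ArchNotifyState;

/*
 * Called when a .ready file is created for segno.  The file is in order
 * only if it immediately follows the last one marked ready; anything
 * else is out of order and requests a directory scan.  Returns true if
 * the file was out of order.  (No locking in this sketch.)
 */
static bool
notify_ready(ArchNotifyState *st, uint64_t segno)
{
    bool out_of_order = st->have_last &&
                        segno != st->last_ready_segno + 1;

    if (out_of_order)
        st->force_dir_scan = true;

    st->last_ready_segno = segno;
    st->have_last = true;
    return out_of_order;
}
```

With this shape, marking 10 and then 11 is in order, while marking 10,
then 12, sets the flag regardless of what the archiver last archived.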


> > And it's really not obvious that that's correct. I think that the
> > above argument actually demonstrates a flaw in the logic, but even if
> > not, or even if it's too small a flaw to be a problem in practice, it
> > seems a lot harder to reason about.
>
> I certainly agree that it's harder to reason about.  If we were to go
> the keep-trying-the-next-file route, we could probably minimize a lot
> of the handling for these rare cases by banking on the "fallback"
> directory scans.  Provided we believe these situations are extremely
> rare, some extra delay for an archive every once in a while might be
> acceptable.

+1. We are forcing a directory scan at checkpoint, which ensures that
any missed file gets archived within the checkpoint boundaries.
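
A rough sketch of how the fallback could look on the archiver side, under
stated assumptions: FakeStatusDir is an in-memory stand-in for the
archive_status directory, and ArchiverState, next_file_to_archive() and
on_checkpoint() are hypothetical names, not functions from the patch.
It only illustrates "try the next expected file, otherwise scan, and
always scan after a checkpoint".

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SEG 64

/* In-memory stand-in for archive_status: ready[n] means n.ready exists. */
typedef struct FakeStatusDir { bool ready[MAX_SEG]; } FakeStatusDir;

/* Archiver-local state (hypothetical). */
typedef struct ArchiverState
{
    uint64_t next_segno;     /* segment expected to be ready next */
    bool     force_dir_scan; /* set at checkpoint to force a full scan */
} ArchiverState;

static bool
ready_exists(const FakeStatusDir *d, uint64_t segno)
{
    return segno < MAX_SEG && d->ready[segno];
}

/* Full scan: find the oldest segment that has a .ready file. */
static bool
scan_oldest_ready(const FakeStatusDir *d, uint64_t *segno)
{
    for (uint64_t i = 0; i < MAX_SEG; i++)
        if (d->ready[i]) { *segno = i; return true; }
    return false;
}

/* Fast path: the expected next file.  Fallback: a directory scan. */
static bool
next_file_to_archive(ArchiverState *st, const FakeStatusDir *d,
                     uint64_t *segno)
{
    if (!st->force_dir_scan && ready_exists(d, st->next_segno))
    {
        *segno = st->next_segno++;
        return true;
    }
    st->force_dir_scan = false;
    if (!scan_oldest_ready(d, segno))
        return false;
    st->next_segno = *segno + 1;
    return true;
}

/* A checkpoint requests a scan so skipped files get picked up. */
static void
on_checkpoint(ArchiverState *st)
{
    st->force_dir_scan = true;
}
```

In the scenario quoted earlier, if 11.ready appears after the scan has
already returned 12, the fast path keeps serving 13, 14, and so on, and
11 is only found once on_checkpoint() forces the next scan.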

Please find the attached patch.

Thanks,
Dipesh
