Re: .ready and .done files considered harmful - Mailing list pgsql-hackers

From Dipesh Pandit
Subject Re: .ready and .done files considered harmful
Msg-id CAN1g5_EhWvwmE=_b2sYYZOQF7QGO13D2A4Wd1gb9H2zJsO-rWg@mail.gmail.com
In response to Re: .ready and .done files considered harmful  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
> If a .ready file is created out of order, the directory scan logic
> will pick it up about as soon as possible based on its priority.  If
> the archiver is keeping up relatively well, there's a good chance such
> a file will have the highest archival priority and will be picked up
> the next time the archiver looks for a file to archive.  With the
> patch proposed in this thread, an out-of-order .ready file has no such
> guarantee.  As long as the archiver never has to fall back to a
> directory scan, it won't be archived.  The proposed patch handles the
> case where RemoveOldXlogFiles() creates missing .ready files by
> forcing a directory scan, but I'm not sure this is enough.  I think we
> have to check the archiver state each time we create a .ready file to
> see whether we're creating one out-of-order.

We can handle the scenario where a .ready file is created out of order
in XLogArchiveNotify() itself. This way we avoid making an explicit call
to enable a directory scan from each of the different code paths that may
create an out-of-order .ready file.

The archiver can store the segment number of the most recent .ready file
it found. When a .ready file is created in XLogArchiveNotify(), the log
segment number of the new .ready file can be compared with that stored
segment number to detect whether the file is being created out of order.
If it is, a directory scan can be forced.

I have incorporated these changes in patch v11.

> While this may be an extremely rare problem in practice, archiving
> something after the next checkpoint completes seems better than never
> archiving it at all.  IMO this isn't an area where there is much space
> to take risks.

An alternative approach could be to force a directory scan at each
checkpoint. That would break the otherwise indefinite wait for a .ready
file that is missed because it was created out of order, and it would make
sure the file gets archived within checkpoint boundaries.

Thoughts?

Please find attached patch v11.

Thanks,
Dipesh
