From Bossart, Nathan
Subject Re: .ready and .done files considered harmful
Msg-id 38E8A6F6-4D00-442B-B14A-26F7D3AA898E@amazon.com
In response to Re: .ready and .done files considered harmful  (Dipesh Pandit <dipesh.pandit@gmail.com>)
List pgsql-hackers
On 8/19/21, 5:42 AM, "Dipesh Pandit" <dipesh.pandit@gmail.com> wrote:
>> Should we have XLogArchiveNotify(), writeTimeLineHistory(), and
>> writeTimeLineHistoryFile() enable the directory scan instead?  Else,
>> we have to exhaustively cover all such code paths, which may be
>> difficult to maintain.  Another reason I am bringing this up is that
>> my patch for adjusting .ready file creation [0] introduces more
>> opportunities for .ready files to be created out-of-order.
>
> XLogArchiveNotify() notifies the archiver that a log segment is ready
> for archival by creating a .ready file. This function is called for
> each log segment, so enabling the directory scan here would result in a
> directory scan for every log segment.

Could we have XLogArchiveNotify() check the archiver state and only
trigger a directory scan if we detect that we are creating an out-of-
order .ready file?
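
Roughly what I have in mind, as a standalone sketch rather than actual
backend code (the shared state and helper names below are invented for
illustration only):

    /*
     * Sketch: detect an out-of-order .ready notification and request a
     * directory scan.  last_notified_segno and force_dir_scan stand in
     * for state that would live in archiver shared memory.
     */
    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogSegNo;

    static XLogSegNo last_notified_segno = 0;
    static bool force_dir_scan = false;

    static void
    notify_segment_ready(XLogSegNo segno)
    {
        /*
         * If this segment is not newer than the last one we notified,
         * the archiver's "advance to the next segment" fast path could
         * skip it, so ask for a directory scan.
         */
        if (last_notified_segno != 0 && segno <= last_notified_segno)
            force_dir_scan = true;
        else
            last_notified_segno = segno;

        printf("segment %" PRIu64 " ready, force_dir_scan=%d\n",
               segno, force_dir_scan);
    }

    int
    main(void)
    {
        notify_segment_ready(10);   /* in order */
        notify_segment_ready(11);   /* in order */
        notify_segment_ready(9);    /* out of order: requests a scan */
        return 0;
    }

That way the common case stays on the fast path, and only the
out-of-order notifications pay for a scan.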

> There is one possible scenario where it may run into a race condition.
> If the archiver has just finished archiving all .ready files and the
> next anticipated log segment is not available, it takes the fall-back
> path and scans the directory, resetting the flag before the scan
> begins. If a directory scan is enabled by a timeline switch or an
> out-of-order .ready file at the same moment the archiver resets the
> flag, that could race. But even then the archiver will eventually
> perform a directory scan, and the desired file will be archived as part
> of it. Apart from this, I can't think of any other scenario that could
> result in a race condition, unless I am missing something.

What do you think about adding an upper limit to the number of files
we can archive before doing a directory scan?  The more I think about
the directory scan flag, the more I believe it is a best-effort tool
that will remain prone to race conditions.  If we have a guarantee
that a directory scan will happen within the next N files, there's
probably less pressure to make sure that it's 100% correct.
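
For illustration, a minimal standalone sketch of that idea (the constant
and flag names are made up; the reset-before-scan ordering mirrors the
race you describe above, and the N-file cap bounds how long a missed
request can linger):

    /*
     * Sketch: force a directory scan at least once every
     * ARCHIVES_PER_DIRECTORY_SCAN files, so a request lost to the
     * reset-before-scan race is still picked up within a bounded
     * number of files.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define ARCHIVES_PER_DIRECTORY_SCAN 64

    static int files_since_dir_scan = 0;
    static bool dir_scan_requested = false;  /* timeline switch, out-of-order .ready, ... */

    static bool
    need_directory_scan(void)
    {
        if (dir_scan_requested ||
            files_since_dir_scan >= ARCHIVES_PER_DIRECTORY_SCAN)
        {
            /*
             * Reset before scanning; a request that races in after
             * this point is still honored within the next N files.
             */
            dir_scan_requested = false;
            files_since_dir_scan = 0;
            return true;
        }
        return false;
    }

    int
    main(void)
    {
        for (int i = 0; i < 200; i++)
        {
            if (need_directory_scan())
                printf("file %d: doing a directory scan\n", i);
            files_since_dir_scan++;  /* archived one file via the fast path */
        }
        return 0;
    }

Even if a concurrent request is occasionally lost when the flag is
reset, the counter guarantees another full scan within N files.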

On an unrelated note, do we need to add some extra handling for backup
history files and partial WAL files?

Nathan

