On 8/19/21, 5:42 AM, "Dipesh Pandit" <dipesh.pandit@gmail.com> wrote:
>> Should we have XLogArchiveNotify(), writeTimeLineHistory(), and
>> writeTimeLineHistoryFile() enable the directory scan instead? Else,
>> we have to exhaustively cover all such code paths, which may be
>> difficult to maintain. Another reason I am bringing this up is that
>> my patch for adjusting .ready file creation [0] introduces more
>> opportunities for .ready files to be created out-of-order.
>
> XLogArchiveNotify() notifies Archiver when a log segment is ready for
> archival by creating a .ready file. This function is being called for each
> log segment and placing a call to enable directory scan here will result
> in directory scan for each log segment.

Could we have XLogArchiveNotify() check the archiver state and only
trigger a directory scan if we detect that we are creating an
out-of-order .ready file?

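A rough sketch of that check, to make the idea concrete. Everything here is hypothetical: ArchiverState, next_expected_segno, force_dir_scan, and notify_segment_ready() are illustrative names rather than existing PostgreSQL symbols, and the locking or atomics a real shared-memory version would need are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for archiver state kept in shared memory. */
typedef struct ArchiverState
{
    uint64_t next_expected_segno;   /* segment the archiver will look for next */
    bool     force_dir_scan;        /* set to request a full directory scan */
} ArchiverState;

/*
 * Called when a .ready file is created for segment "segno".
 *
 * A directory scan is requested only when the archiver has already moved
 * past this segment, i.e. the .ready file is being created out of order.
 * Returns true if a scan was requested.
 */
bool
notify_segment_ready(ArchiverState *state, uint64_t segno)
{
    assert(state != NULL);

    if (segno < state->next_expected_segno)
    {
        state->force_dir_scan = true;
        return true;
    }

    /* In-order segment: the archiver finds it without a scan. */
    return false;
}
```

With this shape, the common case (segments becoming ready in order) never sets the flag, so the cost of a scan is paid only for the out-of-order case.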
> There is one possible scenario where it may run into a race condition. If
> archiver has just finished archiving all .ready files and the next anticipated
> log segment is not available then in this case archiver takes the fall-back
> path to scan directory. It resets the flag before it begins directory scan.
> Now, if a directory scan is enabled by a timeline switch or .ready file created
> out of order in parallel to the event that the archiver resets the flag then this
> might result in a race condition. But in this case also archiver is eventually
> going to perform a directory scan and the desired file will be archived as part
> of directory scan. Apart from this, I can't think of any other scenario which
> may result in a race condition unless I am missing something.

What do you think about adding an upper limit to the number of files
we can archive before doing a directory scan? The more I think about
the directory scan flag, the more I believe it is a best-effort tool
that will remain prone to race conditions. If we have a guarantee
that a directory scan will happen within the next N files, there's
probably less pressure to make sure that it's 100% correct.

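A minimal sketch of such a limit, assuming a per-archiver counter. The names ScanPolicy, MAX_FILES_BETWEEN_SCANS, should_scan_directory(), and record_file_archived() are made up for illustration, and the value 64 is an arbitrary placeholder for N.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound N on files archived between directory scans. */
#define MAX_FILES_BETWEEN_SCANS 64

/* Hypothetical bookkeeping kept by the archiver. */
typedef struct ScanPolicy
{
    int  files_since_scan;  /* files archived since the last directory scan */
    bool force_dir_scan;    /* best-effort flag set by other processes */
} ScanPolicy;

/*
 * Decide whether the archiver should scan the directory before picking
 * the next file.  A scan happens if the flag is set or if N files have
 * been archived since the last scan.  Both triggers are reset when a
 * scan is chosen.
 */
bool
should_scan_directory(ScanPolicy *p)
{
    assert(p != NULL);

    if (p->force_dir_scan || p->files_since_scan >= MAX_FILES_BETWEEN_SCANS)
    {
        p->force_dir_scan = false;
        p->files_since_scan = 0;
        return true;
    }
    return false;
}

/* Record that one file was archived without a directory scan. */
void
record_file_archived(ScanPolicy *p)
{
    assert(p != NULL);
    p->files_since_scan++;
}
```

Even if a flag update is lost to a race, the counter bounds how long a missed file can wait: at most N archived files pass before the next scan picks it up.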
On an unrelated note, do we need to add some extra handling for backup
history files and partial WAL files?

Nathan