Hi,
Thanks for the feedback.
> + * by checking the availability of next WAL file. "xlogState" specifies the
> + * segment number and timeline ID corresponding to the next WAL file.
>
> "xlogState" probably needs to be updated here.
Yes, I updated the comment.
> As noted before [0], I think we need to force a directory scan at the
> beginning of pgarch_MainLoop() and when pgarch_ArchiverCopyLoop()
> returns before we exit the "while" loop. Else, there's probably a
> risk that we skip archiving a file until the next directory scan. IMO
> forcing a directory scan at the beginning of pgarch_ArchiverCopyLoop()
> is a simpler way to do roughly the same thing. I'm skeptical that
> persisting the next-anticipated state between calls to
> pgarch_ArchiverCopyLoop() is worth the complexity.
I think that if we force a directory scan in pgarch_ArchiverCopyLoop(), either
when it returns before we exit the "while" loop or outside the loop, it may
result in a directory scan for every WAL file in one of the scenarios I can
think of.
There are two possible scenarios: in the first, the archiver is always
lagging; in the second, the archiver is in sync with or ahead of the rate at
which WAL files are generated.
If we focus on the second scenario, consider a case where the archiver has
just archived file 1.ready and is about to check the availability of 2.ready,
but 2.ready is not yet present in the archive status directory. The archiver
performs a directory scan as a fall-back mechanism and goes to the wait state.
(The current implementation relies on notifying the archiver by creating a
.ready file on disk. It may happen that the file is ready for archival, but
because the notification mechanism is slow, the notification is delayed and
the archiver goes to the wait state.) When 2.ready is created on disk, the
archiver is notified, wakes up and calls pgarch_ArchiverCopyLoop(). If we
unconditionally force a directory scan in pgarch_ArchiverCopyLoop(), it may
result in a directory scan for every WAL file in this scenario. Here we
already have the next anticipated log segment number, so we can avoid the
additional directory scan. I tested this with a small setup by creating ~2000
WAL files, and forcing the scan resulted in a directory scan for each file.
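To make the in-sync scenario concrete, here is a rough standalone sketch. It
is not code from the patch: the names ArchiverSketchState,
sketch_ready_exists(), sketch_copy_loop() and sketch_count_scans() are
hypothetical, and the model assumes that exactly one new .ready file appears
before each wake-up and that a missing anticipated file always triggers one
fall-back scan before waiting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state persisted between calls to the copy loop. */
typedef struct ArchiverSketchState
{
    uint64_t next_segno;    /* segment we anticipate archiving next */
    int      dir_scans;     /* directory scans performed so far */
} ArchiverSketchState;

/*
 * Stand-in for checking whether <segno>.ready exists on disk. In this
 * model, every segment up to and including max_ready_segno is ready.
 */
static bool
sketch_ready_exists(uint64_t segno, uint64_t max_ready_segno)
{
    return segno <= max_ready_segno;
}

/*
 * One wake-up of the archiver. If force_scan is true, a directory scan is
 * performed first, regardless of the persisted state. Returns the number
 * of files archived during this wake-up.
 */
static int
sketch_copy_loop(ArchiverSketchState *st, uint64_t max_ready_segno,
                 bool force_scan)
{
    int archived = 0;

    if (force_scan)
        st->dir_scans++;

    /* Archive as long as the anticipated next file is available. */
    while (sketch_ready_exists(st->next_segno, max_ready_segno))
    {
        st->next_segno++;
        archived++;
    }

    /* Anticipated file is missing: fall-back scan, then wait. */
    st->dir_scans++;
    return archived;
}

/* Simulate nfiles wake-ups, one new file per wake-up; return total scans. */
static int
sketch_count_scans(int nfiles, bool force_scan)
{
    ArchiverSketchState st = {1, 0};

    for (int i = 1; i <= nfiles; i++)
        sketch_copy_loop(&st, (uint64_t) i, force_scan);
    return st.dir_scans;
}
```

Under these assumptions, forcing a scan at the start of every call adds one
scan per wake-up on top of the fall-back scan, whereas using the persisted
next segment number does not.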
I agree that the failure scenario discussed in [0] would require a WAL file to
wait until the next directory scan. However, this can be avoided by forcing a
directory scan in pgarch_ArchiverCopyLoop() only in the failure scenario.
This ensures that when the archiver wakes up for the next cycle, it
performs a full directory scan, eliminating the risk of missing a file due to
an archive failure. It also avoids the additional directory scans mentioned in
the scenario above.
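The idea of forcing a scan only after a failure can be sketched as a flag that
is set on failure and consumed at the start of the next cycle. This is an
illustration only, not the code in the patch; FailureSketchState,
sketch_report_archive_result() and sketch_consume_force_scan() are
hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical state carried from one archiver cycle to the next. */
typedef struct FailureSketchState
{
    bool force_dir_scan;    /* set when an archive attempt has failed */
} FailureSketchState;

/* Record the outcome of one archive attempt. */
static void
sketch_report_archive_result(FailureSketchState *st, bool succeeded)
{
    if (!succeeded)
        st->force_dir_scan = true;
}

/*
 * Called at the start of the next cycle. Returns true if a full directory
 * scan should be performed, and clears the flag so that later cycles go
 * back to using the anticipated next file.
 */
static bool
sketch_consume_force_scan(FailureSketchState *st)
{
    bool scan = st->force_dir_scan;

    st->force_dir_scan = false;
    return scan;
}
```

With this shape, successful cycles never pay for an extra scan, and a failed
cycle is followed by exactly one forced scan.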
I have incorporated these changes into an updated patch. Please find it attached.
Thanks,
Dipesh