Re: .ready and .done files considered harmful - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: .ready and .done files considered harmful |
Date | |
Msg-id | 40DBACB6-963B-434D-99A3-B21E607FF240@amazon.com Whole thread Raw |
In response to | Re: .ready and .done files considered harmful (Dipesh Pandit <dipesh.pandit@gmail.com>) |
List | pgsql-hackers |
On 9/14/21, 7:23 AM, "Dipesh Pandit" <dipesh.pandit@gmail.com> wrote: > I agree that when we are creating a .ready file we should compare > the current .ready file with the last .ready file to check if this file is > created out of order. We can store the state of the last .ready file > in shared memory and compare it with the current .ready file. I > believe that archiver specific shared memory area can be used > to store the state of the last .ready file unless I am missing > something and this needs to be stored in a separate shared > memory area. > > With this change, we have the flexibility to move the current archiver > state out of shared memory and keep it local to archiver. I have > incorporated these changes and updated a new patch. I wonder if this can be simplified even further. If we don't bother trying to catch out-of-order .ready files in XLogArchiveNotify() and just depend on the per-checkpoint/restartpoint directory scans, we can probably remove lastReadySegNo from archiver state completely. + /* Force a directory scan if we are archiving anything but a regular + * WAL file or if this WAL file is being created out-of-order. + */ + if (!IsXLogFileName(xlog)) + PgArchForceDirScan(); + else + { + TimeLineID tli; + XLogSegNo last_segno; + XLogSegNo this_segno; + + last_segno = PgArchGetLastReadySegNo(); + XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size); + + /* + * Force a directory scan in case if this .ready file created out of + * order. + */ + last_segno++; + if (last_segno != this_segno) + PgArchForceDirScan(); + + PgArchSetLastReadySegNo(this_segno); + } This is an interesting idea, but the "else" block here seems prone to race conditions. I think we'd have to hold arch_lck to prevent that. But as I mentioned above, if we are okay with depending on the fallback directory scans, I think we can remove the "else" block completely. + /* Initialize the current state of archiver */ + xlogState.lastSegNo = MaxXLogSegNo; + xlogState.lastTli = MaxTimeLineID; It looks like we have two ways to force a directory scan. We can either set force_dir_scan to true, or lastSegNo can be set to MaxXLogSegNo. Why not just set force_dir_scan to true here so that we only have one way to force a directory scan? + /* + * Failed to archive, make sure that archiver performs a + * full directory scan in the next cycle to avoid missing + * the WAL file which could not be archived due to some + * failure in current cycle. + */ + PgArchForceDirScan(); Don't we also need to force a directory scan in the other cases we return early from pgarch_ArchiverCopyLoop()? We will have already advanced the archiver state in pgarch_readyXlog(), so I think we'd end up skipping files if we didn't. For example, if archive_command isn't set, we'll just return, and the next call to pgarch_readyXlog() might return the next file. + /* Continue directory scan until we find a regular WAL file */ + SpinLockAcquire(&PgArch->arch_lck); + PgArch->force_dir_scan = true; + SpinLockRelease(&PgArch->arch_lck); nitpick: I think we should just call PgArchForceDirScan() here. Nathan
pgsql-hackers by date: