Re: [BUG] non archived WAL removed during production crash recovery - Mailing list pgsql-bugs
From | Michael Paquier |
---|---|
Subject | Re: [BUG] non archived WAL removed during production crash recovery |
Date | |
Msg-id | 20200413071414.GF2169@paquier.xyz Whole thread Raw |
In response to | Re: [BUG] non archived WAL removed during production crash recovery (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: [BUG] non archived WAL removed during production crash recovery
|
List | pgsql-bugs |
On Fri, Apr 10, 2020 at 11:14:54AM +0900, Kyotaro Horiguchi wrote: > I have the same feeling with Michael. The test that archives are > created correctly seems to fit the file. It would be unintentionally > that the file is not exercising archiving so much. I have been finally looking at this thread and the latest patch set, sorry for the late reply. XLogCtl->XLogCacheBlck = XLOGbuffers - 1; - XLogCtl->SharedRecoveryInProgress = true; XLogCtl->SharedHotStandbyActive = false; XLogCtl->SharedPromoteIsTriggered = false; XLogCtl->WalWriterSleeping = false; [...] + switch (ControlFile->state) + { + case DB_SHUTDOWNED: + case DB_SHUTDOWNED_IN_RECOVERY: + XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING; + break; + default: + XLogCtl->SharedRecoveryState = CRASH_RECOVERING; + } It seems to me that the initial value of SharedRecoveryState should be CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken even if starting cleanly, and the flag would be reset correctly at the end to NOT_IN_RECOVERY. +typedef enum RecoveryState +{ + NOT_IN_RECOVERY = 0, /* currently in production */ + CRASH_RECOVERING, /* recovering from a crash */ + ARCHIVE_RECOVERING /* recovering archives as requested */ +} RecoveryState; I also have some issues with the name of those variables, here is an idea for the three states: - RECOVERY_STATE_CRASH - RECOVERT_STATE_ARCHIVE - RECOVERY_STATE_NONE I would recommend to use the same suffix for those variables to ease grepping. /* - * Local copy of SharedRecoveryInProgress variable. True actually means "not - * known, need to check the shared state". + * This is false when SharedRecoveryState is not ARCHIVE_RECOVERING. + * True actually means "not known, need to check the shared state". */ A double negation sounds wrong to me. And actually this variable is false when the shared state is set to NOT_IN_RECOVERY, and true when the state is either CRASH_RECOVERING or ARCHIVE_RECOVERING because it means that recovery is running, be it archive recovery or crash recovery, so the comment is wrong. + /* The file is always deletable if archive_mode is "off". */ + if ( ! XLogArchivingActive() ) + return true; [...] + if ( recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways() ) return true; Incorrect indentation. UpdateControlFile(); + + SpinLockAcquire(&XLogCtl->info_lck); + XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING; + SpinLockRelease(&XLogCtl->info_lck); + LWLockRelease(ControlFileLock); Here, the shared flag is updated while holding ControlFileLock to ensure a consistent state of things within shared memory, so it would be nice to add a comment about that. +RecoveryState +GetRecoveryState(void) +{ + volatile XLogCtlData *xlogctl = XLogCtl; + + return xlogctl->SharedRecoveryState; +} Er, you need to acquire info_lck to look at the state here, no? /* - * The file is always deletable if archive_mode is "off". On standbys - * archiving is disabled if archive_mode is "on", and enabled with - * "always". On a primary, archiving is enabled if archive_mode is "on" - * or "always". + * On standbys, the file is deletable if archive_mode is not + * "always". */ It would be good to mention that while in crash recovery, files are not considered as deletable. I agree that having a separate .pl file for the tests of this thread is just cleaner. Here are more comments about these. +# temporary fail archive_command for futur tests +$node->safe_psql('postgres', q{ + ALTER SYSTEM SET archive_command TO 'false'; + SELECT pg_reload_conf(); +}); That's likely portable on Windows even if you skip the tests there, and I am not sure that it is a good idea to rely on it being in PATH. Depending on the system, the path of the command is also likely going to be different. As here the goal is to prevent the archiver to do its work, why not relying on the configuration where archive_mode is set but archive_command is not? This would cause the archiver to be a no-op process, and .ready files will remain around. You could then replace the lookup of pg_stat_archiver with poll_query_until() and a query that makes use of pg_stat_file() to make sure that the .ready exists when needed. + ok( -f "$node_data/pg_wal/archive_status/000000010000000000000001.ready", + "WAL still ready to archive in archive_status"); It would be good to mention in the description the check applies to a primary. +# test recovery without archive_mode=always does not keep .ready WALs +$standby1 = get_new_node('standby'); +$standby1->init_from_backup($node, 'backup', has_restoring => 1); +$standby1_data = $standby1->data_dir; +$standby1->start; +$standby1->safe_psql('postgres', q{CHECKPOINT}); For readability archive_mode = on should be added to the configuration file? Okay, this is inherited from the primary, still that would avoid any issues with this code is refactored in some way. "WAL waiting to be archived in backup removed with archive_mode=on on standby" ); That should be "WAL segment" or "WAL file", but not WAL. Regarding the tests on a standby, it seems to me that the following is necessary: 1) Test that with archive_mode = on, segments are not marked with .ready. 2) Test that with archive_mode = always, segments are marked with .ready during archive recovery. 3) Test that with archive_mode = always, segments are not removed during crash recovery. I can see tests for 1) and 2), but not 3). Could you add a stop('immediate')+start() for $standby2 at the end of 020_archive_status.pl and check that the .ready file is still there after crash recovery? The end of the tests actually relies on the fact that archive_command is set to "false" when the cold backup is taken, before resetting it. I think that it would be cleaner to enforce the configuration you want to test before starting each standby. It becomes easier to understand the flow of the test for the reader. -- Michael
Attachment
pgsql-bugs by date: