Re: persist logical slots to disk during shutdown checkpoint - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: persist logical slots to disk during shutdown checkpoint |
Date | |
Msg-id | CAExHW5uQHcr10Xu4qT2dY-Nki3un4wrSDUB9SfO8afRcqwh=yA@mail.gmail.com Whole thread Raw |
In response to | Re: persist logical slots to disk during shutdown checkpoint (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: persist logical slots to disk during shutdown checkpoint
|
List | pgsql-hackers |
On Tue, Aug 29, 2023 at 5:40 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > I am looking at it. If you can wait till the end of the week, that > will be great. /* * Successfully wrote, unset dirty bit, unless somebody dirtied again - * already. + * already and remember the confirmed_flush LSN value. */ SpinLockAcquire(&slot->mutex); if (!slot->just_dirtied) slot->dirty = false; + slot->last_saved_confirmed_flush = slot->data.confirmed_flush; If the slot->data.confirmed_flush gets updated concurrently between copying it to be written to the disk and when it's written to last_saved_confirmed_flush, we will miss one update. I think we need to update last_saved_confirmed_flush based on the cp.slotdata.confirmed_flush rather than slot->data.confirmed_flush. We are setting last_saved_confirmed_flush for all types of slots but using it only when the slot is logical. Should we also set it only for logical slots? /* first check whether there's something to write out */ SpinLockAcquire(&slot->mutex); was_dirty = slot->dirty; slot->just_dirtied = false; + confirmed_flush_has_changed = (slot->data.confirmed_flush != slot->last_saved_confirmed_flush); The confirmed_flush LSN should always move forward, otherwise there may not be enough WAL retained for the slot to work. I am wondering whether we should take an opportunity to make sure Assert(slot->data.confirmed_flush <= slot->last_saved_confirmed_flush) - /* and don't do anything if there's nothing to write */ - if (!was_dirty) + /* Don't do anything if there's nothing to write. See ReplicationSlot. */ + if (!was_dirty && + !(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed)) Rather than complicating this condition, I wonder whether it's better to just set was_dirty = true when is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed) or even slot->dirty = true. See also the note at the end of the email. + + /* + * We won't ensure that the slot is persisted after the confirmed_flush + * LSN is updated as that could lead to frequent writes. However, we need + * to ensure that we do persist the slots at the time of shutdown whose + * confirmed_flush LSN is changed since we last saved the slot to disk. + * This will help in avoiding retreat of the confirmed_flush LSN after + * restart. This variable is used to track the last saved confirmed_flush + * LSN value. + */ This comment makes more sense in SaveSlotToPath() than here. We may decide to use last_saved_confirmed_flush for something else in future. + +sub compare_confirmed_flush +{ + my ($node, $confirmed_flush_from_log) = @_; + + # Fetch Latest checkpoint location from the control file + my ($stdout, $stderr) = + run_command([ 'pg_controldata', $node->data_dir ]); + my @control_data = split("\n", $stdout); + my $latest_checkpoint = undef; + foreach (@control_data) + { + if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg) + { + $latest_checkpoint = $1; + last; + } + } + die "Latest checkpoint location not found in control file\n" + unless defined($latest_checkpoint); + + # Is it same as the value read from log? + ok( $latest_checkpoint eq $confirmed_flush_from_log, + "Check that the slot's confirmed_flush LSN is the same as the latest_checkpoint location" + ); This function assumes that the subscriber will receive and confirm WAL upto checkpoint location and publisher's WAL sender will update it in the slot. Where is the code to ensure that? Does the WAL sender process wait for checkpoint LSN to be confirmed when shutting down? + +# Restart the publisher to ensure that the slot will be persisted if required +$node_publisher->restart(); Can we add this test comparing LSNs after publisher restart, to an existing test itself - like basic replication. That's the only extra thing that this test does beyond usual replication stuff. + +# Wait until the walsender creates decoding context +$node_publisher->wait_for_log( + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./, + $offset); + +# Extract confirmed_flush from the logfile +my $log_contents = slurp_file($node_publisher->logfile, $offset); +$log_contents =~ + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./ + or die "could not get confirmed_flush_lsn"; Why are we figuring out the LSN from the log file? Is it not available from pg_replication_slots view? If we do so, I think this test will fail is the slot gets written after the restart because of concurrent activity on the publisher (like autovacuum, or other things that cause empty transaction to be replicated) and subscriber. A very rare chance but not 0 probability one. I think we should shut down subscriber, restart publisher and then make this check based on the contents of the replication slot instead of server log. Shutting down subscriber will ensure that the subscriber won't send any new confirmed flush location to the publisher after restart. All the places which call ReplicationSlotSave() mark the slot as dirty. All the places where SaveSlotToPath() is called, the slot is marked dirty except when calling from CheckPointReplicationSlots(). So I am wondering whether we should be marking the slot dirty in CheckPointReplicationSlots() and avoid passing down is_shutdown flag to SaveSlotToPath(). Unrelated to this patch, I noticed that all the callers of SaveSlotToPath() have the same code to craft replication slot file path. I wonder if that needs to be macro'ised or added to some common function or to be pushed into SaveSlotToPath() itself to make sure that any changes to the path in future are consistent for all callers of SaveSlotToPath(). Interestingly slot->data.name is accessed without a lock here. Name of the slot does not change after creation so this isn't a problem right now. But generally against the principle of accessing data protected by a mutex. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: