Re: persist logical slots to disk during shutdown checkpoint - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: persist logical slots to disk during shutdown checkpoint
Date
Msg-id CAExHW5uQHcr10Xu4qT2dY-Nki3un4wrSDUB9SfO8afRcqwh=yA@mail.gmail.com
Whole thread Raw
In response to Re: persist logical slots to disk during shutdown checkpoint  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: persist logical slots to disk during shutdown checkpoint
List pgsql-hackers
On Tue, Aug 29, 2023 at 5:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> I am looking at it. If you can wait till the end of the week, that
> will be great.

     /*
      * Successfully wrote, unset dirty bit, unless somebody dirtied again
-     * already.
+     * already and remember the confirmed_flush LSN value.
      */
     SpinLockAcquire(&slot->mutex);
     if (!slot->just_dirtied)
         slot->dirty = false;
+    slot->last_saved_confirmed_flush = slot->data.confirmed_flush;

If the slot->data.confirmed_flush gets updated concurrently between copying it
to be written to the disk and when it's written to last_saved_confirmed_flush,
we will miss one update. I think we need to update last_saved_confirmed_flush
based on the cp.slotdata.confirmed_flush rather than
slot->data.confirmed_flush.

We are setting last_saved_confirmed_flush for all types of slots but using it
only when the slot is logical. Should we also set it only for logical slots?

     /* first check whether there's something to write out */
     SpinLockAcquire(&slot->mutex);
     was_dirty = slot->dirty;
     slot->just_dirtied = false;
+    confirmed_flush_has_changed = (slot->data.confirmed_flush !=
slot->last_saved_confirmed_flush);

The confirmed_flush LSN should always move forward, otherwise there may not be
enough WAL retained for the slot to work. I am wondering whether we should take
an opportunity to make sure
Assert(slot->data.confirmed_flush <= slot->last_saved_confirmed_flush)

-    /* and don't do anything if there's nothing to write */
-    if (!was_dirty)
+    /* Don't do anything if there's nothing to write. See ReplicationSlot. */
+    if (!was_dirty &&
+        !(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed))

Rather than complicating this condition, I wonder whether it's better to just
set was_dirty = true when is_shutdown && SlotIsLogical(slot) &&
confirmed_flush_has_changed) or even slot->dirty = true. See also the note at
the end of the email.

+
+    /*
+     * We won't ensure that the slot is persisted after the confirmed_flush
+     * LSN is updated as that could lead to frequent writes.  However, we need
+     * to ensure that we do persist the slots at the time of shutdown whose
+     * confirmed_flush LSN is changed since we last saved the slot to disk.
+     * This will help in avoiding retreat of the confirmed_flush LSN after
+     * restart.  This variable is used to track the last saved confirmed_flush
+     * LSN value.
+     */

This comment makes more sense in SaveSlotToPath() than here. We may decide to
use last_saved_confirmed_flush for something else in future.
+
+sub compare_confirmed_flush
+{
+    my ($node, $confirmed_flush_from_log) = @_;
+
+    # Fetch Latest checkpoint location from the control file
+    my ($stdout, $stderr) =
+      run_command([ 'pg_controldata', $node->data_dir ]);
+    my @control_data      = split("\n", $stdout);
+    my $latest_checkpoint = undef;
+    foreach (@control_data)
+    {
+        if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+        {
+            $latest_checkpoint = $1;
+            last;
+        }
+    }
+    die "Latest checkpoint location not found in control file\n"
+      unless defined($latest_checkpoint);
+
+    # Is it same as the value read from log?
+    ok( $latest_checkpoint eq $confirmed_flush_from_log,
+        "Check that the slot's confirmed_flush LSN is the same as the
latest_checkpoint location"
+    );

This function assumes that the subscriber will receive and confirm WAL upto
checkpoint location and publisher's WAL sender will update it in the slot.
Where is the code to ensure that? Does the WAL sender process wait for
checkpoint
LSN to be confirmed when shutting down?
+
+# Restart the publisher to ensure that the slot will be persisted if required
+$node_publisher->restart();

Can we add this test comparing LSNs after publisher restart, to an existing
test itself - like basic replication. That's the only extra thing that this
test does beyond usual replication stuff.

+
+# Wait until the walsender creates decoding context
+$node_publisher->wait_for_log(
+    qr/Streaming transactions committing after
([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
+    $offset);
+
+# Extract confirmed_flush from the logfile
+my $log_contents = slurp_file($node_publisher->logfile, $offset);
+$log_contents =~
+  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+),
reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
+  or die "could not get confirmed_flush_lsn";

Why are we figuring out the LSN from the log file? Is it not available from
pg_replication_slots view? If we do so, I think this test will fail is the slot
gets written after the restart because of concurrent activity on the publisher
(like autovacuum, or other things that cause empty transaction to be
replicated) and subscriber. A very rare chance but not 0 probability one. I
think we should shut down subscriber, restart publisher and then make this
check based on the contents of the replication slot instead of server log.
Shutting down subscriber will ensure that the subscriber won't send any new
confirmed flush location to the publisher after restart.

All the places which call ReplicationSlotSave() mark the slot as dirty.  All
the places where SaveSlotToPath() is called, the slot is marked dirty except
when calling from CheckPointReplicationSlots(). So I am wondering whether we
should be marking the slot dirty in CheckPointReplicationSlots() and avoid
passing down is_shutdown flag to SaveSlotToPath().

Unrelated to this patch, I noticed that all the callers of SaveSlotToPath()
have the same code to craft replication slot file path. I wonder if that needs
to be macro'ised or added to some common function or to be pushed into
SaveSlotToPath() itself to make sure that any changes to the path in future are
consistent for all callers of SaveSlotToPath().  Interestingly slot->data.name
is accessed without a lock here. Name of the slot does not change after
creation so this isn't a problem right now. But generally against the principle
of accessing data protected by a mutex.

--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Disabling Heap-Only Tuples
Next
From: Robert Haas
Date:
Subject: Re: Disabling Heap-Only Tuples