Re: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers

From vignesh C
Subject Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date
Msg-id CALDaNm1P6FxiNBGWMThnCkMUQ3Do4uzgzimKz91u3qoex4zb5Q@mail.gmail.com
In response to RE: [PoC] pg_upgrade: allow to upgrade publisher node  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: [PoC] pg_upgrade: allow to upgrade publisher node
List pgsql-hackers
On Fri, 14 Apr 2023 at 16:00, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Julien,
>
> > Sorry for the delay, I didn't have time to come back to it until this afternoon.
>
> No issues, everyone is busy:-).
>
> > I don't think that your analysis is correct.  Slots are guaranteed to be
> > stopped after all the normal backends have been stopped, exactly to avoid such
> > extraneous records.
> >
> > What is happening here is that the slot's confirmed_flush_lsn is properly
> > updated in memory and ends up being the same as the current LSN before the
> > shutdown.  But as it's a logical slot and those records aren't decoded, the
> > slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
> > that behavior when doing a manual checkpoint before (per your script comment),
> > as in that case the checkpoint also tries to save the slot to disk but then
> > finds a slot that was marked as dirty and therefore saves it.
> >
> > In your script's scenario, when you restart the server the previous slot data
> > is restored and the confirmed_flush_lsn goes backward, which explains those
> > extraneous records.
>
> So you mean that the key point is that records which are not sent to the
> subscriber do not mark the slot as dirty, and hence the updated confirmed_flush
> was not written into the slot file. Is that right? LogicalConfirmReceivedLocation()
> is called by the walsender when it gets a reply from the worker process, so your
> analysis seems correct.
>
> > It's probably totally harmless to throw away that value for now (and probably
> > also doesn't lead to crazy amount of work after restart, I really don't know
> > much about the logical slot code), but clearly becomes problematic with your
> > usecase.  One easy way to fix this is to teach the checkpoint code to force
> > saving the logical slots to disk even if they're not marked as dirty during a
> > shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
> > interfere with the cfbot).  With this patch applied I reliably only see a final
> > shutdown checkpoint record with your scenario.
> >
> > Now such a change will make shutdown a bit more expensive when using logical
> > replication, even if in 99% of cases you will not need to save the
> > confirmed_flush_lsn value, so I don't know if that's acceptable or not.
>
> In any case, the slot must be advanced past these records. IIUC, currently such
> records are read after rebooting but ignored, and this patch just skips them.
> I have not measured it, but this may not be additional overhead, just a trade-off.
>
> I have not come up with another solution so far, so I have included your patch;
> please see 0002.
>
> Additionally, I added a checking function in 0003.
> According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
> record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
> Therefore, the function ensures that the difference between the current insert
> position and confirmed_flush_lsn is less than (the above + the page header size).

Thanks for the patches.
Currently, two_phase-enabled slots are not getting restored from the
dumped contents. This is because the twophase value is being passed to
pg_create_logical_replication_slot [1] in the position of the parameter
that indicates whether the slot is temporary, so while restoring, the
slot is internally created as a temporary slot in this case:
+               appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+                                 slotname);
+               appendStringLiteralAH(query, slotinfo->plugin, fout);
+               appendPQExpBuffer(query, ", ");
+               appendStringLiteralAH(query, slotinfo->twophase, fout);
+               appendPQExpBuffer(query, ");");
+
+               ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+                            ARCHIVE_OPTS(.tag = slotname,
+                                         .description = "REPLICATION SLOT",
+                                         .section = SECTION_POST_DATA,
+                                         .createStmt = query->data));
+
+               pfree(slotname);
+               destroyPQExpBuffer(query);
+       }
+}

Since we are dumping only permanent slots, we could hard-code the
temporary parameter to false:
+               appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot('%s', ",
+                                 slotname);
+               appendStringLiteralAH(query, slotinfo->plugin, fout);
+               appendPQExpBuffer(query, ", false, ");
+               appendStringLiteralAH(query, slotinfo->twophase, fout);
+               appendPQExpBuffer(query, ");");

[1] - https://www.postgresql.org/docs/devel/functions-admin.html

Regards,
Vignesh


