Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: pg_receivewal starting position
Date
Msg-id 6266482.K2JlShyGXD@aivenronan
Whole thread Raw
In response to Re: pg_receivewal starting position  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Le vendredi 29 octobre 2021, 04:27:51 CEST Michael Paquier a écrit :
> On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:
> > Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s
> > on average on my machine.
> > I think if you reduce the size of the generate_series batches, this should
> > probably be reduced everywhere. With what we do though, inserting a single
> > line should work just as well, I wonder why we insist on inserting a
> > hundred lines ? I updated your patch with that small modification, it
> > also makes the code less verbose.
>
> Thanks for the extra numbers.  I have added your suggestions,
> switching the dummy table to use a primary key with different values,
> while on it, as there is an argument that it makes debugging easier,
> and applied the speedup patch.

Thanks !

>
> >> +$standby->psql('',
> >> +  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> >> +  replication => 1);
> >> Here as well we could use a restart point to reduce the number of
> >> segments archived.
> >
> > The restart point should be very close, as we don't generate any activity
> > on the primary between the backup and the slot's creation. I'm not sure
> > adding the complexity of triggering a checkpoint on the primary and
> > waiting for the standby to catch up on it would be that useful.
>
> Yes, you are right here.  The base backup taken from the primary
> at this point ensures a fresh point.
>
> +# This test is split in two, using the same standby: one test check the
> +# resume-from-folder case, the other the resume-from-slot one.
> This comment needs a refresh, as the resume-from-folder case is no
> more.
>

Done.

> +$standby->psql(
> +  'postgres',
> +  "SELECT pg_promote(wait_seconds => 300)");
> This could be $standby->promote.
>

Oh, didn't know about that.

> +# Switch wal to make sure it is not a partial file but a complete
> segment.
> +$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
> This INSERT needs a slight change to adapt to the primary key of the
> table.  This one is on me :p

Done.

>
> Anyway, is this first segment switch really necessary?  From the data
> archived by pg_receivewal in the command testing the TLI jump, we
> finish with the following contents (contents generated after fixing
> the three INSERTs):
> 00000001000000000000000B
> 00000001000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E.partial
> 00000002.history
>
> So, even if we don't do the first switch, we'd still have one
> completed segment on the previous timeline, before switching to the
> new timeline and the next segment (pg_receivewal is a bit inconsistent
> with the backend here, by the way, as the first segment on the new
> timeline would map with the last segment of the old timeline, but here
> we have a clean switch as of stop_streaming in pg_receivewal.c).

The first completed segment on the previous timeline comes from the fact we
stream from the restart point. I removed the switch to use the walfilename of
the replication slot's restart point instead. This means querying both the
standby (to get the replication slot's restart_lsn) and the primary (to have
access to pg_walfile_name).

We could use a single query on the primary (using the primary's checkpoint LSN
instead)  but it feels a bit convoluted just to avoid a query on the standby.


--
Ronan Dunklau
Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Multi-Column List Partitioning
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side