Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id ZjtD9QwH9EB5c37e@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Synchronizing slots from primary to standby
List pgsql-hackers
Hi,

On Mon, Apr 29, 2024 at 11:58:09AM +0000, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 29, 2024 5:11 PM shveta malik <shveta.malik@gmail.com> wrote:
> > 
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik <shveta.malik@gmail.com>
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Since the standby_slot_names patch has been committed, I am
> > > > > > attaching the last doc patch for review.
> > > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > 1 ===
> > > > >
> > > > > +   continue subscribing to publications now on the new primary
> > > > > + server
> > > > > without
> > > > > +   any data loss.
> > > > >
> > > > > I think "without any data loss" should be re-worded in this
> > > > > context. Data loss in the sense "data committed on the primary and
> > > > > not visible on the subscriber in case of failover" can still occurs (in case
> > synchronous replication is not used).
> > > > >
> > > > > 2 ===
> > > > >
> > > > > +   If the result (<literal>failover_ready</literal>) of both above steps is
> > > > > +   true, existing subscriptions will be able to continue without data
> > loss.
> > > > > +  </para>
> > > > >
> > > > > I don't think that's true if synchronous replication is not used.
> > > > > Say,
> > > > >
> > > > > - synchronous replication is not used
> > > > > - primary is not able to reach the standby anymore and
> > > > > standby_slot_names is set
> > > > > - new data is inserted into the primary
> > > > > - then not replicated to subscriber (due to standby_slot_names)
> > > > >
> > > > > Then I think the both above steps will return true but data would
> > > > > be lost in case of failover.
> > > >
> > > > Thanks for the comments, attach the new version patch which reworded
> > > > the above places.

Thanks!

> Here is the V3 doc patch.

Thanks! A few comments:

1 ===

+   losing any data that has been flushed to the new primary server.

Worth to add a few words about possible data loss, something like?

Please note that in case synchronous replication is not used and standby_slot_names
is set correctly, it might be possible to lose data that would have been committed
on the old primary server (in case the standby was not reachable during that time 
for example).

2 ===

+test_sub=# SELECT
+               array_agg(slotname) AS slots
+           FROM
+           ((
+               SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS
slotname
+               FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+               WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+           ) UNION (

I guess this format comes from ReplicationSlotNameForTablesync(). What about
creating a SQL callable function on top of it and make use of it in the query
above? (that would ensure to keep the doc up to date even if the format changes
in ReplicationSlotNameForTablesync()).

3 ===

+test_sub=# SELECT
+               MAX(remote_lsn) AS remote_lsn_on_subscriber
+           FROM
+           ((
+               SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_',
r.srrelid),false)
 
+                           WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn
+               FROM pg_subscription_rel r, pg_subscription s
+               WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover
+           ) UNION (
+               SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn
+               FROM pg_subscription s
+               WHERE s.subfailover
+           ));

What about adding a join to pg_replication_origin to get rid of the "hardcoded"
format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"?

Idea behind 2 ===  and 3 === is to have the queries as generic as possible and
not rely on a hardcoded format (that would be more difficult to maintain should
those formats change in the future).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: A problem about partitionwise join
Next
From: Matthias van de Meent
Date:
Subject: Expand applicability of aggregate's sortop optimization