RE: speed up a logical replica setup - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: speed up a logical replica setup
Date
Msg-id TY3PR01MB9889C5D55206DDD978627D07F5752@TY3PR01MB9889.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: speed up a logical replica setup
Re: speed up a logical replica setup
List pgsql-hackers
Dear hackers,

>
> 15.
> I found that subscriptions cannot be started if tuples are inserted on publisher
> after creating temp_replslot. After starting a subscriber, I got below output on the
> log.
>
> ```
> ERROR:  could not receive data from WAL stream: ERROR:  publication
> "pg_subscriber_5" does not exist
> CONTEXT:  slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change
> callback, associated LSN 0/30008A8
> LOG:  background worker "logical replication apply worker" (PID 3669) exited
> with exit code 1
> ```
>
> But this is strange. I confirmed that the specified publication surely exists.
> Do you know the reason?
>
> ```
> publisher=# SELECT pubname FROM pg_publication;
>      pubname
> -----------------
>  pg_subscriber_5
> (1 row)
> ```
>

I analyzed and found a reason. This is because publications are invisible for some transactions.

As the first place, below operations were executed in this case.
Tuples were inserted after getting consistent_lsn, but before starting the standby.
After doing the workload, I confirmed again that the publication was created.

1. on primary, logical replication slots were created.
2. on primary, another replication slot was created.
3. ===on primary, some tuples were inserted. ===
4. on standby, a server process was started
5. on standby, the process waited until all changes have come.
6. on primary, publications were created.
7. on standby, subscriptions were created.
8. on standby, a replication progress for each subscriptions was set to given LSN (got at step2).
=====pg_subscriber finished here=====
9. on standby, a server process was started again
10. on standby, subscriptions were enabled. They referred slots created at step1.
11. on primary, decoding was started but ERROR was raised.

In this case, tuples were inserted *before creating publication*.
So I thought that the decoded transaction could not see the publication because
it was committed after insertions.

One solution is to create a publication before creating a consistent slot.
Changes which came before creating the slot were surely replicated to the standby,
so upcoming transactions can see the object. We are planning to patch set to fix
the issue in this approach.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED




pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Andy Fan
Date:
Subject: Re: the s_lock_stuck on perform_spin_delay