Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: speed up a logical replica setup
Msg-id CAA4eK1+p+7Ag6nqdFRdqowK1EmJ6bG-MtZQ_54dnFBi=_OO5RQ@mail.gmail.com
In response to RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: speed up a logical replica setup
List pgsql-hackers
On Mon, Jul 1, 2024 at 8:22 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > I have a different but possibly-related complaint: why is
> > 040_pg_createsubscriber.pl so miserably slow?  On my machine it
> > runs for a bit over 19 seconds, which seems completely out of line
> > (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > other test scripts in this directory take much less).  It looks
> > like most of the blame falls on this step:
> >
> > [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> >
> > AFAICS the amount of data being replicated is completely trivial,
> > so that it doesn't make any sense for this to take so long --- and
> > if it does, that suggests that this tool will be impossibly slow
> > for production use.  But I suspect there is a logic flaw causing
> > this.
>
> I analyzed the issue. My elog() debugging said that wait_for_end_recovery() was
> wasting some time. This was caused by the recovery target seeming unsatisfactory.
>
> We are setting recovery_target_lsn by the return value of pg_create_logical_replication_slot(),
> which returns the end of the RUNNING_XACT record. If we use the returned value as
> recovery_target_lsn as-is, however, we must wait for additional WAL generation
> because the parameter requires that the replicated WAL overtake a certain point.
> On my env, the function waited until the bgwriter emitted the XLOG_RUNNING_XACTS record.
>

IIUC, the problem is that the consistent_lsn value returned by
setup_publisher() is the "end + 1" location of the required LSN, whereas
the recovery_target_lsn used in wait_for_end_recovery() expects the
LSN value to be the "start" location of the required LSN.
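
To make the mismatch concrete, here is a minimal sketch of the behavior described above. It is a toy model, not PostgreSQL code: it assumes WAL records are simple (start, end + 1) pairs of integer LSNs and that recovery stops at the first record whose start location is at or beyond the target. The helper name and the LSN values are hypothetical.

```python
# Toy model only (not PostgreSQL source). Assumption: recovery stops at the
# first record whose *start* LSN is >= the recovery target, as described above.
from typing import List, Optional, Tuple

Record = Tuple[int, int]  # (start_lsn, end_plus_one_lsn), plain integers


def first_record_reaching_target(wal: List[Record],
                                 target_lsn: int) -> Optional[Record]:
    """Return the first record whose start LSN is >= target_lsn.

    None means no such record exists yet, so recovery would keep waiting
    for more WAL to arrive.
    """
    for record in wal:
        start, _end_plus_one = record
        if start >= target_lsn:
            return record
    return None


# The last record stands in for the RUNNING_XACTS record written at slot creation.
wal = [(100, 140), (140, 200)]
running_xacts_start, running_xacts_end_plus_one = wal[-1]

# Target taken from the "end + 1" location: nothing starts there yet.
waiting = first_record_reaching_target(wal, running_xacts_end_plus_one)

# Target taken from the "start" location: satisfied by the record itself.
immediate = first_record_reaching_target(wal, running_xacts_start)

# Once any later record is written, the "end + 1" target is reached too.
wal_later = wal + [(200, 230)]
unblocked = first_record_reaching_target(wal_later, running_xacts_end_plus_one)

print(waiting, immediate, unblocked)
```

In this model the "end + 1" target is reached only once some later record exists, which matches the wait observed in wait_for_end_recovery().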

> One simple solution is to add an additional WAL record at the end of the publisher
> setup. IIUC, an arbitrary WAL insertion can reduce the waiting time. The attached
> patch inserts a small XLOG_LOGICAL_MESSAGE record, which could reduce much execution
> time on my environment.
>

This sounds like an ugly hack to me, and I don't know if we can use it.
The ideal way to fix this is to get the start_lsn from the
create_logical_slot functionality or to have some parameter like
recover_target_end_lsn, but I don't know if this is a good time to
add such functionality.

--
With Regards,
Amit Kapila.


