On Mon, Jul 1, 2024 at 8:22 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > I have a different but possibly-related complaint: why is
> > 040_pg_createsubscriber.pl so miserably slow? On my machine it
> > runs for a bit over 19 seconds, which seems completely out of line
> > (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > other test scripts in this directory take much less). It looks
> > like most of the blame falls on this step:
> >
> > [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> >
> > AFAICS the amount of data being replicated is completely trivial,
> > so that it doesn't make any sense for this to take so long --- and
> > if it does, that suggests that this tool will be impossibly slow
> > for production use. But I suspect there is a logic flaw causing
> > this.
>
> I analyzed the issue. My elog() debugging showed that wait_for_end_recovery() was
> wasting some time. This happened because the recovery target was not being satisfied.
>
> We set recovery_target_lsn from the return value of pg_create_logical_replication_slot(),
> which returns the end of the RUNNING_XACTS record. If we use the returned value as
> recovery_target_lsn as-is, however, we must wait for additional WAL generation
> because the parameter requires that the replayed WAL go past that point.
> In my environment, the function waited until the bgwriter emitted the next XLOG_RUNNING_XACTS record.
>
IIUC, the problem is that the consistent_lsn value returned by
setup_publisher() is the "end + 1" location of the required WAL record,
whereas the recovery_target_lsn used in wait_for_end_recovery() expects
the LSN value to be the "start" location of that record.

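To illustrate the mismatch, here is a toy model in Python. It is only a
sketch of my understanding, not PostgreSQL's actual code: the
replay_until_target() helper, the Record type, and the LSN numbers are
all made up, and the stop rule (compare the record's start position with
the target) is an assumption based on the description above.

```python
from typing import List, Optional, Tuple

# A WAL record is modelled as (start_lsn, end_lsn). end_lsn is the
# "end + 1" position, i.e. where the next record would begin.
Record = Tuple[int, int]


def replay_until_target(records: List[Record], target_lsn: int) -> Optional[Record]:
    """Return the record at which replay stops, or None if replay runs
    out of WAL first (a real standby would keep waiting for more WAL)."""
    for start, end in records:
        # Assumed stop rule: the record's *start* must reach the target.
        if start >= target_lsn:
            return (start, end)
    return None


# Hypothetical WAL stream: one earlier record, then RUNNING_XACTS.
running_xacts: Record = (100, 150)
wal: List[Record] = [(0, 100), running_xacts]

# Target = end of RUNNING_XACTS: no record starts at or after 150 yet.
waits = replay_until_target(wal, running_xacts[1]) is None

# Target = start of RUNNING_XACTS: replay can stop right away.
stops_at_start = replay_until_target(wal, running_xacts[0])

# One more record (for example a small logical message) begins at 150,
# so the "end + 1" target is now reachable.
stops_after_extra = replay_until_target(wal + [(150, 180)], running_xacts[1])
```

In this model the "end + 1" target is reached only once some later record
exists, which would match the wait for the next XLOG_RUNNING_XACTS record
and also explain why inserting any extra record shortens it.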
> One simple solution is to add an additional WAL record at the end of the publisher
> setup. IIUC, an arbitrary WAL insertion can reduce the waiting time. The attached
> patch inserts a small XLOG_LOGICAL_MESSAGE record, which greatly reduced the execution
> time in my environment.
>
This sounds like an ugly hack to me, and I don't know if we can use it.
The ideal way to fix this is to get the start_lsn from the
create_logical_slot functionality or to have some parameter like
recover_target_end_lsn, but I don't know if this is a good time to
add such functionality.

--
With Regards,
Amit Kapila.