Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup) - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup)
Msg-id CAExHW5vpLkWu5OJDfqGgftW7YROjcy=7Uk=DO2qFOh9-j8FD3g@mail.gmail.com
In response to Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup)  (Amit Kapila <amit.kapila16@gmail.com>)
On Tue, Jul 30, 2024 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 1:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > On Sun, Jun 30, 2024 at 2:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> ... However, I added a new open item about how the
> > >> 040_pg_createsubscriber.pl test is slow and still unstable.
> >
> > > But that said, I see no commits in the commit history which purport to
> > > improve performance, so I guess the performance is probably still not
> > > what you want, though I am not clear on the details.
> >
> > My concern is described at [1]:
> >
> > >> I have a different but possibly-related complaint: why is
> > >> 040_pg_createsubscriber.pl so miserably slow?  On my machine it
> > >> runs for a bit over 19 seconds, which seems completely out of line
> > >> (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > >> other test scripts in this directory take much less).  It looks
> > >> like most of the blame falls on this step:
> > >>
> > >> [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> > >>
> > >> AFAICS the amount of data being replicated is completely trivial,
> > >> so that it doesn't make any sense for this to take so long --- and
> > >> if it does, that suggests that this tool will be impossibly slow
> > >> for production use.  But I suspect there is a logic flaw causing
> > >> this.  Speculating wildly, perhaps that is related to the failure
> > >> Alexander spotted?
> >
> > The followup discussion in that thread made it sound like there's
> > some fairly fundamental deficiency in how wait_for_end_recovery()
> > detects end-of-recovery.  I'm not too conversant with the details
> > though, and it's possible that pg_createsubscriber is just falling
> > foul of a pre-existing infelicity.
> >
> > If the problem can be correctly described as "pg_createsubscriber
> > takes 10 seconds or so to detect end-of-stream",
> >
>
> The problem can be defined as: "pg_createsubscriber waits for an
> additional (new) WAL record to be generated on primary before it
> considers the standby is ready for becoming a subscriber". Now, on
> busy systems, this shouldn't be a problem but for idle systems, the
> time to detect end-of-stream can't be easily defined.

AFAIU, the server emits a running-transactions WAL record at least once
every 15 seconds, so the subscriber should not have to wait longer than
15 seconds. I understand that this would be a problem for tests, but
will it be a problem for end users? Sorry for the repetition if this has
already been discussed.
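
To make the bound concrete, here is a toy model of the wait, not
PostgreSQL code. It assumes an otherwise idle primary that emits one
record at every multiple of a fixed interval; the 15-second value is the
figure quoted above, and the function name and its parameters are
hypothetical. Whether an idle server really behaves this way is exactly
the assumption in question.

```python
import math

# Assumed interval between periodic running-transactions records, in seconds.
# This is the figure quoted in the message, not a value read from the server.
SNAPSHOT_INTERVAL_S = 15.0


def wait_for_next_record(start_s: float,
                         interval_s: float = SNAPSHOT_INTERVAL_S) -> float:
    """Return how long a waiter that starts at start_s must wait for the
    next periodic record, if records appear at every multiple of interval_s.

    A record emitted exactly at start_s does not count: the waiter needs a
    record that arrives strictly after it started waiting.
    """
    if start_s < 0 or interval_s <= 0:
        raise ValueError("start_s must be >= 0 and interval_s must be > 0")
    # First multiple of interval_s that is strictly greater than start_s.
    next_tick = math.floor(start_s / interval_s + 1) * interval_s
    return next_tick - start_s


if __name__ == "__main__":
    for start in (0.0, 1.0, 14.5, 15.0):
        print(f"start={start:5.1f}s  wait={wait_for_next_record(start):5.1f}s")
```

Under this model the wait is always greater than zero and never more than
one interval, which matches the "no longer than 15 seconds" expectation;
the worst case occurs when the waiter starts just as a record is emitted.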

--
Best Wishes,
Ashutosh Bapat


