Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Date
Msg-id CA+hUKGLRtjhGWB-dd_B8z6agJaFmfxVTiSyqnka937ss3+VywQ@mail.gmail.com
In response to Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad  (Andres Freund <andres@anarazel.de>)
Responses Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Feb 28, 2022 at 2:36 PM Andres Freund <andres@anarazel.de> wrote:
> On February 27, 2022 4:19:21 PM PST, Thomas Munro <thomas.munro@gmail.com> wrote:
> >It seems a little strange to introduce a new wait event that will very
> >often appear into a stable branch, but ... it is actually telling the
> >truth, so there is that.
>
> In the back branches it needs to be at the end of the enum - I assume you intended that just to be for HEAD.

Yeah.

> I wonder whether in HEAD we shouldn't make that sleep duration be computed from the calculation in IsOnSchedule...

I might look into this.

> >The sleep/poll loop in RegisterSyncRequest() may also have another
> >problem.  The comment explains that it was a deliberate choice not to
> >do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't
> >think there's an excuse to ignore postmaster death in a loop that
> >presumably becomes infinite if the checkpointer exits.  I guess we
> >could do:
> >
> >-               pg_usleep(10000L);
> >+               WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10,
> >+                         WAIT_EVENT_SYNC_REQUEST);
> >
> >But... really, this should be waiting on a condition variable that the
> >checkpointer broadcasts on when the queue goes from full to not full,
> >no?  Perhaps for master only?
>
> Looks worth improving, but yes, I'd not do it in the back branches.

0003 is a first attempt at that, for master only (on top of 0002, which
is the minimal fix).  This shaves another second off
027_stream_regress.pl on my workstation.  The main thing I realised is
that I needed to hold interrupts while waiting, a requirement that
seems like it should go away with 'tombstone' files as discussed in
other threads.  That's not a new problem in this patch; it just looks
more offensive to the eye when you spell it out, instead of hiding it
with an unreported sleep/poll loop...
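
To make the "full to not full" idea concrete, here is a minimal
standalone sketch of the pattern using POSIX threads.  It is not the
0003 patch and does not use PostgreSQL's own ConditionVariable API;
every name in it (SyncQueue, register_request, absorb_requests,
run_demo, QUEUE_CAPACITY, NUM_REQUESTS) is hypothetical, and it ignores
interrupts and postmaster death entirely.  The "backend" blocks on a
condition variable while the queue is full, and the "checkpointer"
broadcasts only when a drain takes the queue from full to not full.

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 4        /* hypothetical; small so the queue fills up */
#define NUM_REQUESTS   1000     /* hypothetical number of requests to send */

typedef struct
{
    int             items[QUEUE_CAPACITY];
    int             count;
    pthread_mutex_t lock;
    pthread_cond_t  not_full;   /* broadcast on full -> not full */
    pthread_cond_t  not_empty;  /* signalled when a request is queued */
} SyncQueue;

static void
queue_init(SyncQueue *q)
{
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* "Backend" side: wait for space instead of running a sleep/poll loop. */
static void
register_request(SyncQueue *q, int request)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)  /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->count++] = request;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* "Checkpointer" side: drain the queue, return how many were absorbed. */
static int
absorb_requests(SyncQueue *q, long *sum)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    bool was_full = (q->count == QUEUE_CAPACITY);
    int  n = q->count;
    for (int i = 0; i < n; i++)
        *sum += q->items[i];
    q->count = 0;
    if (was_full)               /* waiters can only exist if it was full */
        pthread_cond_broadcast(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return n;
}

static void *
backend_main(void *arg)
{
    SyncQueue *q = arg;
    for (int i = 1; i <= NUM_REQUESTS; i++)
        register_request(q, i);
    return NULL;
}

/* Runs one backend against one checkpointer; returns the sum absorbed. */
long
run_demo(void)
{
    SyncQueue q;
    pthread_t backend;
    long      sum = 0;
    int       absorbed = 0;

    queue_init(&q);
    pthread_create(&backend, NULL, backend_main, &q);
    while (absorbed < NUM_REQUESTS)
        absorbed += absorb_requests(&q, &sum);
    pthread_join(backend, NULL);
    return sum;
}
```

Calling run_demo() from a main() and building with -pthread exercises
the full path many times because the queue is so small.  Broadcasting
only on the full-to-not-full transition is enough here because nobody
can be waiting for space unless the queue was full when they went to
sleep.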

> I do think it's worth giving that sleep a proper wait event though, even in the back branches.

I'm thinking that 0002 should be back-patched all the way, but 0001
could be limited to 14.

