Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: WIP: WAL prefetch (another approach)
Date
Msg-id CA+hUKGJ1=pOiNjSgXYJnjE3OyRtp8tjMRcON256e5EFpzPpgtA@mail.gmail.com
In response to Re: WIP: WAL prefetch (another approach)  (Stephen Frost <sfrost@snowman.net>)
Responses Re: WIP: WAL prefetch (another approach)  (Stephen Frost <sfrost@snowman.net>)
RE: WIP: WAL prefetch (another approach)  (Jakub Wartak <Jakub.Wartak@tomtom.com>)
List pgsql-hackers
On Sat, Nov 14, 2020 at 4:13 AM Stephen Frost <sfrost@snowman.net> wrote:
> * Tomas Vondra (tomas.vondra@enterprisedb.com) wrote:
> > On 11/13/20 3:20 AM, Thomas Munro wrote:
> > > I'm not really sure what to do about archive restore scripts that
> > > block.  That seems to be fundamentally incompatible with what I'm
> > > doing here.
> >
> > IMHO we can't do much about that, except for documenting it - if the
> > prefetch can't work because of blocking restore script, someone has to
> > fix/improve the script. No way around that, I'm afraid.
>
> I'm a bit confused about what the issue here is- is the concern that a
> restore_command is specified that isn't allowed to run concurrently but
> this patch is intending to run more than one concurrently..?  There's
> another patch that I was looking at for doing pre-fetching of WAL
> segments, so if this is also doing that we should figure out which
> patch we want..

The problem is that the recovery loop tries to look further ahead in
between applying individual records, which causes the restore script
to run, and if that blocks, we won't apply records that we already
have, because we're waiting for the next WAL file to appear.  This
behaviour is on by default with my patch, so pg_standby will introduce
weird replay delays.  We could think of some ways to fix that, with
meaningful return codes and periodic polling or something, I suppose,
but something feels a bit weird about it.
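
To make the stall concrete, here is a toy model of the loop.  It is only
a sketch and not PostgreSQL code: the function name, the simulated clock
and the assumption that applying a record takes zero time are all made up
for illustration.  It compares replay without lookahead, lookahead with a
restore script that fails fast, and lookahead with a restore script that
blocks until the next segment exists (pg_standby-like).

```python
def replay_times(segments, arrival, lookahead, blocking):
    """Return [(record, simulated_time_applied), ...].

    segments:  list of lists of record names, one inner list per WAL segment
    arrival:   arrival[i] is the simulated time segment i appears in the archive
    lookahead: try to restore the next segment between individual records
    blocking:  the (hypothetical) restore script waits for a missing segment
               instead of returning failure immediately
    """
    now = 0
    applied = []
    for seg_no, records in enumerate(segments):
        # Replay can never start a segment before that segment exists.
        now = max(now, arrival[seg_no])
        for record in records:
            has_next = seg_no + 1 < len(segments)
            if lookahead and blocking and has_next:
                # Looking ahead runs the restore script; a blocking script
                # holds up the loop until the *next* segment shows up, even
                # though this record is already available locally.
                now = max(now, arrival[seg_no + 1])
            applied.append((record, now))
    return applied


if __name__ == "__main__":
    segments = [["a", "b"], ["c"]]
    arrival = [0, 10]  # second segment shows up 10 time units later
    print(replay_times(segments, arrival, lookahead=False, blocking=True))
    print(replay_times(segments, arrival, lookahead=True, blocking=False))
    print(replay_times(segments, arrival, lookahead=True, blocking=True))
```

In this model only the last combination delays "a" and "b" until the
second segment arrives, which is the pg_standby case described above.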

> I don't know that it's needed, but it feels likely that we could provide
> a better result if we consider making changes to the restore_command API
> (eg: have a way to say "please fetch this many segments ahead, and you
> can put them in this directory with these filenames" or something).  I
> would think we'd be able to continue supporting the existing API and
> accept that it might not be as performant.

Hmm.  Every time I try to think of a protocol change for the
restore_command API that would be acceptable, I go around the same
circle of thoughts about event flow and realise that what we really
need for this is ... a WAL receiver...

Here's a rebase over the recent commit "Get rid of the dedicated latch
for signaling the startup process." just to fix cfbot; no other
changes.

Attachment
