Re: Weird failure with latches in curculio on v15 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Weird failure with latches in curculio on v15
Msg-id 20230216163254.nktnbbb2b7wlpzwy@awork3.anarazel.de
In response to Re: Weird failure with latches in curculio on v15  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2023-02-16 15:18:57 +0530, Robert Haas wrote:
> On Fri, Feb 10, 2023 at 12:59 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't think it's that hard to imagine problems. To be reasonably fast, a
> > decent restore implementation will have to 'restore ahead'. Which also
> > provides ample things to go wrong. E.g.
> >
> > - WAL source is switched, restore module needs to react to that, but doesn't,
> >   we end up with lots of wasted work, or worse, filename conflicts
> > - recovery follows a timeline, restore module doesn't catch on quickly enough
> > - end of recovery happens, restore just continues on
> 
> I don't see how you can prevent those things from happening. If the
> restore process is working in some way that requires an event loop,
> and I think that will be typical for any kind of remote archiving,
> then either it has control most of the time, so the event loop can be
> run inside the restore process, or, as Nathan proposes, we don't let
> the archiver have control and it needs to run that restore process in
> a separate background worker. The hazards that you mention here exist
> either way. If the event loop is running inside the restore process,
> it can decide not to call the functions that we provide in a timely
> fashion and thus fail to react as it should. If the event loop runs
> inside a separate background worker, then that process can fail to be
> responsive in precisely the same way. Fundamentally, if the author of
> a restore module writes code to have multiple I/Os in flight at the
> same time and does not write code to cancel those I/Os if something
> changes, then such cancellation will not occur. That remains true no
> matter which process is performing the I/O.

IDK. I think we can make that easier or harder. Right now the proposed API
doesn't provide anything that would let a module address this.


> > > I don't quite see how you can make asynchronous and parallel archiving
> > > work if the archiver process only calls into the archive module at
> > > times that it chooses. That would mean that the module has to return
> > > control to the archiver when it's in the middle of archiving one or
> > > more files -- and then I don't see how it can get control back at the
> > > appropriate time. Do you have a thought about that?
> >
> > I don't think archiver is the hard part, that already has a dedicated
> > process, and it also has something of a queuing system already. The startup
> > process imo is the complicated one...
> >
> > If we had a 'restorer' process, and startup fed it some sort of a queue of
> > things to restore in the near future, it might be more realistic to do
> > something like what you describe?
> 
> Some kind of queueing system might be a useful part of the interface,
> and a dedicated restorer process does sound like a good idea. But the
> archiver doesn't have this solved, precisely because you have to
> archive a single file, return control, and wait to be invoked again
> for the next file. That does not scale.

But there's nothing inherent in that. We know for certain which files we're
going to archive. And we don't need to work one-by-one. The archiver could
just start multiple subprocesses at the same time. All the blocking it does
right now is artificially imposed by the use of system(). We could instead
just use something popen()-like and have a configurable number of processes
running at the same time.
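
A minimal sketch of the "configurable number of processes" idea, using plain
fork()/exec()/wait() instead of system(). This is not archiver code:
run_commands_parallel() and its parameters are hypothetical names made up for
illustration, and real code would need signal handling, latch integration and
proper error reporting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: run each shell command in
 * cmds[] while keeping at most max_parallel children alive at once.
 * Returns the number of commands that did not exit with status 0.
 */
int
run_commands_parallel(const char *const *cmds, int ncmds, int max_parallel)
{
    int next = 0;       /* index of the next command to start */
    int running = 0;    /* children currently alive */
    int failed = 0;

    assert(max_parallel > 0);

    while (next < ncmds || running > 0)
    {
        /* Start children until the configured limit is reached. */
        while (next < ncmds && running < max_parallel)
        {
            pid_t pid = fork();

            if (pid < 0)
            {
                /* Cannot start more children: count the rest as failed. */
                failed += ncmds - next;
                next = ncmds;
                break;
            }
            if (pid == 0)
            {
                execl("/bin/sh", "sh", "-c", cmds[next], (char *) NULL);
                _exit(127);     /* only reached if exec failed */
            }
            next++;
            running++;
        }

        /* Block until any one child finishes, then free its slot. */
        if (running > 0)
        {
            int status;

            if (wait(&status) < 0)
            {
                if (errno == EINTR)
                    continue;
                break;
            }
            running--;
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                failed++;
        }
    }
    return failed;
}
```

The only blocking point is the wait() for "any child", so one slow command
holds up a single slot rather than the whole queue. The limit is just an
argument here; presumably it would be a GUC in a real implementation.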

What I was trying to point out was that the work a "restorer" process has to
do is more speculative, because we don't know when we'll promote, whether
we'll follow a timeline increase, whether the to-be-restored WAL already
exists. That's solvable, but a bunch of the relevant work ought to be solved
in core code, instead of just in archive modules.
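
To illustrate the kind of bookkeeping I mean, here is a rough sketch of a
restore-ahead queue that can throw away speculative work when recovery
changes direction. Everything in it (RestoreQueue, RestoreRequest, the
restore_queue_* functions, the fixed queue size) is hypothetical and not part
of any proposed API. It also simplifies: on a timeline switch it only cancels
old-timeline requests for segments strictly after the one containing the
switch point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RESTORE_QUEUE_SIZE 8    /* arbitrary size for the sketch */

typedef struct RestoreRequest
{
    uint32_t tli;        /* timeline the segment is expected on */
    uint64_t segno;      /* WAL segment number */
    bool     cancelled;  /* set once the speculation proved wrong */
} RestoreRequest;

typedef struct RestoreQueue
{
    RestoreRequest reqs[RESTORE_QUEUE_SIZE];
    int            nreqs;
} RestoreQueue;

/* Queue a speculative restore; returns false if the queue is full. */
bool
restore_queue_push(RestoreQueue *q, uint32_t tli, uint64_t segno)
{
    assert(q != NULL);
    if (q->nreqs >= RESTORE_QUEUE_SIZE)
        return false;
    q->reqs[q->nreqs++] = (RestoreRequest) {tli, segno, false};
    return true;
}

/*
 * Recovery decided to follow new_tli from the segment switch_segno on.
 * Requests for other timelines past that segment are wasted work, so
 * mark them cancelled.  Returns the number of requests cancelled.
 */
int
restore_queue_switch_timeline(RestoreQueue *q, uint32_t new_tli,
                              uint64_t switch_segno)
{
    int ncancelled = 0;

    for (int i = 0; i < q->nreqs; i++)
    {
        RestoreRequest *r = &q->reqs[i];

        if (!r->cancelled && r->tli != new_tli && r->segno > switch_segno)
        {
            r->cancelled = true;
            ncancelled++;
        }
    }
    return ncancelled;
}

/* End of recovery: nothing queued is needed anymore. */
int
restore_queue_cancel_all(RestoreQueue *q)
{
    int ncancelled = 0;

    for (int i = 0; i < q->nreqs; i++)
    {
        if (!q->reqs[i].cancelled)
        {
            q->reqs[i].cancelled = true;
            ncancelled++;
        }
    }
    return ncancelled;
}
```

If core owned a structure along these lines, a restore module would only have
to honor the cancelled flag for its in-flight I/O, rather than each module
having to work out timeline switches and end of recovery on its own.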

Greetings,

Andres Freund


