Re: Weird failure with latches in curculio on v15 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Weird failure with latches in curculio on v15 |
Date | |
Msg-id | 20230216163254.nktnbbb2b7wlpzwy@awork3.anarazel.de |
In response to | Re: Weird failure with latches in curculio on v15 (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Weird failure with latches in curculio on v15 |
List | pgsql-hackers |
Hi,

On 2023-02-16 15:18:57 +0530, Robert Haas wrote:
> On Fri, Feb 10, 2023 at 12:59 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't think it's that hard to imagine problems. To be reasonably fast, a
> > decent restore implementation will have to 'restore ahead'. Which also
> > provides ample things to go wrong. E.g.
> >
> > - WAL source is switched, restore module needs to react to that, but doesn't,
> >   we end up with lots of wasted work, or worse, filename conflicts
> > - recovery follows a timeline, restore module doesn't catch on quickly enough
> > - end of recovery happens, restore just continues on
>
> I don't see how you can prevent those things from happening. If the
> restore process is working in some way that requires an event loop,
> and I think that will be typical for any kind of remote archiving,
> then either it has control most of the time, so the event loop can be
> run inside the restore process, or, as Nathan proposes, we don't let
> the archiver have control and it needs to run that restore process in
> a separate background worker. The hazards that you mention here exist
> either way. If the event loop is running inside the restore process,
> it can decide not to call the functions that we provide in a timely
> fashion and thus fail to react as it should. If the event loop runs
> inside a separate background worker, then that process can fail to be
> responsive in precisely the same way. Fundamentally, if the author of
> a restore module writes code to have multiple I/Os in flight at the
> same time and does not write code to cancel those I/Os if something
> changes, then such cancellation will not occur. That remains true no
> matter which process is performing the I/O.

IDK. I think we can make that easier or harder. Right now the proposed API
doesn't provide anything that would allow a module to address this.
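To make the restore-ahead hazards listed above concrete, here is a minimal C sketch of the bookkeeping such a module would need. Everything in it is hypothetical: the `Hypo*` types and the `hypo_restore_ahead_*` functions are invented for illustration and are not part of PostgreSQL or of the proposed restore module API. It assumes the module is somehow told about changes in WAL source, timeline, and end of recovery, and it deliberately simplifies: any queued segment requested for a different timeline is treated as stale, and everything is dropped once WAL no longer comes from the archive or recovery has ended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; none of these exist in PostgreSQL. */
typedef enum
{
    WAL_FROM_ARCHIVE,           /* startup is reading restored archive files */
    WAL_FROM_STREAM             /* startup has switched to streaming replication */
} HypoWalSource;

typedef struct
{
    bool        in_use;         /* slot holds a prefetched or in-flight segment */
    uint32_t    timeline;       /* timeline the segment was requested for */
    uint64_t    segno;          /* WAL segment number */
} HypoPrefetchSlot;

#define HYPO_PREFETCH_DEPTH 8   /* how far the module restores ahead */

typedef struct
{
    HypoWalSource source;       /* current WAL source, as reported by startup */
    uint32_t    timeline;       /* timeline recovery is currently following */
    bool        recovery_done;  /* end of recovery has been reached */
    HypoPrefetchSlot slots[HYPO_PREFETCH_DEPTH];
} HypoRestoreAhead;

/*
 * Queue one more segment for restore-ahead on the current timeline.
 * Returns false if the window is full or restoring no longer makes sense.
 */
static bool
hypo_restore_ahead_enqueue(HypoRestoreAhead *ra, uint64_t segno)
{
    assert(ra != NULL);
    if (ra->recovery_done || ra->source != WAL_FROM_ARCHIVE)
        return false;
    for (size_t i = 0; i < HYPO_PREFETCH_DEPTH; i++)
    {
        if (!ra->slots[i].in_use)
        {
            ra->slots[i] = (HypoPrefetchSlot) {
                .in_use = true, .timeline = ra->timeline, .segno = segno
            };
            return true;
        }
    }
    return false;
}

/*
 * Record the latest recovery state and discard every queued segment that no
 * longer matches it. Returns the number of slots discarded; a real module
 * would also have to cancel the I/O behind each of them.
 */
static int
hypo_restore_ahead_update(HypoRestoreAhead *ra, HypoWalSource source,
                          uint32_t timeline, bool recovery_done)
{
    /* Simplification: leaving the archive or finishing recovery drops everything. */
    bool        drop_all = recovery_done || source != WAL_FROM_ARCHIVE;
    int         discarded = 0;

    assert(ra != NULL);
    ra->source = source;
    ra->timeline = timeline;
    ra->recovery_done = recovery_done;

    for (size_t i = 0; i < HYPO_PREFETCH_DEPTH; i++)
    {
        HypoPrefetchSlot *slot = &ra->slots[i];

        /* Simplification: any segment requested for another timeline is stale. */
        if (slot->in_use && (drop_all || slot->timeline != timeline))
        {
            slot->in_use = false;
            discarded++;
        }
    }
    return discarded;
}
```

A caller would enqueue upcoming segments while replay runs and call `hypo_restore_ahead_update()` whenever the recovery state changes; the return value says how many in-flight restores need cancelling. The sketch says nothing about who delivers those state changes or how promptly, which is the part the proposed API currently leaves unaddressed.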
> > > I don't quite see how you can make asynchronous and parallel archiving
> > > work if the archiver process only calls into the archive module at
> > > times that it chooses. That would mean that the module has to return
> > > control to the archiver when it's in the middle of archiving one or
> > > more files -- and then I don't see how it can get control back at the
> > > appropriate time. Do you have a thought about that?
> >
> > I don't think archiver is the hard part, that already has a dedicated
> > process, and it also has something of a queuing system already. The startup
> > process imo is the complicated one...
> >
> > If we had a 'restorer' process, startup fed some sort of a queue with things
> > to restore in the near future, it might be more realistic to do something you
> > describe?
>
> Some kind of queueing system might be a useful part of the interface,
> and a dedicated restorer process does sound like a good idea. But the
> archiver doesn't have this solved, precisely because you have to
> archive a single file, return control, and wait to be invoked again
> for the next file. That does not scale.

But there's nothing inherent in that. We know for certain which files we're
going to archive. And we don't need to work one by one. The archiver could
just start multiple subprocesses at the same time. All the blocking it does
right now is artificially imposed by the use of system(). We could instead
just use something popen()-like and have a configurable number of processes
running at the same time.

What I was trying to point out was that the work a "restorer" process has to
do is more speculative, because we don't know when we'll promote, whether
we'll follow a timeline increase, or whether the to-be-restored WAL already
exists. That's solvable, but a bunch of the relevant work ought to be handled
in core code, instead of just in archive modules.

Greetings,

Andres Freund
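A minimal sketch of keeping a configurable number of archive commands running at once, instead of calling system() for one file at a time. It uses plain fork(), execl() of /bin/sh, and waitpid(-1, ...) rather than popen(), so the loop can reap whichever child finishes first. `run_commands_parallel()` is a hypothetical helper, not code from PostgreSQL's archiver, and it assumes the caller already holds the list of command strings (one per file to archive).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: run every shell command in
 * cmds[], keeping at most max_running children alive at once. Returns how
 * many commands failed (non-zero exit, killed by a signal, or not started).
 */
static int
run_commands_parallel(const char *const cmds[], int ncmds, int max_running)
{
    int         next = 0;       /* index of the next command to launch */
    int         running = 0;    /* children currently alive */
    int         failures = 0;

    assert(cmds != NULL || ncmds == 0);
    if (max_running < 1)
        max_running = 1;

    while (next < ncmds || running > 0)
    {
        /* Top up to the concurrency limit. */
        while (next < ncmds && running < max_running)
        {
            pid_t       pid = fork();

            if (pid < 0)
            {
                perror("fork");
                failures++;     /* count this command as failed and move on */
                next++;
                continue;
            }
            if (pid == 0)
            {
                /* Child: let the shell run the command, as system() would. */
                execl("/bin/sh", "sh", "-c", cmds[next], (char *) NULL);
                _exit(127);     /* reached only if exec failed */
            }
            running++;
            next++;
        }

        if (running == 0)
            break;

        /* Wait for whichever child finishes first, then free its slot. */
        int         status;
        pid_t       done = waitpid(-1, &status, 0);

        if (done < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just wait again */
            perror("waitpid");
            break;              /* sketch: give up without reaping the rest */
        }
        running--;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

With `max_running` set to 4, for example, a backlog of ready WAL files keeps four archive commands in flight until the list drains. A real archiver would also need per-file results so that only successfully archived files are marked done, and it should not use `waitpid(-1, ...)` if the process has other children that are reaped elsewhere.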