Re: Weird failure with latches in curculio on v15 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Weird failure with latches in curculio on v15 |
Date | |
Msg-id | 20230209192952.jvx56yuutlxuvjjf@awork3.anarazel.de |
In response to | Re: Weird failure with latches in curculio on v15 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Weird failure with latches in curculio on v15
Re: Weird failure with latches in curculio on v15 |
List | pgsql-hackers |
Hi,

On 2023-02-09 11:12:21 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2023 at 10:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm fairly concerned about the idea of making it common for people
> > to write their own main loop for the archiver. That means that, if
> > we have a bug fix that requires the archiver to do X, we will not
> > just be patching our own code but trying to get an indeterminate
> > set of third parties to add the fix to their code.

I'm somewhat concerned about that too, but perhaps from a different angle. First, I think we don't do our users a service by defaulting the in-core implementation to something that doesn't scale to even a moderately busy server. Second, I doubt we'll get the API for any of this right, without an actual user that does something more complicated than restoring one-by-one in a blocking manner.

> I don't know what kind of bug we could really have in the main loop
> that would be common to every implementation. They're probably all
> going to check for interrupts, do some work, and then wait for I/O on
> some things by calling select() or some equivalent. But the work, and
> the wait for the I/O, would be different for every implementation. I
> would anticipate that the amount of common code would be nearly zero.

I don't think it's that hard to imagine problems. To be reasonably fast, a decent restore implementation will have to 'restore ahead'. Which also provides ample things to go wrong. E.g.

- WAL source is switched, restore module needs to react to that, but doesn't, we end up with lots of wasted work, or worse, filename conflicts
- recovery follows a timeline, restore module doesn't catch on quickly enough
- end of recovery happens, restore just continues on

> > If we think we need primitives to let the archiver hooks get all
> > the pending files, or whatever, by all means add those. But don't
> > cede fundamental control of the archiver. The hooks need to be
> > decoration on a framework we provide, not the framework themselves.
>
> I don't quite see how you can make asynchronous and parallel archiving
> work if the archiver process only calls into the archive module at
> times that it chooses. That would mean that the module has to return
> control to the archiver when it's in the middle of archiving one or
> more files -- and then I don't see how it can get control back at the
> appropriate time. Do you have a thought about that?

I don't think archiver is the hard part, that already has a dedicated process, and it also has something of a queuing system already. The startup process imo is the complicated one...

If we had a 'restorer' process, startup fed some sort of a queue with things to restore in the near future, it might be more realistic to do something you describe?

Greetings,

Andres Freund
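To make the 'restorer' queue idea and the three failure cases above a bit more concrete, here is a minimal single-process sketch in Python. It is not PostgreSQL code and does not reflect any existing API: `RestoreQueue`, `RestoreRequest`, `max_ahead`, and the generation counter are all hypothetical names chosen for illustration. The assumption is that the startup process bumps a generation whenever the WAL source or timeline changes, or recovery ends, so that a restore-ahead worker can notice that work it already picked up has gone stale.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreRequest:
    generation: int  # queue generation at the time the request was made
    timeline: int    # timeline the segment is expected on
    segno: int       # WAL segment number


class RestoreQueue:
    """Hypothetical queue between a startup process and a 'restorer'."""

    def __init__(self, max_ahead: int = 4):
        self.max_ahead = max_ahead  # bound on how far we restore ahead
        self.generation = 0
        self.finished = False
        self._pending = deque()

    def request(self, timeline: int, segno: int) -> bool:
        """Startup asks for a segment; returns False if it was not queued."""
        if self.finished or len(self._pending) >= self.max_ahead:
            return False
        self._pending.append(RestoreRequest(self.generation, timeline, segno))
        return True

    def invalidate(self) -> None:
        """Startup calls this on a WAL source switch or timeline change."""
        self.generation += 1
        self._pending.clear()

    def finish(self) -> None:
        """Startup calls this at end of recovery; no more restores wanted."""
        self.finished = True
        self.invalidate()

    def next_request(self):
        """Restorer takes the oldest pending request, or None if idle."""
        return self._pending.popleft() if self._pending else None

    def is_current(self, req: RestoreRequest) -> bool:
        """Restorer checks this before and after fetching a file."""
        return not self.finished and req.generation == self.generation


queue = RestoreQueue(max_ahead=2)
queue.request(timeline=1, segno=10)
queue.request(timeline=1, segno=11)
in_flight = queue.next_request()   # restorer starts on segment 10
queue.invalidate()                 # recovery moves to another timeline
stale = not queue.is_current(in_flight)  # restorer should drop its result
```

A real implementation would need shared memory, locking, and a way to wake the restorer, none of which this sketch attempts. It only shows that the queue owner, not the restore module, decides when queued and in-flight work is no longer wanted.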