Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: O(n) tasks cause lengthy startups and checkpoints |
Date | |
Msg-id | 32B93472-B51D-491A-B7AF-45DD8927D981@amazon.com |
In response to | Re: O(n) tasks cause lengthy startups and checkpoints (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: O(n) tasks cause lengthy startups and checkpoints |
List | pgsql-hackers |
On 12/13/21, 5:54 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> I don't know whether this kind of idea is good or not.

Thanks for chiming in. I have an almost-complete patch set that I'm hoping to post to the lists in the next couple of days.

> One thing we've seen a number of times now is that entrusting the same
> process with multiple responsibilities often ends poorly. Sometimes
> it's busy with one thing when another thing really needs to be done
> RIGHT NOW. Perhaps that won't be an issue here since all of these
> things are related to checkpointing, but then the process name should
> reflect that rather than making it sound like we can just keep piling
> more responsibilities onto this process indefinitely. At some point
> that seems bound to become an issue.

Two of the tasks are cleanup tasks that checkpointing handles at the moment, and two are cleanup tasks that are done at startup. For now, all of these tasks are somewhat nonessential. There's no requirement that any of these tasks complete in order to finish startup or checkpointing. In fact, outside of preventing the server from running out of disk space, I don't think there's any requirement that these tasks run at all. IMO this would have to be a core tenet of a new auxiliary process like this.

That being said, I totally understand your point. If there were a dozen such tasks handled by a single auxiliary process, that could cause a new set of problems. Your checkpointing and startup might be fast, but you might run out of disk space because our cleanup process can't handle it all. So a new worker could end up becoming an availability risk as well.

> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

I do agree that it is important to be very careful about adding new processes, and if a better idea for how to handle these tasks emerges, I will readily abandon my current approach. Upthread, Andres mentioned optimizing unnecessary snapshot files, and I mentioned possibly limiting how much time startup and checkpoints spend on these tasks. I don't have too many details for the former, and for the latter, I'm worried about not being able to keep up. But if the prospect of adding a new auxiliary process for this stuff is a non-starter, perhaps I should explore that approach some more.

> But against all that, if these tasks are slowing down checkpoints and
> that's avoidable, that seems pretty important too. Interestingly, I
> can't say that I've ever seen any of these things be a problem for
> checkpoint or startup speed. I wonder why you've had a different
> experience.

Yeah, it's difficult for me to justify why users should suffer long periods of downtime because startup or checkpointing is taking a very long time doing things that are arguably unrelated to startup and checkpointing.

Nathan