Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id 32B93472-B51D-491A-B7AF-45DD8927D981@amazon.com
In response to Re: O(n) tasks cause lengthy startups and checkpoints  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints
List pgsql-hackers
On 12/13/21, 5:54 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> I don't know whether this kind of idea is good or not.

Thanks for chiming in.  I have an almost-complete patch set that I'm
hoping to post to the lists in the next couple of days.

> One thing we've seen a number of times now is that entrusting the same
> process with multiple responsibilities often ends poorly. Sometimes
> it's busy with one thing when another thing really needs to be done
> RIGHT NOW. Perhaps that won't be an issue here since all of these
> things are related to checkpointing, but then the process name should
> reflect that rather than making it sound like we can just keep piling
> more responsibilities onto this process indefinitely. At some point
> that seems bound to become an issue.

Two of the tasks are cleanup tasks that checkpointing handles at the
moment, and two are cleanup tasks that are done at startup.  For now,
all of these tasks are somewhat nonessential.  There's no requirement
that any of these tasks complete in order to finish startup or
checkpointing.  In fact, outside of preventing the server from running
out of disk space, I don't think there's any requirement that these
tasks run at all.  IMO this would have to be a core tenet of a new
auxiliary process like this.

That being said, I totally understand your point.  If there were a
dozen such tasks handled by a single auxiliary process, that could
cause a new set of problems.  Your checkpointing and startup might be
fast, but you might run out of disk space because our cleanup process
can't handle it all.  So a new worker could end up becoming an
availability risk as well.

> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

I do agree that it is important to be very careful about adding new
processes, and if a better idea for how to handle these tasks emerges,
I will readily abandon my current approach.  Upthread, Andres
mentioned optimizing unnecessary snapshot files, and I mentioned
possibly limiting how much time startup and checkpoints spend on these
tasks.  I don't have many details on the former, and for the latter,
I'm worried that time-limited cleanup would not be able to keep up.
But if the prospect of adding a new auxiliary process for this stuff
is a non-starter, perhaps I should explore that approach some more.
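
To make the time-limiting idea concrete, here is a rough sketch.  It
is not taken from any patch and is not PostgreSQL code; the function
name, the is_removable callback, and the budget parameter are all
hypothetical, and Python is used only for brevity.  It removes stale
files from a directory until either the directory has been fully
scanned or a time budget runs out:

```python
import os
import time


def cleanup_with_budget(directory, is_removable, budget_seconds,
                        clock=time.monotonic):
    """Remove files for which is_removable(name) is true, stopping early
    once budget_seconds have elapsed.

    Returns (removed_count, finished).  finished is False when the
    budget expired before the whole directory was scanned.
    """
    deadline = clock() + budget_seconds
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Check the budget before each file so that a huge
            # directory cannot stall the caller indefinitely.
            if clock() >= deadline:
                return removed, False
            if entry.is_file() and is_removable(entry.name):
                os.unlink(entry.path)
                removed += 1
    return removed, True
```

A caller that gets finished == False would have to retry later, which
is exactly where the keep-up concern comes from: if files accumulate
faster than each budgeted pass removes them, the backlog never
drains.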

> But against all that, if these tasks are slowing down checkpoints and
> that's avoidable, that seems pretty important too. Interestingly, I
> can't say that I've ever seen any of these things be a problem for
> checkpoint or startup speed. I wonder why you've had a different
> experience.

Yeah, it's difficult for me to justify why users should suffer long
periods of downtime because startup or checkpointing is taking a very
long time doing things that are arguably unrelated to startup and
checkpointing.

Nathan

