Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From	Euler Taveira
Subject	Re: O(n) tasks cause lengthy startups and checkpoints
Date	December 2, 2021 02:05:03
Msg-id	d4c7e393-b8be-4879-ad4b-4e80994e5763@www.fastmail.com
In response to	Re: O(n) tasks cause lengthy startups and checkpoints ("Bossart, Nathan" <bossartn@amazon.com>)
Responses	Re: O(n) tasks cause lengthy startups and checkpoints
List	pgsql-hackers

On Wed, Dec 1, 2021, at 9:19 PM, Bossart, Nathan wrote:

> On 12/1/21, 2:56 PM, "Andres Freund" <andres@anarazel.de> wrote:
>> On 2021-12-01 20:24:25 +0000, Bossart, Nathan wrote:
>>> I realize adding a new maintenance worker might be a bit heavy-handed,
>>> but I think it would be nice to have somewhere to offload tasks that
>>> really shouldn't impact startup and checkpointing. I imagine such a
>>> process would come in handy down the road, too. WDYT?
>>
>> -1. I think the overhead of an additional worker is disproportional here. And
>> there's simplicity benefits in having a predictable cleanup interlock as well.
>
> Another idea I had was to put some upper limit on how much time is
> spent on such tasks. For example, a checkpoint would only spend X
> minutes on CheckPointSnapBuild() before giving up until the next one.
> I think the main downside of that approach is that it could lead to
> unbounded growth, so perhaps we would limit (or even skip) such tasks
> only for end-of-recovery and shutdown checkpoints. Perhaps the
> startup tasks could be limited in a similar fashion.

Saying that a certain task is O(n) doesn't mean it needs a separate process to
handle it. Do you have a use case or, even better, numbers (% of checkpoint /
startup time) that make your proposal worthwhile?

I would try to optimize (1) and (2). However, delayed removal can be a
long-term issue if the new routine cannot keep up with the pace of file
creation (especially if the checkpoints are far apart).
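
To illustrate the concern about delayed removal, here is a toy model in Python. It is not PostgreSQL code: the function name and all numbers are made up, and it only assumes that each checkpoint cycle creates a fixed number of files and that the cleanup routine can remove at most a fixed number before it gives up.

```python
def simulate_backlog(created_per_cycle, removable_per_cycle, cycles):
    """Return the number of leftover files after each checkpoint cycle.

    Hypothetical model: every cycle adds `created_per_cycle` files and the
    cleanup routine removes at most `removable_per_cycle` of them.
    """
    backlog = 0
    history = []
    for _ in range(cycles):
        backlog += created_per_cycle                   # files produced since the last checkpoint
        backlog -= min(backlog, removable_per_cycle)   # cleanup stops once its budget is spent
        history.append(backlog)
    return history


# Cleanup keeps up with file creation: nothing is left behind.
print(simulate_backlog(created_per_cycle=1000, removable_per_cycle=1500, cycles=4))
# Cleanup falls short by 400 files per cycle: the backlog grows every cycle.
print(simulate_backlog(created_per_cycle=1000, removable_per_cycle=600, cycles=4))
```

In this model the backlog grows linearly whenever creation outpaces removal, and longer gaps between checkpoints mean more files created per cycle, which widens the shortfall.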

For (3), there is already a GUC that would avoid the slowdown during startup.
Use it if you think the startup time is more important than the disk space occupied
by useless files.

For (4), you are forgetting that the on-disk state of replication slots is
stored in pg_replslot/SLOTNAME/state. It seems you cannot just rename the
replication slot directory and copy the state file. What happens if there is a
crash before copying the state file?
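
The window in question can be sketched with plain files in a temporary directory. This is only an illustration under assumptions: the ".old" staging name, the step order, and the helper names are hypothetical and are not taken from the patch, and fsync ordering is not modeled at all.

```python
import os
import shutil
import tempfile


def rename_then_copy(slot_dir, crash_before_copy=False):
    """Hypothetical sequence: move the slot directory aside, recreate it,
    then copy the state file back. Returns True if the slot directory
    still contains its state file afterwards."""
    staged = slot_dir + ".old"              # hypothetical staging name
    os.rename(slot_dir, staged)             # step 1: directory moved aside
    os.mkdir(slot_dir)                      # step 2: fresh, empty slot directory
    if not crash_before_copy:               # a crash here skips step 3
        shutil.copy2(os.path.join(staged, "state"),
                     os.path.join(slot_dir, "state"))   # step 3: restore state file
    return os.path.exists(os.path.join(slot_dir, "state"))


def demo(crash_before_copy):
    """Build a throwaway pg_replslot/myslot/state layout and run the sequence."""
    with tempfile.TemporaryDirectory() as root:
        slot = os.path.join(root, "pg_replslot", "myslot")
        os.makedirs(slot)
        with open(os.path.join(slot, "state"), "w") as f:
            f.write("slot metadata")
        return rename_then_copy(slot, crash_before_copy)


print(demo(crash_before_copy=False))   # state file present after all three steps
print(demo(crash_before_copy=True))    # state file missing from the slot directory
```

If the process stops between the rename and the copy, the slot directory exists but has no state file, so any scheme along these lines would need a recovery rule for that case.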

While we are talking about items (1), (2) and (4), we could probably have an
option to create some ephemeral logical decoding files in a ramdisk (similar to
the statistics directory). I don't want to hijack this thread, but this proposal
could alleviate the possible issues that you pointed out. If people are
interested in this proposal, I can start a new thread about it.

Euler Taveira

EDB https://www.enterprisedb.com/
