Thread: Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?
Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?
From
Bharath Rupireddy
Date:
Hi,

Currently, Postgres runs the end-of-recovery (EOR) checkpoint in wait mode, meaning the server can take longer before it opens up for connections. The EOR checkpoint can, at times, take a while if the server did a lot of work during crash recovery: say it replayed many WAL records, created many snapshot or mapping files, or dirtied many buffers.

Since the server spins up the checkpointer process [1] while the startup process performs recovery, isn't it a good idea to make the end-of-recovery checkpoint completely optional for users, or at least run it in non-wait mode, so that the server becomes available faster? If the user chooses to skip the EOR checkpoint, the next checkpointer cycle will take care of its work; if the user chooses non-wait mode (without the CHECKPOINT_WAIT flag), the checkpointer will run the EOR checkpoint in the background.

Of course, users choosing this option must be aware of the extra recovery work that would be needed if a crash happens between the point where the EOR checkpoint is skipped (or started in non-wait mode) and the completion of the next checkpoint. But the advantage users get is faster server availability.

Thanks a lot, Thomas, for the internal discussion.

Thoughts?

[1]
commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
Author: Thomas Munro <tmunro@postgresql.org>
Date: Mon Aug 2 17:32:20 2021 +1200

    Run checkpointer and bgwriter in crash recovery.

Regards,
Bharath Rupireddy.
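To make the wait versus non-wait distinction concrete, here is a minimal toy model in Python. It is not PostgreSQL code: the class `ToyCheckpointer`, the function `end_recovery`, and the numeric flag values are all assumptions made for illustration; only the flag names are borrowed from the proposal above. The sketch shows that with the wait flag the caller blocks until the dirty buffers are flushed, while without it the request is only queued and the flush happens later.

```python
from dataclasses import dataclass, field

# Flag names mirror the CHECKPOINT_* flags discussed in the thread.
# The numeric values are assumptions chosen for this toy model.
CHECKPOINT_END_OF_RECOVERY = 0x02
CHECKPOINT_IMMEDIATE = 0x04
CHECKPOINT_WAIT = 0x20


@dataclass
class ToyCheckpointer:
    """Hypothetical stand-in for the checkpointer process."""
    dirty_buffers: int
    pending: list = field(default_factory=list)

    def request(self, flags: int) -> int:
        """Queue a checkpoint request.

        Returns the number of buffers flushed while the caller was blocked,
        which is 0 when CHECKPOINT_WAIT is not set.
        """
        self.pending.append(flags)
        if flags & CHECKPOINT_WAIT:
            return self.run_pending()
        return 0

    def run_pending(self) -> int:
        """Process all queued requests, as a later checkpointer cycle would."""
        flushed = 0
        while self.pending:
            self.pending.pop(0)
            flushed += self.dirty_buffers
            self.dirty_buffers = 0
        return flushed


def end_recovery(checkpointer: ToyCheckpointer, wait: bool) -> int:
    """Request the EOR checkpoint; return the work the startup process waited for."""
    flags = CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE
    if wait:
        flags |= CHECKPOINT_WAIT
    return checkpointer.request(flags)
```

In this model, the window between `end_recovery(cp, wait=False)` returning and `cp.run_pending()` completing corresponds to the period in which a crash would require the extra recovery work mentioned above.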
Re: Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?
From
Robert Haas
Date:
On Fri, Mar 25, 2022 at 3:40 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Since the server spins up checkpointer process [1] while the startup
> process performs recovery, isn't it a good idea to make
> end-of-recovery completely optional for the users or at least run it
> in non-wait mode so that the server will be available faster. The next
> checkpointer cycle will take care of performing the EOR checkpoint
> work, if user chooses to skip the EOR or the checkpointer will run EOR
> checkpoint in background, if user chooses to run it in the non-wait
> mode (without CHECKPOINT_WAIT flag). Of course by choosing this
> option, users must be aware of the fact that the extra amount of
> recovery work that needs to be done if a crash happens from the point
> EOR gets skipped or runs in non-wait mode until the next checkpoint.
> But the advantage that users get is the faster server availability.

I think that we should remove end-of-recovery checkpoints completely and instead use the end-of-recovery WAL record (cf. CreateEndOfRecoveryRecord). However, when I tried to do that, I ran into some problems:

http://postgr.es/m/CA+TgmobrM2jvkiccCS9NgFcdjNSgAvk1qcAPx5S6F+oJT3D2mQ@mail.gmail.com

The second problem described in that email has subsequently been fixed, I believe, but the first one remains.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?
From
Andres Freund
Date:
Hi,

On March 25, 2022 9:56:38 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
>On Fri, Mar 25, 2022 at 3:40 AM Bharath Rupireddy
><bharath.rupireddyforpostgres@gmail.com> wrote:
>> Since the server spins up checkpointer process [1] while the startup
>> process performs recovery, isn't it a good idea to make
>> end-of-recovery completely optional for the users or at least run it
>> in non-wait mode so that the server will be available faster. The next
>> checkpointer cycle will take care of performing the EOR checkpoint
>> work, if user chooses to skip the EOR or the checkpointer will run EOR
>> checkpoint in background, if user chooses to run it in the non-wait
>> mode (without CHECKPOINT_WAIT flag). Of course by choosing this
>> option, users must be aware of the fact that the extra amount of
>> recovery work that needs to be done if a crash happens from the point
>> EOR gets skipped or runs in non-wait mode until the next checkpoint.
>> But the advantage that users get is the faster server availability.
>
>I think that we should remove end-of-recovery checkpoints completely
>and instead use the end-of-recovery WAL record (cf.
>CreateEndOfRecoveryRecord). However, when I tried to do that, I ran
>into some problems:
>
>http://postgr.es/m/CA+TgmobrM2jvkiccCS9NgFcdjNSgAvk1qcAPx5S6F+oJT3D2mQ@mail.gmail.com
>
>The second problem described in that email has subsequently been
>fixed, I believe, but the first one remains.

Seems we could deal with that by making latestCompleted a 64-bit xid? Then there would never be cases where we have to retreat back into such early xids?

A random note from a conversation with Thomas a few days ago: we still perform timeline increases with checkpoints in some cases. Might be worth fixing as a step towards just using EOR.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
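As background for the 64-bit xid suggestion, here is a minimal Python sketch of the general property it relies on. It makes no claim about the specific problem described in the linked email. The function names are hypothetical; the sketch assumes that 32-bit xids are compared by the sign of their modulo-2^32 difference, and that a 64-bit xid is an epoch in the high 32 bits combined with the 32-bit xid in the low bits. Special (non-normal) xid values are ignored here.

```python
MASK32 = 0xFFFFFFFF


def xid_precedes_32(a: int, b: int) -> bool:
    """Circular comparison: a precedes b if the signed 32-bit difference is negative.

    The answer is only meaningful while a and b are less than 2**31 apart.
    """
    diff = (a - b) & MASK32
    if diff >= 0x80000000:
        diff -= 0x100000000  # reinterpret as a signed 32-bit value
    return diff < 0


def full_xid(epoch: int, xid: int) -> int:
    """Combine an epoch and a 32-bit xid into one 64-bit value (assumed layout)."""
    return (epoch << 32) | (xid & MASK32)


def full_xid_precedes(a: int, b: int) -> bool:
    """64-bit xids never wrap in practice, so plain integer order suffices."""
    return a < b
```

For example, xid 100 and an xid more than 2^31 later compare "backwards" under `xid_precedes_32`, whereas the corresponding `full_xid` values keep their true order regardless of how far apart they are.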