Thread: skip replication slot snapshot/map file removal during end-of-recovery checkpoint
skip replication slot snapshot/map file removal during end-of-recovery checkpoint
From
Bharath Rupireddy
Date:
Hi,

Currently the end-of-recovery checkpoint can be much slower, impacting the server availability, if there are many replication slot files XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping the .snap and map- file handling during the end-of-recovery checkpoint? It makes the server available faster and the next regular checkpoint can deal with these files. If required, we can have a GUC (skip_replication_slot_file_handling or some other better name) to control this default being the existing behavior.

Thoughts?

Regards,
Bharath Rupireddy.
Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint
From
Bharath Rupireddy
Date:
On Thu, Dec 23, 2021 at 4:46 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> Currently the end-of-recovery checkpoint can be much slower, impacting
> the server availability, if there are many replication slot files
> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> the .snap and map- file handling during the end-of-recovery
> checkpoint? It makes the server available faster and the next regular
> checkpoint can deal with these files. If required, we can have a GUC
> (skip_replication_slot_file_handling or some other better name) to
> control this default being the existing behavior.
>
> Thoughts?

Here's the v1 patch, please review it.

Regards,
Bharath Rupireddy.
Attachment
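A minimal sketch of the proposal follows. It is not the attached v1 patch and does not call PostgreSQL's real functions. The helper names (checkpoint_guts, cleanup_snapshot_files, cleanup_mapping_files, should_skip_file_cleanup) are hypothetical stand-ins. The GUC name is the one floated in the proposal. The flag names are modeled on PostgreSQL's CHECKPOINT_* flags, and their values should be treated as illustrative.

```c
#include <stdbool.h>

/* Flag bits modeled on PostgreSQL's checkpoint flags (values illustrative). */
#define CHECKPOINT_IS_SHUTDOWN      0x0001
#define CHECKPOINT_END_OF_RECOVERY  0x0002

/* Counters standing in for the real file enumeration and deletion work. */
static int snap_cleanup_calls = 0;
static int map_cleanup_calls = 0;

/* Hypothetical stand-in: enumerate and remove obsolete XXXX.snap files. */
static void cleanup_snapshot_files(void) { snap_cleanup_calls++; }

/* Hypothetical stand-in: enumerate and remove obsolete map-XXXX files. */
static void cleanup_mapping_files(void) { map_cleanup_calls++; }

/* GUC name taken from the proposal; false keeps the existing behavior. */
static bool skip_replication_slot_file_handling = false;

/* Skip only when the GUC is on AND this is an end-of-recovery checkpoint. */
static bool should_skip_file_cleanup(int flags)
{
    return skip_replication_slot_file_handling &&
           (flags & CHECKPOINT_END_OF_RECOVERY) != 0;
}

/* Hypothetical stand-in for the part of a checkpoint that handles these files. */
static void checkpoint_guts(int flags)
{
    if (!should_skip_file_cleanup(flags))
    {
        cleanup_snapshot_files();
        cleanup_mapping_files();
    }
    /* ... the rest of the checkpoint work would follow here ... */
}
```

With the GUC left at its default, checkpoint_guts(CHECKPOINT_END_OF_RECOVERY) still performs both cleanups. With it enabled, only end-of-recovery checkpoints skip them, and a later regular checkpoint (flags without CHECKPOINT_END_OF_RECOVERY) picks the files up.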
Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint
From
"Bossart, Nathan"
Date:
On 12/23/21, 3:17 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Currently the end-of-recovery checkpoint can be much slower, impacting
> the server availability, if there are many replication slot files
> XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> the .snap and map- file handling during the end-of-recovery
> checkpoint? It makes the server available faster and the next regular
> checkpoint can deal with these files. If required, we can have a GUC
> (skip_replication_slot_file_handling or some other better name) to
> control this default being the existing behavior.

I suggested something similar as a possibility in the other thread where these tasks are being discussed [0]. I think it is worth considering, but IMO it is not a complete solution to the problem. If there are frequently many such files to delete and regular checkpoints are taking longer, the shutdown/end-of-recovery checkpoint could still take a while. I think it would be better to separate these tasks from checkpointing instead.

Nathan

[0] https://postgr.es/m/A285A823-0AF2-4376-838E-847FA4710F9A%40amazon.com
Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint
From
Bharath Rupireddy
Date:
On Thu, Jan 6, 2022 at 5:04 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/23/21, 3:17 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Currently the end-of-recovery checkpoint can be much slower, impacting
> > the server availability, if there are many replication slot files
> > XXXX.snap or map-XXXX to be enumerated and deleted. How about skipping
> > the .snap and map- file handling during the end-of-recovery
> > checkpoint? It makes the server available faster and the next regular
> > checkpoint can deal with these files. If required, we can have a GUC
> > (skip_replication_slot_file_handling or some other better name) to
> > control this default being the existing behavior.
>
> I suggested something similar as a possibility in the other thread
> where these tasks are being discussed [0]. I think it is worth
> considering, but IMO it is not a complete solution to the problem. If
> there are frequently many such files to delete and regular checkpoints
> are taking longer, the shutdown/end-of-recovery checkpoint could still
> take a while. I think it would be better to separate these tasks from
> checkpointing instead.
>
> [0] https://postgr.es/m/A285A823-0AF2-4376-838E-847FA4710F9A%40amazon.com

Thanks. I agree to solve it as part of the other thread and close this thread here.

Regards,
Bharath Rupireddy.