Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CA+TgmoYHBkm7M8tNk6Z9G_aEOiw3Bjdux7v9+UzmdNTdFmFzjA@mail.gmail.com |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
Re: POC: Cleaning up orphaned files using undo logs Re: POC: Cleaning up orphaned files using undo logs |
List | pgsql-hackers |
On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> [ new patches ]

I tried writing some code that throws an error from an undo log handler and the results were not good. It appears that the code will retry in a tight loop:

2019-06-18 13:58:53.262 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.264 EDT [42803] ERROR: robert_undo

It seems clear that the error-handling aspect of this patch has not been given enough thought. It's debatable what strategy should be used when undo fails, but retrying 40 times per millisecond isn't the right answer. I assume we want some kind of cool-down between retries. 10 seconds? A minute? Some kind of back-off algorithm that gradually increases the retry time up to some maximum? Should there be one or more GUCs?

Another thing that is not very nice is that when I tried to shut down the server via 'pg_ctl stop' while the above was happening, it did not shut down. I had to use an immediate shutdown. That's clearly not OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
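To make the back-off question above concrete, here is a minimal C sketch of a capped exponential back-off: the delay doubles after each consecutive failure until it reaches a maximum. The function name, the 10-second starting delay, and the one-minute cap are assumptions chosen for illustration (borrowing the two figures floated in the message); none of them come from the patch set or from PostgreSQL itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tunables; in a real patch these could be GUCs instead. */
#define UNDO_RETRY_INITIAL_DELAY_MS INT64_C(10000)  /* assumed: 10 seconds */
#define UNDO_RETRY_MAX_DELAY_MS     INT64_C(60000)  /* assumed: 1 minute */

/*
 * Hypothetical helper: return how long to wait, in milliseconds, before the
 * next undo retry, given the number of consecutive failures so far (>= 1).
 * The delay doubles with each failure and is clamped to the maximum.
 */
static int64_t
undo_retry_delay_ms(int failures)
{
    int64_t delay = UNDO_RETRY_INITIAL_DELAY_MS;

    assert(failures >= 1);

    /* Stop doubling once the cap is reached, so the value cannot overflow. */
    for (int i = 1; i < failures && delay < UNDO_RETRY_MAX_DELAY_MS; i++)
        delay *= 2;

    return (delay < UNDO_RETRY_MAX_DELAY_MS) ? delay : UNDO_RETRY_MAX_DELAY_MS;
}
```

With these assumed values the waits run 10 s, 20 s, 40 s, and then stay at 60 s. A worker using something like this would presumably also need to notice a shutdown request while waiting, rather than sleeping through it, so that 'pg_ctl stop' can complete.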