Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Robert Haas
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CA+TgmoYHBkm7M8tNk6Z9G_aEOiw3Bjdux7v9+UzmdNTdFmFzjA@mail.gmail.com
In response to Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> [ new patches ]

I tried writing some code that throws an error from an undo log
handler and the results were not good.  It appears that the code will
retry in a tight loop:

2019-06-18 13:58:53.262 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
2019-06-18 13:58:53.264 EDT [42803] ERROR:  robert_undo

It seems clear that the error-handling aspect of this patch has not
been given enough thought.  It's debatable what strategy should be
used when undo fails, but retrying 40 times per millisecond isn't the
right answer. I assume we want some kind of cool-down between retries.
10 seconds?  A minute?  Some kind of back-off algorithm that gradually
increases the retry time up to some maximum?  Should there be one or
more GUCs?
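
As a rough illustration of the back-off option mentioned above, here is a minimal sketch of a capped exponential delay. It is not taken from the patch: the function name undo_retry_delay_ms and both constants are hypothetical, and the 1-second and 1-minute values are placeholders for whatever defaults or GUCs are eventually chosen.

```c
#include <stdint.h>

/* Hypothetical tuning knobs; whether these become GUCs is an open question. */
#define UNDO_RETRY_INITIAL_MS 1000    /* delay before the first retry */
#define UNDO_RETRY_MAX_MS     60000   /* upper bound on any single delay */

/*
 * Return the delay in milliseconds to wait before retry number
 * "attempt" (0-based): initial * 2^attempt, clamped to the maximum.
 */
static int64_t
undo_retry_delay_ms(unsigned attempt)
{
    int64_t delay = UNDO_RETRY_INITIAL_MS;

    while (attempt-- > 0)
    {
        /* Stop doubling once the next step would exceed the cap. */
        if (delay >= UNDO_RETRY_MAX_MS / 2)
            return UNDO_RETRY_MAX_MS;
        delay *= 2;
    }
    return delay;
}
```

With these placeholder values the delays run 1s, 2s, 4s, 8s, 16s, 32s and then stay at 60s, so a persistently failing undo request produces about one log line per minute instead of dozens per millisecond.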

Another thing that is not very nice is that when I tried to shut down
the server via 'pg_ctl stop' while the above was happening, it did not
shut down.  I had to use an immediate shutdown.  That's clearly not
OK.
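
One way to keep a retry delay from blocking shutdown is to wait in short slices and check a shutdown flag between them. The sketch below is standalone C written under that assumption and does not reflect how the patch or the server is structured; inside PostgreSQL the equivalent would presumably be a latch wait plus the usual interrupt checks. All names here (shutdown_requested, wait_before_retry, WAIT_SLICE_MS, record_sleep) are hypothetical.

```c
#include <signal.h>
#include <stdbool.h>

/* In this sketch, set to 1 by a signal handler when shutdown is requested. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Hypothetical: longest stretch of waiting without re-checking the flag. */
#define WAIT_SLICE_MS 100

/*
 * Wait up to total_ms milliseconds in slices, calling sleep_ms() for
 * each slice.  Returns true if the full delay elapsed, false if a
 * shutdown request was seen, in which case the caller should not retry.
 */
static bool
wait_before_retry(long total_ms, void (*sleep_ms)(long))
{
    while (total_ms > 0)
    {
        long slice = total_ms < WAIT_SLICE_MS ? total_ms : WAIT_SLICE_MS;

        if (shutdown_requested)
            return false;
        sleep_ms(slice);
        total_ms -= slice;
    }
    return !shutdown_requested;
}

/* Stand-in sleeper for demonstration: records time instead of sleeping. */
static long slept_ms = 0;

static void
record_sleep(long ms)
{
    slept_ms += ms;
}
```

A caller would loop on "attempt undo; on failure, call wait_before_retry() and give up if it returns false", passing a real sleep function in place of record_sleep. The worst-case shutdown latency added by the wait is then one slice rather than the full retry delay.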

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
