Standby recovery conflicts: add information when the cancellation occurs - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Standby recovery conflicts: add information when the cancellation occurs
Msg-id 5a11fa42-f275-8610-4b69-76f52d11d8ab@amazon.com
List pgsql-hackers
Hi hackers,

As suggested by Masao, I am starting a new thread to follow up on 
standby recovery conflicts.

The initial patch proposed in [1] has been split into 3 parts:

- Add block information in error context of WAL REDO apply: committed 
(9d0bd95fa90a7243047a74e29f265296a9fc556d)
- Add information when the startup process is waiting for recovery 
conflicts: committed (0650ff23038bc3eb8d8fd851744db837d921e285)
- Add information when the cancellation occurs: subject of this new thread

As you can see, the initial idea was also to dump information about the 
blocking backends (should they reach the cancellation stage).

The main idea is to log information like:

2020-06-15 06:48:54.778 UTC [7037] LOG: about to interrupt pid: 7037, 
backend_type: client backend, state: active, wait_event_type: Timeout, 
wait_event: PgSleep, query_start: 2020-06-15 06:48:13.008427+00
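
To make the proposed format concrete, here is a minimal sketch of how such a line could be parsed once it is in the server log. The line format is only the one shown in the example above (nothing is committed yet), and the names parse_interrupt_line and elapsed_seconds are hypothetical helpers, not part of PostgreSQL. It also assumes both timestamps are in UTC, as they are in the sample.

```python
import re
from datetime import datetime

# Sample line from the message above, joined onto a single line.
SAMPLE = (
    "2020-06-15 06:48:54.778 UTC [7037] LOG: about to interrupt pid: 7037, "
    "backend_type: client backend, state: active, wait_event_type: Timeout, "
    "wait_event: PgSleep, query_start: 2020-06-15 06:48:13.008427+00"
)

# Assumed layout: "<timestamp> UTC [<pid>] LOG: about to interrupt <key: value, ...>"
LINE_RE = re.compile(
    r"^(?P<logged_at>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC "
    r"\[(?P<logger_pid>\d+)\] LOG: about to interrupt (?P<fields>.*)$"
)

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_interrupt_line(line):
    """Return a dict of the fields in one line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    record = {"logged_at": datetime.strptime(m.group("logged_at"), TS_FORMAT)}
    # Fields are separated by ", " and each one is "key: value".
    for item in m.group("fields").split(", "):
        key, _, value = item.partition(": ")
        record[key] = value
    return record


def elapsed_seconds(record):
    """Seconds between query_start and the log timestamp (both assumed UTC)."""
    started = record["query_start"]
    if started.endswith("+00"):
        started = started[:-3]  # drop the UTC offset suffix
    return (record["logged_at"] - datetime.strptime(started, TS_FORMAT)).total_seconds()


if __name__ == "__main__":
    rec = parse_interrupt_line(SAMPLE)
    print(rec["pid"], rec["state"], rec["wait_event"], elapsed_seconds(rec))
```

The elapsed time is the figure of interest for the first use case below: it says how long the query had been running when it was about to be canceled.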

Some examples of how this could be useful:

     - The query being canceled usually runs in 1 second: seeing that it 
had started 1 minute before it was canceled could indicate a plan 
change.
     - A lot of queries have been canceled and all of them were waiting 
on “DataFileRead”: that could indicate bad IO response time at that 
moment.
     - Seeing the state as “idle in transaction” could indicate 
unexpected application behavior (say the application issues BEGIN; SET 
TRANSACTION ISOLATION LEVEL REPEATABLE READ; then a SELECT, and then 
stays idle in transaction, which could lead to a recovery conflict).
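
As a rough illustration of the three cases above, the sketch below summarizes a batch of cancellation records. Everything here is an assumption made for the example: the record layout (dicts with pid, state, wait_event and elapsed_s keys), the function name summarize_cancellations, the thresholds, and the sample data, which is invented and not taken from a real server.

```python
from collections import Counter


def summarize_cancellations(records, expected_runtime_s=1.0, slow_factor=10.0):
    """Group canceled backends by the three symptoms discussed above.

    records: iterable of dicts with keys pid, state, wait_event, elapsed_s
    (a hypothetical layout mirroring the fields of the proposed log line).
    """
    records = list(records)
    # Case 2: many cancellations sharing one wait event (e.g. DataFileRead).
    wait_events = Counter(r["wait_event"] for r in records if r.get("wait_event"))
    # Case 1: queries that ran far longer than they usually do.
    threshold = expected_runtime_s * slow_factor
    slow_pids = [r["pid"] for r in records if r.get("elapsed_s", 0.0) > threshold]
    # Case 3: sessions sitting idle inside an open transaction.
    idle_pids = [r["pid"] for r in records if r.get("state") == "idle in transaction"]
    return {
        "wait_events": wait_events,
        "slow_pids": slow_pids,
        "idle_in_xact_pids": idle_pids,
    }


# Invented sample data, for illustration only.
EXAMPLE_RECORDS = [
    {"pid": 101, "state": "active", "wait_event": "DataFileRead", "elapsed_s": 60.0},
    {"pid": 102, "state": "active", "wait_event": "DataFileRead", "elapsed_s": 0.5},
    {"pid": 103, "state": "idle in transaction", "wait_event": "ClientRead", "elapsed_s": 2.0},
]

if __name__ == "__main__":
    summary = summarize_cancellations(EXAMPLE_RECORDS)
    print(summary["wait_events"].most_common(1))
    print(summary["slow_pids"], summary["idle_in_xact_pids"])
```

The thresholds are deliberately crude; what counts as "usually runs in 1 second" would have to come from knowledge of the workload.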

The main purpose is to dump this information just before the 
cancellation occurs, to get some clues about what was going on and some 
data to work with (to avoid future conflicts and cancellations).

If you think this information would be useful, I can submit a patch in 
this area.

Bertrand

[1]: 
https://www.postgresql.org/message-id/9a60178c-a853-1440-2cdc-c3af916cff59%40amazon.com







