Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Some problems of recovery conflict wait events
Msg-id d60fd913-7cfc-564e-62b6-3db3995a5e33@oss.nttdata.com
In response to Re: Some problems of recovery conflict wait events  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: Some problems of recovery conflict wait events
List pgsql-hackers

On 2020/02/29 12:36, Masahiko Sawada wrote:
> On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
>> <masahiko.sawada@2ndquadrant.com> wrote:
>>>
>>> Hi all,
>>>
>>> When recovery conflicts happen on the streaming replication standby,
>>> the wait event of startup process is null when
>>> max_standby_streaming_delay = 0 (to be exact, when the limit time
>>> calculated by max_standby_streaming_delay is behind the last WAL data
>>> receipt time). Moreover, the process title of the waiting startup
>>> process looks odd in the case of lock conflicts.
>>>
>>> 1. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a lock,
>>>
>>> * wait event
>>>   backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>>   startup      | Lock            | relation
>>> (1 row)
>>>
>>> * ps
>>> 42513   ??  Ss     0:00.05 postgres: startup   recovering
>>> 000000010000000000000003 waiting
>>>
>>> Looks good.
>>>
>>> 2. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a snapshot,
>>>
>>> * wait event
>>>   backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>>   startup      |                 |
>>> (1 row)
>>>
>>> * ps
>>> 44299   ??  Ss     0:00.05 postgres: startup   recovering
>>> 000000010000000000000003 waiting
>>>
>>> wait_event_type and wait_event are null in spite of waiting for
>>> conflict resolution.
>>>
>>> 3. When max_standby_streaming_delay = 0 and the startup process
>>> conflicts with a lock,
>>>
>>> * wait event
>>>   backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>>   startup      |                 |
>>> (1 row)
>>>
>>> * ps
>>> 46510   ??  Ss     0:00.05 postgres: startup   recovering
>>> 000000010000000000000003 waiting waiting
>>>
>>> wait_event_type and wait_event are null and the process title is
>>> wrong; "waiting" appears twice.
>>>
>>> The cause of the first problem, wait_event_type and wait_event not
>>> being set, is that WaitExceedsMaxStandbyDelay, which is called by
>>> ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
>>> using pg_usleep rather than WaitLatch. I think we can change it so
>>> that it uses WaitLatch and its callers pass wait event information.
>>>
>>> For the second problem, the wrong process title, the cause is also
>>> related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
>>> conflicts we add "waiting" to the process title in WaitOnLock but we
>>> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
>>> have WaitOnLock not set the process title in the recovery case.
>>>
>>> This problem exists on 12, 11 and 10. I'll submit the patch.
>>>
>>
>> I've attached patches that fix the above two issues.
>>
>> 0001 patch fixes the first problem. Currently there are 5 types of
>> recovery conflict resolution: snapshot, tablespace, lock, database and
>> buffer pin, and we set wait events for only 2 of the 5: lock
>> (only when doing ProcWaitForSignal) and buffer pin.

+1 to add those new wait events in the master. But adding them sounds like
a new feature rather than a bug fix. So ISTM that it's not back-patchable...
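
To make the scope of the missing wait events concrete, here is a minimal
sketch of the coverage described in the quoted message. It is an
illustrative model only: the Python names below are hypothetical and are not
PostgreSQL identifiers, and the coverage set simply restates what the quoted
message says (lock, only when doing ProcWaitForSignal, and buffer pin).

```python
# Illustrative model only; these names are hypothetical, not PostgreSQL code.
# The five kinds of recovery conflict resolution listed in the quoted message.
CONFLICT_TYPES = ["snapshot", "tablespace", "lock", "database", "buffer pin"]

# Per the quoted message, only these two currently set a wait event
# (lock only when the startup process waits in ProcWaitForSignal).
REPORTS_WAIT_EVENT = {"lock", "buffer pin"}


def conflicts_without_wait_event(conflict_types, reporting):
    """Return the conflict types for which no wait event is reported."""
    return [c for c in conflict_types if c not in reporting]


missing = conflicts_without_wait_event(CONFLICT_TYPES, REPORTS_WAIT_EVENT)
print(missing)
```

The three types left in `missing` are the ones that would need new wait
events, which is why the change looks more like a feature than a bug fix.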

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


