Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Some problems of recovery conflict wait events |
Date | |
Msg-id | d60fd913-7cfc-564e-62b6-3db3995a5e33@oss.nttdata.com |
In response to | Re: Some problems of recovery conflict wait events (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses | Re: Some problems of recovery conflict wait events |
List | pgsql-hackers |
On 2020/02/29 12:36, Masahiko Sawada wrote:
> On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
>> <masahiko.sawada@2ndquadrant.com> wrote:
>>>
>>> Hi all,
>>>
>>> When recovery conflicts happen on the streaming replication standby,
>>> the wait event of startup process is null when
>>> max_standby_streaming_delay = 0 (to be exact, when the limit time
>>> calculated by max_standby_streaming_delay is behind the last WAL data
>>> receipt time is behind). Moreover the process title of waiting startup
>>> process looks odd in the case of lock conflicts.
>>>
>>> 1. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a lock,
>>>
>>> * wait event
>>>  backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>>  startup      | Lock            | relation
>>> (1 row)
>>>
>>> * ps
>>> 42513 ?? Ss 0:00.05 postgres: startup recovering
>>> 000000010000000000000003 waiting
>>>
>>> Looks good.
>>>
>>> 2. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a snapshot,
>>>
>>> * wait event
>>>  backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>>  startup      |                 |
>>> (1 row)
>>>
>>> * ps
>>> 44299 ?? Ss 0:00.05 postgres: startup recovering
>>> 000000010000000000000003 waiting
>>>
>>> wait_event_type and wait_event are null in spite of waiting for
>>> conflict resolution.
>>>
>>> 3. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a lock,
>>>
>>> * wait event
>>>  backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>>  startup      |                 |
>>> (1 row)
>>>
>>> * ps
>>> 46510 ?? Ss 0:00.05 postgres: startup recovering
>>> 000000010000000000000003 waiting waiting
>>>
>>> wait_event_type and wait_event are null and the process title is
>>> wrong; "waiting" appears twice.
>>>
>>> The cause of the first problem, wait_event_type and wait_event are not
>>> set, is that WaitExceedsMaxStandbyDelay which is called by
>>> ResolveRecoveryConflictWithVirtualXIDs waits for other transactions
>>> using pg_usleep rather than WaitLatch. I think we can change it so
>>> that it uses WaitLatch and those caller passes wait event information.
>>>
>>> For the second problem, wrong process title, the cause is also
>>> relevant with ResolveRecoveryConflictWithVirtualXIDs; in case of lock
>>> conflicts we add "waiting" to the process title in WaitOnLock but we
>>> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
>>> have WaitOnLock not set process title in recovery case.
>>>
>>> This problem exists on 12, 11 and 10. I'll submit the patch.
>>>
>>
>> I've attached patches that fix the above two issues.
>>
>> 0001 patch fixes the first problem. Currently there are 5 types of
>> recovery conflict resolution: snapshot, tablespace, lock, database and
>> buffer pin, and we set wait events to only 2 events out of 5: lock
>> (only when doing ProcWaitForSignal) and buffer pin.

+1 for adding those new wait events in master. But adding them sounds
like a new feature rather than a bug fix, so ISTM that it's not
back-patchable...

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters