Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Some problems of recovery conflict wait events |
Date | |
Msg-id | CA+fd4k42mqvEd6J9x0yD4Zpya9nXK0CwSOtMs6ju7edj-da0sw@mail.gmail.com |
In response to | Re: Some problems of recovery conflict wait events (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses | Re: Some problems of recovery conflict wait events |
List | pgsql-hackers |
On Wed, 4 Mar 2020 at 11:04, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/02/29 12:36, Masahiko Sawada wrote:
> > On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
> >> <masahiko.sawada@2ndquadrant.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> When recovery conflicts happen on the streaming replication standby,
> >>> the wait event of the startup process is null when
> >>> max_standby_streaming_delay = 0 (to be exact, when the limit time
> >>> calculated by max_standby_streaming_delay is behind the last WAL data
> >>> receipt time). Moreover, the process title of the waiting startup
> >>> process looks odd in the case of lock conflicts.
> >>>
> >>> 1. When max_standby_streaming_delay > 0 and the startup process
> >>> conflicts with a lock,
> >>>
> >>> * wait event
> >>>  backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>  startup      | Lock            | relation
> >>> (1 row)
> >>>
> >>> * ps
> >>> 42513   ??  Ss     0:00.05 postgres: startup   recovering
> >>> 000000010000000000000003 waiting
> >>>
> >>> Looks good.
> >>>
> >>> 2. When max_standby_streaming_delay > 0 and the startup process
> >>> conflicts with a snapshot,
> >>>
> >>> * wait event
> >>>  backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>  startup      |                 |
> >>> (1 row)
> >>>
> >>> * ps
> >>> 44299   ??  Ss     0:00.05 postgres: startup   recovering
> >>> 000000010000000000000003 waiting
> >>>
> >>> wait_event_type and wait_event are null in spite of waiting for
> >>> conflict resolution.
> >>>
> >>> 3. When max_standby_streaming_delay = 0 and the startup process
> >>> conflicts with a lock,
> >>>
> >>> * wait event
> >>>  backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>  startup      |                 |
> >>> (1 row)
> >>>
> >>> * ps
> >>> 46510   ??  Ss     0:00.05 postgres: startup   recovering
> >>> 000000010000000000000003 waiting waiting
> >>>
> >>> wait_event_type and wait_event are null and the process title is
> >>> wrong; "waiting" appears twice.
> >>>
> >>> The cause of the first problem, wait_event_type and wait_event not
> >>> being set, is that WaitExceedsMaxStandbyDelay, which is called by
> >>> ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> >>> using pg_usleep rather than WaitLatch. I think we can change it so
> >>> that it uses WaitLatch and its callers pass wait event information.
> >>>
> >>> For the second problem, the wrong process title, the cause is also
> >>> related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
> >>> conflicts we add "waiting" to the process title in WaitOnLock but we
> >>> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> >>> have WaitOnLock not set the process title in the recovery case.
> >>>
> >>> This problem exists on 12, 11 and 10. I'll submit the patch.
> >>>
> >>
> >> I've attached patches that fix the above two issues.
> >>
> >> 0001 patch fixes the first problem. Currently there are 5 types of
> >> recovery conflict resolution: snapshot, tablespace, lock, database and
> >> buffer pin, and we set wait events for only 2 out of 5: lock
> >> (only when doing ProcWaitForSignal) and buffer pin.
>
> +1 to add those new wait events in the master. But adding them sounds like
> new feature rather than bug fix. So ISTM that it's not back-patchable...
>

Yeah, so the 0001 patch assigns existing wait events to recovery
conflict resolution. For instance, it sets (PG_WAIT_LOCK |
LOCKTAG_TRANSACTION) for the recovery conflict on a snapshot. The 0003
patch improves these wait events by adding new wait events such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore the 0001 (and 0002)
patches are the fix for existing versions, and the 0003 patch is an
improvement for PG13 only. Did you mean that even the 0001 patch isn't
suitable for back-patching?
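
To illustrate what an expression like (PG_WAIT_LOCK | LOCKTAG_TRANSACTION) carries, here is a minimal standalone sketch, not code from the patches. It assumes the wait event is a 32-bit value whose high byte is the wait class and whose low bits are a class-specific detail. Every SKETCH_ name, the numeric constants, and the two-entry name table are hypothetical stand-ins for this example only; the real definitions live in the PostgreSQL headers and may differ between versions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative stand-ins only: these names and values are assumptions made
 * for this sketch and are not copied from the PostgreSQL headers.
 */
#define SKETCH_WAIT_CLASS_MASK   0xFF000000U
#define SKETCH_WAIT_DETAIL_MASK  0x0000FFFFU
#define SKETCH_PG_WAIT_LOCK      0x03000000U

typedef enum SketchLockTagType
{
    SKETCH_LOCKTAG_RELATION = 0,
    SKETCH_LOCKTAG_TRANSACTION = 1
} SketchLockTagType;

/* Display names indexed by SketchLockTagType (hypothetical subset). */
static const char *const sketch_locktag_names[] = {
    "relation",
    "transactionid"
};

/* Combine a wait class (high byte) with a class-specific detail (low bits). */
static uint32_t
sketch_make_wait_event(uint32_t wait_class, uint32_t detail)
{
    assert((wait_class & ~SKETCH_WAIT_CLASS_MASK) == 0);
    assert((detail & ~SKETCH_WAIT_DETAIL_MASK) == 0);
    return wait_class | detail;
}

/* Returns NULL when nothing is advertised; a view would show that as null. */
static const char *
sketch_wait_event_type(uint32_t info)
{
    if (info == 0)
        return NULL;
    if ((info & SKETCH_WAIT_CLASS_MASK) == SKETCH_PG_WAIT_LOCK)
        return "Lock";
    return "???";
}

/* Decode the detail part into a name, again NULL when nothing is set. */
static const char *
sketch_wait_event_name(uint32_t info)
{
    uint32_t    detail = info & SKETCH_WAIT_DETAIL_MASK;

    if (info == 0)
        return NULL;
    if ((info & SKETCH_WAIT_CLASS_MASK) == SKETCH_PG_WAIT_LOCK &&
        detail < sizeof(sketch_locktag_names) / sizeof(sketch_locktag_names[0]))
        return sketch_locktag_names[detail];
    return "???";
}

/* Print one row in the style of the wait event reports quoted above. */
static void
sketch_print_row(const char *backend_type, uint32_t info)
{
    const char *type = sketch_wait_event_type(info);
    const char *name = sketch_wait_event_name(info);

    printf(" %-12s | %-15s | %s\n", backend_type,
           type ? type : "", name ? name : "");
}
```

Under these assumptions, a caller that advertises sketch_make_wait_event(SKETCH_PG_WAIT_LOCK, SKETCH_LOCKTAG_TRANSACTION) before sleeping would show up with a Lock type and a transactionid event, whereas a process that sleeps without advertising anything (info of 0) would show the empty columns seen in cases 2 and 3.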

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services