Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Some problems of recovery conflict wait events
Date
Msg-id CA+fd4k42mqvEd6J9x0yD4Zpya9nXK0CwSOtMs6ju7edj-da0sw@mail.gmail.com
In response to Re: Some problems of recovery conflict wait events  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Some problems of recovery conflict wait events  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, 4 Mar 2020 at 11:04, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/02/29 12:36, Masahiko Sawada wrote:
> > On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
> >> <masahiko.sawada@2ndquadrant.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> When recovery conflicts happen on the streaming replication standby,
> >>> the wait event of the startup process is null when
> >>> max_standby_streaming_delay = 0 (to be exact, when the limit time
> >>> calculated from max_standby_streaming_delay is behind the last WAL
> >>> data receipt time). Moreover, the process title of the waiting
> >>> startup process looks odd in the case of lock conflicts.
> >>>
> >>> 1. When max_standby_streaming_delay > 0 and the startup process
> >>> conflicts with a lock,
> >>>
> >>> * wait event
> >>>   backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>   startup      | Lock            | relation
> >>> (1 row)
> >>>
> >>> * ps
> >>> 42513   ??  Ss     0:00.05 postgres: startup   recovering
> >>> 000000010000000000000003 waiting
> >>>
> >>> Looks good.
> >>>
> >>> 2. When max_standby_streaming_delay > 0 and the startup process
> >>> conflicts with a snapshot,
> >>>
> >>> * wait event
> >>>   backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>   startup      |                 |
> >>> (1 row)
> >>>
> >>> * ps
> >>> 44299   ??  Ss     0:00.05 postgres: startup   recovering
> >>> 000000010000000000000003 waiting
> >>>
> >>> wait_event_type and wait_event are null in spite of waiting for
> >>> conflict resolution.
> >>>
> >>> 3. When max_standby_streaming_delay = 0 and the startup process
> >>> conflicts with a lock,
> >>>
> >>> * wait event
> >>>   backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>   startup      |                 |
> >>> (1 row)
> >>>
> >>> * ps
> >>> 46510   ??  Ss     0:00.05 postgres: startup   recovering
> >>> 000000010000000000000003 waiting waiting
> >>>
> >>> wait_event_type and wait_event are null and the process title is
> >>> wrong; "waiting" appears twice.
> >>>
> >>> The cause of the first problem, wait_event_type and wait_event not
> >>> being set, is that WaitExceedsMaxStandbyDelay, which is called by
> >>> ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> >>> using pg_usleep rather than WaitLatch. I think we can change it so
> >>> that it uses WaitLatch and its callers pass wait event information.
> >>>
> >>> For the second problem, the wrong process title, the cause is also
> >>> related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
> >>> conflicts we add "waiting" to the process title in WaitOnLock but we
> >>> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> >>> have WaitOnLock not set the process title in the recovery case.
> >>>
> >>> This problem exists on 12, 11 and 10. I'll submit the patch.
> >>>
> >>
> >> I've attached patches that fix the above two issues.
> >>
> >> The 0001 patch fixes the first problem. Currently there are 5 types of
> >> recovery conflict resolution: snapshot, tablespace, lock, database and
> >> buffer pin, and we set wait events for only 2 of the 5: lock
> >> (only when doing ProcWaitForSignal) and buffer pin.
>
> +1 to add those new wait events in the master. But adding them sounds like
> a new feature rather than a bug fix. So ISTM that it's not back-patchable...
>

Yeah, so the 0001 patch assigns existing wait events to recovery
conflict resolution. For instance, it sets (PG_WAIT_LOCK |
LOCKTAG_TRANSACTION) for a recovery conflict on a snapshot. The 0003
patch improves on this by adding new wait events such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore the 0001 (and 0002)
patches are fixes for existing versions, and the 0003 patch is an
improvement for PG13 only. Did you mean that even the 0001 patch isn't
suitable for back-patching?
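
To illustrate how a value like (PG_WAIT_LOCK | LOCKTAG_TRANSACTION) is
composed, here is a minimal standalone sketch. It is not taken from the
patches: the SKETCH_* names, the helper functions and the numeric values
are stand-ins assumed for illustration, and the real definitions in the
PostgreSQL headers may differ between versions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in constants (assumptions for this sketch only). The idea is that
 * the wait class occupies the high byte and the event ID the low bits.
 */
#define SKETCH_WAIT_CLASS_MASK 0xFF000000U
#define SKETCH_WAIT_LOCK       0x03000000U

/* Stand-in lock tag types; the numbering is assumed, not authoritative. */
typedef enum SketchLockTagType
{
    SKETCH_LOCKTAG_RELATION = 0,
    SKETCH_LOCKTAG_TRANSACTION = 4
} SketchLockTagType;

/* Combine a wait class and an event ID into one 32-bit value. */
uint32_t
sketch_make_wait_event(uint32_t wait_class, uint32_t event_id)
{
    /* The class must fit in the high byte, the event ID in the rest. */
    assert((wait_class & ~SKETCH_WAIT_CLASS_MASK) == 0);
    assert((event_id & SKETCH_WAIT_CLASS_MASK) == 0);
    return wait_class | event_id;
}

/* Extract the class part, which would map to wait_event_type. */
uint32_t
sketch_wait_class(uint32_t wait_event_info)
{
    return wait_event_info & SKETCH_WAIT_CLASS_MASK;
}

/* Extract the event ID part, which would map to wait_event. */
uint32_t
sketch_wait_event_id(uint32_t wait_event_info)
{
    return wait_event_info & ~SKETCH_WAIT_CLASS_MASK;
}
```

With these stand-in values,
sketch_make_wait_event(SKETCH_WAIT_LOCK, SKETCH_LOCKTAG_TRANSACTION)
yields 0x03000004, and the two extractor functions recover the class
and the event ID separately.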

Regards,


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


