Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Some problems of recovery conflict wait events
Date
Msg-id CA+fd4k51DA2sYvAzQ5wKzrbgtRmLmuDQ0i5F4y0i2dat5+4yaA@mail.gmail.com
Whole thread Raw
In response to Re: Some problems of recovery conflict wait events  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Some problems of recovery conflict wait events  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On Fri, 27 Mar 2020 at 17:54, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/04 14:31, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2020/03/04 13:27, Michael Paquier wrote:
> >>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
> >>>> Yeah, so 0001 patch sets existing wait events to recovery conflict
> >>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
> >>>> to the recovery conflict on a snapshot. 0003 patch improves these wait
> >>>> events by adding the new type of wait event such as
> >>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
> >>>> is the fix for existing versions and 0003 patch is an improvement for
> >>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching?
> >>
> >> Yes, it looks like a improvement rather than bug fix.
> >>
> >
> > Okay, understand.
> >
> >>> I got my eyes on this patch set.  The full patch set is in my opinion
> >>> just a set of improvements, and not bug fixes, so I would refrain from
> >>> back-backpatching.
> >>
> >> I think that the issue (i.e., "waiting" is reported twice needlessly
> >> in PS display) that 0002 patch tries to fix is a bug. So it should be
> >> fixed even in the back branches.
> >
> > So we need only two patches: one fixes process title issue and another
> > improve wait event. I've attached updated patches.
>
> I started reading v2-0002-Improve-wait-events-for-recovery-conflict-resolut.patch.
>
> -       ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
> +       ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_BUFFER_PIN);
>
> Currently the wait event indicating the wait for buffer pin has already
> been reported. But the above change in the patch changes the name of
> wait event for buffer pin only in the startup process. Is this really useful?
> Isn't the existing wait event for buffer pin enough?
>
> -       /* Wait to be signaled by the release of the Relation Lock */
> -       ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
> +               /* Wait to be signaled by the release of the Relation Lock */
> +               ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_LOCK);
>
> Same as above. Isn't the existing wait event enough?

Yeah, we can use the existing wait events for buffer pin and lock.

>
> -       /*
> -        * Progressively increase the sleep times, but not to more than 1s, since
> -        * pg_usleep isn't interruptible on some platforms.
> -        */
> -       standbyWait_us *= 2;
> -       if (standbyWait_us > 1000000)
> -               standbyWait_us = 1000000;
> +       WaitLatch(MyLatch,
> +                         WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT,
> +                         STANDBY_WAIT_MS,
> +                         wait_event_info);
> +       ResetLatch(MyLatch);
>
> ResetLatch() should be called before WaitLatch()?

Fixed.

>
> Could you tell me why you dropped the "increase-sleep-times" logic?

I thought we can remove it because WaitLatch is interruptible but my
observation was not correct. The waiting startup process is not
necessarily woken up by signal. I think it's still better to not wait
more than 1 sec even if it's an interruptible wait.

Attached patch fixes the above and introduces only two wait events of
conflict resolution: snapshot and tablespace. I also removed the wait
event of conflict resolution of database since it's unlikely to become
a user-visible and a long sleep as we discussed before.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Rajkumar Raghuwanshi
Date:
Subject: Re: WIP/PoC for parallel backup
Next
From: "k.jamison@fujitsu.com"
Date:
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist