Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Some problems of recovery conflict wait events
Date
Msg-id CA+fd4k7_f6-yQLiwH0YVKN-J2C1NRbOJxF1LbAZW=kn-98X4=w@mail.gmail.com
In response to Re: Some problems of recovery conflict wait events  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: Some problems of recovery conflict wait events
List pgsql-hackers
On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Hi all,
> >
> > When recovery conflicts happen on the streaming replication standby,
> > the wait event of the startup process is null when
> > max_standby_streaming_delay = 0 (to be exact, when the limit time
> > calculated by max_standby_streaming_delay is behind the last WAL data
> > receipt time). Moreover, the process title of the waiting startup
> > process looks odd in the case of lock conflicts.
> >
> > 1. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      | Lock            | relation
> > (1 row)
> >
> > * ps
> > 42513   ??  Ss     0:00.05 postgres: startup   recovering
> > 000000010000000000000003 waiting
> >
> > Looks good.
> >
> > 2. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a snapshot,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      |                 |
> > (1 row)
> >
> > * ps
> > 44299   ??  Ss     0:00.05 postgres: startup   recovering
> > 000000010000000000000003 waiting
> >
> > wait_event_type and wait_event are null in spite of waiting for
> > conflict resolution.
> >
> > 3. When max_standby_streaming_delay = 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      |                 |
> > (1 row)
> >
> > * ps
> > 46510   ??  Ss     0:00.05 postgres: startup   recovering
> > 000000010000000000000003 waiting waiting
> >
> > wait_event_type and wait_event are null and the process title is
> > wrong; "waiting" appears twice.
> >
> > The cause of the first problem, wait_event_type and wait_event not
> > being set, is that WaitExceedsMaxStandbyDelay, which is called by
> > ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> > using pg_usleep rather than WaitLatch. I think we can change it so
> > that it uses WaitLatch and its callers pass wait event information.
> >
> > For the second problem, the wrong process title, the cause is also
> > related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
> > conflicts we add "waiting" to the process title in WaitOnLock but we
> > add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> > have WaitOnLock not set the process title in the recovery case.
> >
> > This problem exists on 12, 11 and 10. I'll submit the patch.
> >
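
The first problem described above can be illustrated with a small standalone model. This is only a sketch, not PostgreSQL code: demo_wait_event, demo_sleep, wait_without_event and wait_with_event are hypothetical names, and the "sleep" merely records what an observer would see instead of blocking. It shows why a bare sleep leaves the wait event empty, while a wait that receives the event from its caller advertises it for the duration of the wait.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-process slot that a monitoring view
 * would read; 0 means "no wait event advertised". */
static uint32_t demo_wait_event = 0;

/* What an observer would have seen while the process was blocked. */
static uint32_t demo_seen_while_sleeping = 0;

/* Illustrative identifier only; not a real wait event value. */
#define DEMO_WAIT_EVENT_LOCK 1U

/* Stand-in for the actual sleep: it does not block, it only records
 * what is advertised at the moment the process would be blocked. */
static void demo_sleep(void)
{
    demo_seen_while_sleeping = demo_wait_event;
}

/* Models the reported behaviour: a bare sleep, nothing advertised. */
static void wait_without_event(void)
{
    demo_sleep();
}

/* Models the proposed direction: the caller passes the wait event, which
 * is advertised during the wait and cleared afterwards. */
static void wait_with_event(uint32_t event)
{
    assert(event != 0);         /* 0 is reserved for "not waiting" */
    demo_wait_event = event;
    demo_sleep();
    demo_wait_event = 0;
}
```

Calling wait_without_event() leaves demo_seen_while_sleeping at 0, which corresponds to the null columns in the output above; wait_with_event(DEMO_WAIT_EVENT_LOCK) makes the event visible while "sleeping" and resets it afterwards.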
>
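
The duplicated "waiting" in the process title can be modelled in the same spirit. Again a hedged sketch rather than the actual code path: demo_wait_on_lock, demo_resolve_conflict and the skip_title_in_recovery flag are hypothetical, and the model ignores the conditions under which the second append really happens. It only shows that when two layers each append the suffix the title ends up with it twice, and that having the lock-wait layer skip it in recovery leaves exactly one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_TITLE_LEN 128

/* Append " waiting" to a NUL-terminated title held in a bounded buffer. */
static void demo_append_waiting(char *title, size_t size)
{
    size_t len = strlen(title);

    assert(len < size);
    snprintf(title + len, size - len, " waiting");
}

/* Models the conflict-resolution layer, which appends the suffix itself. */
static void demo_resolve_conflict(char *title, size_t size)
{
    demo_append_waiting(title, size);
}

/* Models the lock-wait layer. With skip_title_in_recovery set, it leaves
 * the title alone during recovery (the direction suggested in the mail). */
static void demo_wait_on_lock(char *title, size_t size,
                              bool in_recovery, bool skip_title_in_recovery)
{
    if (!(in_recovery && skip_title_in_recovery))
        demo_append_waiting(title, size);
    if (in_recovery)
        demo_resolve_conflict(title, size);
}

/* Count non-overlapping occurrences of " waiting" in the title. */
static int demo_count_waiting(const char *title)
{
    int n = 0;
    const char *p = title;

    while ((p = strstr(p, " waiting")) != NULL)
    {
        n++;
        p += strlen(" waiting");
    }
    return n;
}
```

With in_recovery = true and the flag off, demo_count_waiting reports 2 for the resulting title; with the flag on it reports 1.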
> I've attached patches that fix the above two issues.
>
> 0001 patch fixes the first problem. Currently there are 5 types of
> recovery conflict resolution: snapshot, tablespace, lock, database and
> buffer pin, and we set wait events for only 2 of the 5: lock
> (only when doing ProcWaitForSignal) and buffer pin. Therefore, users
> cannot tell whether the startup process is waiting, or what it is
> waiting for. This patch sets wait events for 3 more: snapshot,
> tablespace and lock. For those 3, I thought that
> we could create a new, more appropriate wait event type, say
> RecoveryConflict, and set it for them. However, considering
> back-patching to existing versions, adding a new wait event type would
> not be acceptable. So this patch sets existing wait events such as
> PG_WAIT_LOCK in those 3 places and does not set a wait event for
> conflict resolution on dropping a database because there is no
> appropriate existing one. I'll start a separate thread about
> improving the wait events of recovery conflict resolution for PG13 if
> necessary.
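
As a reading aid, the coverage described in the quoted paragraph can be written down as a small table. This is a sketch of my reading of the text, not taken from the patch: the type and function names are hypothetical, and treating the lock case as fully covered after the 0001 patch is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    DEMO_CONFLICT_SNAPSHOT,
    DEMO_CONFLICT_TABLESPACE,
    DEMO_CONFLICT_LOCK,
    DEMO_CONFLICT_DATABASE,
    DEMO_CONFLICT_BUFFERPIN,
    DEMO_CONFLICT_COUNT
} DemoConflict;

typedef enum { DEMO_NONE, DEMO_PARTIAL, DEMO_FULL } DemoCoverage;

typedef struct
{
    const char  *name;
    DemoCoverage before;    /* wait event reporting before the 0001 patch */
    DemoCoverage after;     /* with the 0001 patch, as described in the mail */
} DemoConflictInfo;

static const DemoConflictInfo demo_conflicts[DEMO_CONFLICT_COUNT] = {
    [DEMO_CONFLICT_SNAPSHOT]   = {"snapshot",   DEMO_NONE,    DEMO_FULL},
    [DEMO_CONFLICT_TABLESPACE] = {"tablespace", DEMO_NONE,    DEMO_FULL},
    /* before: only while in ProcWaitForSignal; after: assumed complete */
    [DEMO_CONFLICT_LOCK]       = {"lock",       DEMO_PARTIAL, DEMO_FULL},
    /* no appropriate existing wait event, so left unset */
    [DEMO_CONFLICT_DATABASE]   = {"database",   DEMO_NONE,    DEMO_NONE},
    [DEMO_CONFLICT_BUFFERPIN]  = {"buffer pin", DEMO_FULL,    DEMO_FULL},
};

/* Count conflict types with at least 'min' coverage, before or after. */
static int demo_count_covered(int after, DemoCoverage min)
{
    int n = 0;

    assert(min != DEMO_NONE);
    for (size_t i = 0; i < DEMO_CONFLICT_COUNT; i++)
    {
        DemoCoverage c = after ? demo_conflicts[i].after
                               : demo_conflicts[i].before;
        if (c >= min)
            n++;
    }
    return n;
}
```

Here demo_count_covered(0, DEMO_PARTIAL) yields the "2 of the 5" mentioned in the text, and demo_count_covered(1, DEMO_FULL) yields 4, with only the database case left without a wait event.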

The attached patch improves the wait events of recovery conflict
resolution. It's for PG13. I added a new RecoveryConflict wait_event_type
and some wait_events. This patch can be applied on top of the two patches
I already proposed[1].

Regards,

[1] https://www.postgresql.org/message-id/CA%2Bfd4k63ukOtdNx2f-fUZ2vuB3RgE%3DPo%2BxSnpmcPJbKqsJMtiA%40mail.gmail.com

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
