Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Some problems of recovery conflict wait events |
Date | |
Msg-id | CA+fd4k7_f6-yQLiwH0YVKN-J2C1NRbOJxF1LbAZW=kn-98X4=w@mail.gmail.com |
In response to | Re: Some problems of recovery conflict wait events (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses | Re: Some problems of recovery conflict wait events |
List | pgsql-hackers |
On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Hi all,
> >
> > When recovery conflicts happen on the streaming replication standby,
> > the wait event of the startup process is null when
> > max_standby_streaming_delay = 0 (to be exact, when the limit time
> > calculated by max_standby_streaming_delay is behind the last WAL data
> > receipt time). Moreover, the process title of the waiting startup
> > process looks odd in the case of lock conflicts.
> >
> > 1. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      | Lock            | relation
> > (1 row)
> >
> > * ps
> > 42513 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting
> >
> > Looks good.
> >
> > 2. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a snapshot,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      |                 |
> > (1 row)
> >
> > * ps
> > 44299 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting
> >
> > wait_event_type and wait_event are null in spite of waiting for
> > conflict resolution.
> >
> > 3. When max_standby_streaming_delay = 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      |                 |
> > (1 row)
> >
> > * ps
> > 46510 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting waiting
> >
> > wait_event_type and wait_event are null and the process title is
> > wrong; "waiting" appears twice.
> >
> > The cause of the first problem, wait_event_type and wait_event not
> > being set, is that WaitExceedsMaxStandbyDelay, which is called by
> > ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> > using pg_usleep rather than WaitLatch. I think we can change it so
> > that it uses WaitLatch and its callers pass wait event information.
> >
> > For the second problem, the wrong process title, the cause is also
> > related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
> > conflicts we add "waiting" to the process title in WaitOnLock but we
> > add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> > have WaitOnLock not set the process title in the recovery case.
> >
> > This problem exists on 12, 11 and 10. I'll submit the patch.
> >
>
> I've attached patches that fix the above two issues.
>
> 0001 patch fixes the first problem. Currently there are 5 types of
> recovery conflict resolution: snapshot, tablespace, lock, database and
> buffer pin, and we set wait events for only 2 of the 5: lock
> (only when doing ProcWaitForSignal) and buffer pin. Therefore, users
> cannot tell whether the startup process is waiting, or what it is
> waiting for. This patch sets wait events for 3 more cases: snapshot,
> tablespace and lock. For those 3 cases, I thought that we could
> create a new, more appropriate wait event type, say
> RecoveryConflict, and set it for them. However, considering
> back-patching to existing versions, adding a new wait event type would
> not be acceptable. So this patch sets existing wait events such as
> PG_WAIT_LOCK in those 3 places and does not set a wait event for
> conflict resolution on dropping a database because there is no
> appropriate existing one. I'll start a separate thread about
> improving the wait events of recovery conflict resolution for PG13 if
> necessary.

Attached is a patch that improves the wait events of recovery conflict
resolution. It's for PG13.
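
For anyone who wants the mechanism behind the first quoted problem
spelled out, here is a toy model. It is Python, not PostgreSQL code, and
every name in it (ToyBackend, sleep_unreported, sleep_reported, and the
("RecoveryConflict", "snapshot") label) is hypothetical. It only mimics
the idea that a bare sleep advertises nothing, while a wait primitive
that is handed wait event information can publish it for the duration
of the wait.

```python
import time
from typing import Callable, List, Optional, Tuple

# (wait_event_type, wait_event), or None when nothing is advertised.
WaitEvent = Optional[Tuple[str, str]]


class ToyBackend:
    """Hypothetical stand-in for a process whose wait state can be sampled."""

    def __init__(self, sleep: Callable[[float], None] = time.sleep) -> None:
        self.wait_event: WaitEvent = None
        self._sleep = sleep

    def sleep_unreported(self, seconds: float) -> None:
        # Analogous to a bare sleep: the process waits, but an observer
        # sampling wait_event during the sleep sees None.
        self._sleep(seconds)

    def sleep_reported(self, seconds: float, event: Tuple[str, str]) -> None:
        # Analogous to a wait call whose caller passes wait event
        # information: publish it, wait, and always clear it afterwards.
        self.wait_event = event
        try:
            self._sleep(seconds)
        finally:
            self.wait_event = None


# Replace the real sleep with a probe that records what an observer
# would see if it sampled the backend in the middle of each wait.
samples: List[WaitEvent] = []
backend = ToyBackend(sleep=lambda _seconds: samples.append(backend.wait_event))

backend.sleep_unreported(0.005)
backend.sleep_reported(0.005, ("RecoveryConflict", "snapshot"))

print(samples)  # [None, ('RecoveryConflict', 'snapshot')]
```

The first sample corresponds to the empty wait_event_type and wait_event
columns in the quoted output; the second shows what a monitoring query
could report once the wait call is told what it is waiting for.
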
I added a new RecoveryConflict wait_event_type and some wait_events.
This patch can be applied on top of the two patches I already proposed.

Regards,

[1] https://www.postgresql.org/message-id/CA%2Bfd4k63ukOtdNx2f-fUZ2vuB3RgE%3DPo%2BxSnpmcPJbKqsJMtiA%40mail.gmail.com

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services