Re: Some problems of recovery conflict wait events - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Some problems of recovery conflict wait events |
Date | |
Msg-id | CA+fd4k4xoqXpD6V07Sfr_EDO+Rr0wQ6ma29wdDwXUcSFbbw1ZA@mail.gmail.com Whole thread Raw |
In response to | Re: Some problems of recovery conflict wait events (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Some problems of recovery conflict wait events
|
List | pgsql-hackers |
On Mon, 9 Mar 2020 at 13:24, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/08 13:52, Masahiko Sawada wrote: > > On Thu, 5 Mar 2020 at 20:16, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/03/05 16:58, Masahiko Sawada wrote: > >>> On Wed, 4 Mar 2020 at 15:21, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2020/03/04 14:31, Masahiko Sawada wrote: > >>>>> On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2020/03/04 13:27, Michael Paquier wrote: > >>>>>>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote: > >>>>>>>> Yeah, so 0001 patch sets existing wait events to recovery conflict > >>>>>>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION) > >>>>>>>> to the recovery conflict on a snapshot. 0003 patch improves these wait > >>>>>>>> events by adding the new type of wait event such as > >>>>>>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch > >>>>>>>> is the fix for existing versions and 0003 patch is an improvement for > >>>>>>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching? > >>>>>> > >>>>>> Yes, it looks like a improvement rather than bug fix. > >>>>>> > >>>>> > >>>>> Okay, understand. > >>>>> > >>>>>>> I got my eyes on this patch set. The full patch set is in my opinion > >>>>>>> just a set of improvements, and not bug fixes, so I would refrain from > >>>>>>> back-backpatching. > >>>>>> > >>>>>> I think that the issue (i.e., "waiting" is reported twice needlessly > >>>>>> in PS display) that 0002 patch tries to fix is a bug. So it should be > >>>>>> fixed even in the back branches. > >>>>> > >>>>> So we need only two patches: one fixes process title issue and another > >>>>> improve wait event. I've attached updated patches. > >>>> > >>>> Thanks for updating the patches! I started reading 0001 patch. > >>> > >>> Thank you for reviewing this patch. > >>> > >>>> > >>>> - /* > >>>> - * Report via ps if we have been waiting for more than 500 msec > >>>> - * (should that be configurable?) > >>>> - */ > >>>> - if (update_process_title && new_status == NULL && > >>>> - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(), > >>>> - 500)) > >>>> > >>>> The patch changes ResolveRecoveryConflictWithSnapshot() and > >>>> ResolveRecoveryConflictWithTablespace() so that they always add > >>>> "waiting" into the PS display, whether wait is really necessary or not. > >>>> But isn't it better to display "waiting" in PS basically when wait is > >>>> necessary, like originally ResolveRecoveryConflictWithVirtualXIDs() > >>>> does as the above? > >>> > >>> You're right. Will fix it. > >>> > >>>> > >>>> ResolveRecoveryConflictWithDatabase(Oid dbid) > >>>> { > >>>> + char *new_status = NULL; > >>>> + > >>>> + /* Report via ps we are waiting */ > >>>> + new_status = set_process_title_waiting(); > >>>> > >>>> In ResolveRecoveryConflictWithDatabase(), there seems no need to > >>>> display "waiting" in PS because no wait occurs when recovery conflict > >>>> with database happens. > >>> > >>> Isn't the startup process waiting for other backend to terminate? > >> > >> Yeah, you're right. I agree that "waiting" should be reported in this case. > >> > >> Currently ResolveRecoveryConflictWithLock() and > >> ResolveRecoveryConflictWithBufferPin() don't call > >> ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting" > >> in PS display. You changed them so that they report "waiting". I agree > >> to have this change. But this change is an improvement rather than > >> a bug fix, i.e., we should apply this change only in v13? > >> > > > > Did you mean ResolveRecoveryConflictWithDatabase and > > ResolveRecoveryConflictWithBufferPin? > > Yes! Sorry for my typo. > > > In the current code as far as I > > researched there are two cases where we don't add "waiting" and one > > case where we doubly add "waiting". > > > > ResolveRecoveryConflictWithDatabase and > > ResolveRecoveryConflictWithBufferPin don't update the ps title. > > Although the path where GetCurrentTimestamp() >= ltime is false in > > ResolveRecoveryConflictWithLock also doesn't update the ps title, it's > > already updated in WaitOnLock. On the other hand, the path where > > GetCurrentTimestamp() >= ltime is true in > > ResolveRecoveryConflictWithLock updates the ps title but it's wrong as > > I reported. > > > > I've split the patch into two patches: 0001 patch fixes the issue > > about doubly updating ps title, 0002 patch makes the recovery conflict > > resolution on database and buffer pin update the ps title. > > Thanks for splitting the patches. I think that 0001 patch can be back-patched. > > - /* > - * Report via ps if we have been waiting for more than 500 msec > - * (should that be configurable?) > - */ > - if (update_process_title && new_status == NULL && > - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(), > - 500)) > > Originally, "waiting" is reported in PS if we've been waiting for more than > 500 msec, as the above does. But you got rid of those codes in the patch. > Did you confirm that it's safe to do that? If not, isn't it better to apply > the attached patch? In WaitOnLock() we update the ps title regardless of waiting time. So I thought we can change it to make these behavior consistent. But considering back-patch, your patch looks better than mine. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: