Re: Fix checkpoint skip logic on idle systems by tracking LSN progress - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Msg-id 20160930.140015.150178454.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Fix checkpoint skip logic on idle systems by tracking LSN progress  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Sorry, I might have broken the threading of this discussion somehow..

At Thu, 29 Sep 2016 11:26:29 -0400, David Steele <david@pgmasters.net> wrote in
<30095aea-3910-dbb7-1790-a579fb93fa5e@pgmasters.net>
> On 9/28/16 10:32 PM, Michael Paquier wrote:
> > On Thu, Sep 29, 2016 at 7:45 AM, David Steele <david@pgmasters.net>
> > wrote:
> >>
> >> In general I agree with the other comments that this could end up
> >> being
> >> a problem.  On the other hand, since the additional locks are only
> >> taken
> >> at checkpoint or archive_timeout it may not be that big a deal.
> >
> > Yes, I did some tests on my laptop a couple of months back, that has 4
> > cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
> > contention and performing a bunch of INSERT using 4 clients on 4
> > different relations I could not catch a difference.. Autovacuum was
> > disabled to eliminate any noise. I tried checkpoint_segments at 30s to
> > see its effects, as well as larger values to see the impact with the
> > standby snapshot taken by the bgwriter. Other thoughts are welcome.
> 
> I don't have any better ideas than that.

I don't see any problem in setting progressAt in XLogInsertRecord.
But I suspect GetProgressRecPtr could be harmful, especially when
NUM_XLOGINSERT_LOCKS is *large*, so reducing the number would
rather alleviate the impact. In practice it doesn't seem so
harmful up to 8. (Even though I don't like the locking in
GetProgressRecPtr..)
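
To make the concern concrete, here is a minimal standalone sketch of
why a reader that must visit every insert lock gets more expensive as
the number of locks grows. This is not the patch's code: InsertSlot,
record_progress, get_progress_rec_ptr and the spinlock built on C11
atomic_flag are all hypothetical stand-ins for the real WAL insert
locks, progressAt and GetProgressRecPtr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_INSERT_LOCKS 8          /* stand-in for NUM_XLOGINSERT_LOCKS */

typedef uint64_t RecPtr;            /* stand-in for XLogRecPtr */

typedef struct
{
    atomic_flag lock;               /* toy spinlock, not an LWLock */
    RecPtr      progress_at;        /* last progress position seen by this slot */
} InsertSlot;

static InsertSlot slots[NUM_INSERT_LOCKS];

static void
slot_lock(InsertSlot *s)
{
    while (atomic_flag_test_and_set(&s->lock))
        ;                           /* spin until the flag was clear */
}

static void
slot_unlock(InsertSlot *s)
{
    atomic_flag_clear(&s->lock);
}

static void
init_slots(void)
{
    for (int i = 0; i < NUM_INSERT_LOCKS; i++)
    {
        atomic_flag_clear(&slots[i].lock);
        slots[i].progress_at = 0;
    }
}

/* Writer side: touches exactly one slot, whatever NUM_INSERT_LOCKS is. */
static void
record_progress(int slot, RecPtr pos)
{
    assert(slot >= 0 && slot < NUM_INSERT_LOCKS);
    slot_lock(&slots[slot]);
    if (pos > slots[slot].progress_at)
        slots[slot].progress_at = pos;
    slot_unlock(&slots[slot]);
}

/* Reader side: takes every slot lock in turn, so cost grows with the count. */
static RecPtr
get_progress_rec_ptr(void)
{
    RecPtr      result = 0;

    for (int i = 0; i < NUM_INSERT_LOCKS; i++)
    {
        slot_lock(&slots[i]);
        if (slots[i].progress_at > result)
            result = slots[i].progress_at;
        slot_unlock(&slots[i]);
    }
    return result;
}
```

In this model the writer's cost is constant, while each call of the
reader performs NUM_INSERT_LOCKS lock acquisitions. That is why
lowering the constant from 8 to 4 makes the reader cheaper rather
than stressing it.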

Currently the possibly harmful call of GetProgressRecPtr is the
one in BackgroundWriterMain. It would be called at intervals of
BgWriterDelay, and any time bgwriter receives SIGUSR1. But I don't
see what sends SIGUSR1 to bgwriter..


> >> +++ b/src/backend/postmaster/checkpointer.c
> >> +                       /* OK, it's time to switch */
> >> +                       elog(LOG, "Request XLog Switch");
> >>
> >> LOG level seems a bit much here, perhaps DEBUG1?
> >
> > That's from Horiguchi-san's patch, and those would be definitely
> > better as DEBUG1 by looking at it. Now and in order to keep things
> > simple I think that we had better discard this patch for now. I was
> > planning to come back to this thing anyway once we are done with the
> > first problem.
> 
> I still see this:
> 
> +++ b/src/backend/postmaster/checkpointer.c
>          /* OK, it's time to switch */
> +        elog(LOG, "Request XLog Switch");
> 
> > Well for now attached are two patches, that could just be squashed
> > into one.

Mmmm. Sorry, this was for my own quick private debugging and
leaked out into the patch.. But I don't mind leaving it at DEBUG2
if it seems useful.

> Yes, I think that makes sense.
> 
> More importantly, there is a regression.  With your new patch the
> xlogs are switching on archive_timeout again even with no changes.
> The v12 worked fine.

As Michael mentioned in this or another thread, that is another
issue, which he wants to solve separately. I personally doubt that
this patch (v11 and v13) can be evaluated alone without it, but
perhaps we can review this together with the excessive switching
problem?
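
For reference, the behavior David expects ("no switch on
archive_timeout when nothing has changed") can be modeled roughly as
below. This is only an illustrative sketch under my own assumptions,
not code from any version of the patch: SwitchState and
should_switch_segment are hypothetical names, and the real decision
in the checkpointer involves more state than a single position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t RecPtr;            /* stand-in for XLogRecPtr */

typedef struct
{
    RecPtr      last_switch_progress;   /* progress position at the previous switch */
} SwitchState;

/*
 * Decide whether a timeout-driven segment switch is worthwhile.
 * A switch is requested only if the timeout has elapsed AND the
 * progress position has moved since the last switch.
 */
static bool
should_switch_segment(SwitchState *st, RecPtr current_progress,
                      bool timeout_elapsed)
{
    assert(st != NULL);

    if (!timeout_elapsed)
        return false;               /* not time yet */

    if (current_progress == st->last_switch_progress)
        return false;               /* idle system: nothing new to archive */

    st->last_switch_progress = current_progress;
    return true;
}
```

With this model, repeated timeouts on an idle system return false
after the first switch, which is the behavior reported as working
in v12.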

> The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> as I can see the patch does not work correctly without these changes.
> Am I missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




