Re: Checkpoint sync pause - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Checkpoint sync pause
Date
Msg-id CA+TgmoaxbamSfWyA4s3-G1QOMMB4tP7vKhXRAmrAa2Ofms9Qvw@mail.gmail.com
Whole thread Raw
In response to Checkpoint sync pause  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Checkpoint sync pause  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On Mon, Jan 16, 2012 at 2:57 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> ...
> 2012-01-16 02:39:01.184 EST [25052]: DEBUG:  checkpoint sync: number=34
> file=base/16385/11766 time=0.006 msec
> 2012-01-16 02:39:01.184 EST [25052]: DEBUG:  checkpoint sync delay: seconds
> left=3
> 2012-01-16 02:39:01.284 EST [25052]: DEBUG:  checkpoint sync delay: seconds
> left=2
> 2012-01-16 02:39:01.385 EST [25052]: DEBUG:  checkpoint sync delay: seconds
> left=1
> 2012-01-16 02:39:01.860 EST [25052]: DEBUG:  checkpoint sync: number=35
> file=global/12007 time=375.710 msec
> 2012-01-16 02:39:01.860 EST [25052]: DEBUG:  checkpoint sync delay: seconds
> left=3
> 2012-01-16 02:39:01.961 EST [25052]: DEBUG:  checkpoint sync delay: seconds
> left=2
> 2012-01-16 02:39:02.061 EST [25052]: DEBUG:  checkpoint sync delay: seconds
> left=1
> 2012-01-16 02:39:02.161 EST [25052]: DEBUG:  checkpoint sync: number=36
> file=base/16385/11754 time=0.008 msec
> 2012-01-16 02:39:02.555 EST [25052]: LOG:  checkpoint complete: wrote 2586
> buffers (63.1%); 1 transaction log file(s) added, 0 removed, 0 recycled;
> write=2.422 s, sync=13.282 s, total=16.123 s; sync files=36, longest=1.085
> s, average=0.040 s
>
> No docs yet, really need a better guide to tuning checkpoints as they exist
> now before there's a place to attach a discussion of this to.

Yeah, I think this is an area where a really good documentation patch
might help more users than any code we could write.  On the technical
end, I dislike this a little bit because the parameter is clearly
something some people are going to want to set, but it's not at all
clear what value they should set it to and it has complex interactions
with the other checkpoint settings - and the user's hardware
configuration.  If there's no way to make it more self-tuning, then
perhaps we should just live with that, but it would be nice to come up
with something more user-transparent.  Also, I am still struggling
with what the right benchmarking methodology even is to judge whether
any patch in this area "works".  Can you provide more details about
your test setup?

Just one random thought: I wonder if it would make sense to cap the
delay after each sync to the time spending performing that sync.  That
would make the tuning of the delay less sensitive to the total number
of files, because we won't unnecessarily wait after each sync when
they're not actually taking any time to complete.  It's probably
easier to estimate the number of segments that are likely to contain
lots of dirty data than to estimate the total number of segments that
you might have touched at least once since the last checkpoint, and
there's no particular reason to think the latter is really what you
should be tuning on anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Standalone synchronous master
Next
From: Alvaro Herrera
Date:
Subject: Re: pgstat documentation tables