Thread: Re: Trying to minimize the impact of checkpoints

Re: Trying to minimize the impact of checkpoints

From
jao@geophile.com
Date:
Quoting jao@geophile.com:

> When a checkpoint occurs, all operations slow way, way down.
> The attached spreadsheet (xls file, prepared in OO so unlikely
> to be dangerous) shows a run of a few hours, and the various spikes
> every 25-30 minutes seem consistent with checkpointing. The
> application is doing 1/2 reads and 1/2 inserts during this time.

Sorry for being cryptic in my message. The spreadsheet contains
iostat output.

Jack Orenstein


Re: Trying to minimize the impact of checkpoints

From
Jan Wieck
Date:
On 6/11/2004 2:02 PM, jao@geophile.com wrote:

> Quoting jao@geophile.com:
>
>> When a checkpoint occurs, all operations slow way, way down.
>> The attached spreadsheet (xls file, prepared in OO so unlikely
>> to be dangerous) shows a run of a few hours, and the various spikes
>> every 25-30 minutes seem consistent with checkpointing. The
>> application is doing 1/2 reads and 1/2 inserts during this time.
>
> Sorry for being cryptic in my message. The spreadsheet contains
> iostat output.

Did you try to play with the background writer config options (new in
7.5) at all?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Trying to minimize the impact of checkpoints

From
Jack Orenstein
Date:
Jan Wieck wrote:
> On 6/11/2004 2:02 PM, jao@geophile.com wrote:
>
>> Quoting jao@geophile.com:
>>
>>> When a checkpoint occurs, all operations slow way, way down.
>>> The attached spreadsheet (xls file, prepared in OO so unlikely
>>> to be dangerous) shows a run of a few hours, and the various spikes
>>> every 25-30 minutes seem consistent with checkpointing. The
>>> application is doing 1/2 reads and 1/2 inserts during this time.
>>
>>
>> Sorry for being cryptic in my message. The spreadsheet contains
>> iostat output.
>
>
> Did you try to play with the background writer config options (new in
> 7.5) at all?

No, not yet. It looks like 7.5 won't be available in time for us to
use in the first release of our product, so I've been very focused
on how to best use 7.3-4.

Any opinions on the stability of 7.5 and the effectiveness of the background
writer in reducing variability in performance due to checkpoints?

Jack Orenstein


Re: Trying to minimize the impact of checkpoints

From
Jan Wieck
Date:
On 6/12/2004 1:44 PM, Jack Orenstein wrote:

> Jan Wieck wrote:
>> On 6/11/2004 2:02 PM, jao@geophile.com wrote:
>>
>>> Quoting jao@geophile.com:
>>>
>>>> When a checkpoint occurs, all operations slow way, way down.
>>>> The attached spreadsheet (xls file, prepared in OO so unlikely
>>>> to be dangerous) shows a run of a few hours, and the various spikes
>>>> every 25-30 minutes seem consistent with checkpointing. The
>>>> application is doing 1/2 reads and 1/2 inserts during this time.
>>>
>>>
>>> Sorry for being cryptic in my message. The spreadsheet contains
>>> iostat output.
>>
>>
>> Did you try to play with the background writer config options (new in
>> 7.5) at all?
>
> No, not yet. It looks like 7.5 won't be available in time for us to
> use in the first release of our product, so I've been very focused
> on how to best use 7.3-4.
>
> Any opinions on the stability of 7.5 and the effectiveness of the background
> writer in reducing variability in performance due to checkpoints?

I didn't save any of the charts done with 7.4, but the response-time
spikes on checkpoints went up to 60 seconds without the bgwriter. If you
look at the last chart on this page

     http://developer.postgresql.org/~wieck/vacuum_cost/

there are no spikes at all.


Jan



Re: Trying to minimize the impact of checkpoints

From
Jack Orenstein
Date:
Jan Wieck wrote:
> On 6/12/2004 1:44 PM, Jack Orenstein wrote:
>
>> Any opinions on the stability of 7.5 and the effectiveness of the
>> background writer in reducing variability in performance due to checkpoints?
>
> I didn't save any of the charts done with 7.4, but the response-time
> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> look at the last chart on this page
>
>     http://developer.postgresql.org/~wieck/vacuum_cost/
>
> there are no spikes at all.

This looks very promising. Our product should be installed by then. Do you know
what the process of migrating from 7.4.2 to 7.5 will be? Will it be simply a
software upgrade or will databases have to be modified in some way?

Jack Orenstein


Re: Trying to minimize the impact of checkpoints

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> I didn't save any of the charts done with 7.4, but the response-time
> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> look at the last chart on this page
>      http://developer.postgresql.org/~wieck/vacuum_cost/
> there are no spikes at all.

I have been meaning to ask you to redo those charts with CVS tip, to see
how things work now that checkpoints use fsync() instead of sync().

There was talk earlier of providing an option to issue sync() before
starting the loop that issues fsync() against each file we've written
since the last checkpoint.  The idea was that the sync() would cue the
kernel to schedule I/O for all currently dirty buffers in the most
efficient order, and then the fsync()s would merely ensure that Postgres
waits until the I/O it needs is done.  This should be optional since it
would be a clear loser in systems where Postgres isn't the dominant
cause of disk write traffic (since the sync would force much unneeded
I/O).  But in a system that's dedicated to one Postgres installation it
seems like it might be a win, compared to doing just fsyncs which might
cause the I/O to be done in a globally non-optimal order.

On the other hand, if the bgwriter's trickle writes are getting the job
done then there shouldn't be all that much work to do at checkpoint
time, and so this might be all just theorizing with not much real-world
effect.

So, before troubling to create this option I'd like to see some
evidence that it'd actually be worthwhile.  Could you test it out?
The place to put the sync() call would be at the top of mdsync() in
storage/smgr/md.c.
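
A minimal sketch of the ordering described above, assuming a standalone
helper rather than the real mdsync() code. The function name
checkpoint_flush, its parameters, and the path-list interface are
hypothetical and are not PostgreSQL APIs. It optionally calls sync() as a
hint to the kernel, then fsync()s each file, which is the step that
actually waits for the data:

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX/XSI declarations of sync() and fsync() */

#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the PostgreSQL source tree.
 *
 * Flushes every file named in paths[0..npaths-1] to stable storage.
 * If pre_sync is true, sync() is called first so the kernel can schedule
 * all dirty buffers at once; the fsync() calls then only wait for the
 * files we care about. Whether sync() returns before the writes finish
 * depends on the platform, so the fsync() loop is never skipped.
 *
 * Returns the number of files that could not be opened or fsync'ed.
 */
int checkpoint_flush(const char *const *paths, size_t npaths, bool pre_sync)
{
    int failures = 0;

    if (pre_sync)
        sync();             /* returns void: no error to check here */

    for (size_t i = 0; i < npaths; i++) {
        int fd = open(paths[i], O_RDWR);

        if (fd < 0) {
            failures++;     /* count the failure and keep going */
            continue;
        }
        if (fsync(fd) != 0)
            failures++;
        close(fd);
    }
    return failures;
}
```

Making pre_sync a caller-supplied flag mirrors the point that this would
have to be optional: it only helps when one installation dominates the
write traffic and the kernel's sync() merely schedules the I/O.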

            regards, tom lane

Re: Trying to minimize the impact of checkpoints

From
Jan Wieck
Date:
On 6/12/2004 3:44 PM, Jack Orenstein wrote:
> Jan Wieck wrote:
>> On 6/12/2004 1:44 PM, Jack Orenstein wrote:
>>
>>> Any opinions on the stability of 7.5 and the effectiveness of the
>>> background writer in reducing variability in performance due to checkpoints?
>>
>> I didn't save any of the charts done with 7.4, but the response-time
>> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
>> look at the last chart on this page
>>
>>     http://developer.postgresql.org/~wieck/vacuum_cost/
>>
>> there are no spikes at all.
>
> This looks very promising. Our product should be installed by then. Do you know
> what the process of migrating from 7.4.2 to 7.5 will be? Will it be simply a
> software upgrade or will databases have to be modified in some way?

As usual, dump and restore. Or, if you have backup hardware, you can use
the switchover capability of the Slony replication system
(http://gborg.postgresql.org/project/slony1/projdisplay.php) to upgrade.


Jan



Re: Trying to minimize the impact of checkpoints

From
Martijn van Oosterhout
Date:
On Sat, Jun 12, 2004 at 04:00:46PM -0400, Tom Lane wrote:
> There was talk earlier of providing an option to issue sync() before
> starting the loop that issues fsync() against each file we've written
> since the last checkpoint.  The idea was that the sync() would cue the
> kernel to schedule I/O for all currently dirty buffers in the most
> efficient order, and then the fsync()s would merely ensure that Postgres
> waits until the I/O it needs is done.  This should be optional since it

<snip>

Not a good idea on some systems. From the Linux sync(2) manpage:

BUGS
       According to the standard specification (e.g., SVID), sync()
       schedules the writes, but may return before the actual writing
       is done.  However, since version 1.3.20 Linux does actually
       wait.  (This still does not guarantee data integrity: modern
       disks have large caches.)

So on Linux the sync() itself waits for the writes, and your fsyncs
become no-ops instead. I don't think we need a discussion on whether
this behaviour is correct or not; this is the way it is, though I don't
know why.

I wonder if any other systems work this way...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: Trying to minimize the impact of checkpoints

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
>> [sync before fsync]

> Not a good idea on some systems. From the linux sync(2) manpage:

> BUGS
>        According to the standard specification (e.g., SVID), sync()
>        schedules the writes, but may return before the actual writing
>        is done.  However, since version 1.3.20 Linux does actually
>        wait.

This is another reason why it would have to be optional: the win comes
only if the kernel adheres literally to the SVID specification for
sync(2).  I think all the BSDen do, and HPUX seems to, but there are
undoubtedly platforms that don't.

            regards, tom lane

Re: Trying to minimize the impact of checkpoints

From
Bruce Momjian
Date:
Added to TODO:

    *  Add an option to sync() before fsync()'ing checkpoint files

---------------------------------------------------------------------------

Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > I didn't save any of the charts done with 7.4, but the response-time
> > spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> > look at the last chart on this page
> >      http://developer.postgresql.org/~wieck/vacuum_cost/
> > there are no spikes at all.
>
> I have been meaning to ask you to redo those charts with CVS tip, to see
> how things work now that checkpoints use fsync() instead of sync().
>
> There was talk earlier of providing an option to issue sync() before
> starting the loop that issues fsync() against each file we've written
> since the last checkpoint.  The idea was that the sync() would cue the
> kernel to schedule I/O for all currently dirty buffers in the most
> efficient order, and then the fsync()s would merely ensure that Postgres
> waits until the I/O it needs is done.  This should be optional since it
> would be a clear loser in systems where Postgres isn't the dominant
> cause of disk write traffic (since the sync would force much unneeded
> I/O).  But in a system that's dedicated to one Postgres installation it
> seems like it might be a win, compared to doing just fsyncs which might
> cause the I/O to be done in a globally non-optimal order.
>
> On the other hand, if the bgwriter's trickle writes are getting the job
> done then there shouldn't be all that much work to do at checkpoint
> time, and so this might be all just theorizing with not much real-world
> effect.
>
> So, before troubling to create this option I'd like to see some
> evidence that it'd actually be worthwhile.  Could you test it out?
> The place to put the sync() call would be at the top of mdsync() in
> storage/smgr/md.c.
>
>             regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073