Thread: Re: Trying to minimize the impact of checkpoints
Quoting jao@geophile.com:

> When a checkpoint occurs, all operations slow way, way down.
> The attached spreadsheet (xls file, prepared in OO so unlikely
> to be dangerous) shows a run of a few hours, and the various spikes
> every 25-30 minutes seem consistent with checkpointing. The
> application is doing 1/2 reads and 1/2 inserts during this time.

Sorry for being cryptic in my message. The spreadsheet contains
iostat output.

Jack Orenstein
On 6/11/2004 2:02 PM, jao@geophile.com wrote:
> Quoting jao@geophile.com:
>
>> When a checkpoint occurs, all operations slow way, way down.
>> The attached spreadsheet (xls file, prepared in OO so unlikely
>> to be dangerous) shows a run of a few hours, and the various spikes
>> every 25-30 minutes seem consistent with checkpointing. The
>> application is doing 1/2 reads and 1/2 inserts during this time.
>
> Sorry for being cryptic in my message. The spreadsheet contains
> iostat output.

Did you try to play with the background writer config options (new in
7.5) at all?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
> On 6/11/2004 2:02 PM, jao@geophile.com wrote:
>
>> Quoting jao@geophile.com:
>>
>>> When a checkpoint occurs, all operations slow way, way down.
>>> The attached spreadsheet (xls file, prepared in OO so unlikely
>>> to be dangerous) shows a run of a few hours, and the various spikes
>>> every 25-30 minutes seem consistent with checkpointing. The
>>> application is doing 1/2 reads and 1/2 inserts during this time.
>>
>> Sorry for being cryptic in my message. The spreadsheet contains
>> iostat output.
>
> Did you try to play with the background writer config options (new in
> 7.5) at all?

No, not yet. It looks like 7.5 won't be available in time for us to
use in the first release of our product, so I've been very focused
on how to best use 7.3-4.

Any opinions on the stability of 7.5 and the effectiveness of the background
writer in reducing variability in performance due to checkpoints?

Jack Orenstein
On 6/12/2004 1:44 PM, Jack Orenstein wrote:
> Jan Wieck wrote:
>> On 6/11/2004 2:02 PM, jao@geophile.com wrote:
>>
>>> Quoting jao@geophile.com:
>>>
>>>> When a checkpoint occurs, all operations slow way, way down.
>>>> The attached spreadsheet (xls file, prepared in OO so unlikely
>>>> to be dangerous) shows a run of a few hours, and the various spikes
>>>> every 25-30 minutes seem consistent with checkpointing. The
>>>> application is doing 1/2 reads and 1/2 inserts during this time.
>>>
>>> Sorry for being cryptic in my message. The spreadsheet contains
>>> iostat output.
>>
>> Did you try to play with the background writer config options (new in
>> 7.5) at all?
>
> No, not yet. It looks like 7.5 won't be available in time for us to
> use in the first release of our product, so I've been very focused
> on how to best use 7.3-4.
>
> Any opinions on the stability of 7.5 and the effectiveness of the background
> writer in reducing variability in performance due to checkpoints?

I didn't save any of the charts done with 7.4, but the response-time
spikes on checkpoints went up to 60 seconds without the bgwriter. If you
look at the last chart on this page

    http://developer.postgresql.org/~wieck/vacuum_cost/

there are no spikes at all.


Jan
Jan Wieck wrote:
> On 6/12/2004 1:44 PM, Jack Orenstein wrote:
>
>> Any opinions on the stability of 7.5 and the effectiveness of the
>> background writer in reducing variability in performance due to checkpoints?
>
> I didn't save any of the charts done with 7.4, but the response-time
> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> look at the last chart on this page
>
>     http://developer.postgresql.org/~wieck/vacuum_cost/
>
> there are no spikes at all.

This looks very promising. Our product should be installed by then. Do you know
what the process of migrating from 7.4.2 to 7.5 will be? Will it be simply a
software upgrade or will databases have to be modified in some way?

Jack Orenstein
Jan Wieck <JanWieck@Yahoo.com> writes:
> I didn't save any of the charts done with 7.4, but the response-time
> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> look at the last chart on this page
> http://developer.postgresql.org/~wieck/vacuum_cost/
> there are no spikes at all.

I have been meaning to ask you to redo those charts with CVS tip, to see
how things work now that checkpoints use fsync() instead of sync().

There was talk earlier of providing an option to issue sync() before
starting the loop that issues fsync() against each file we've written
since the last checkpoint. The idea was that the sync() would cue the
kernel to schedule I/O for all currently dirty buffers in the most
efficient order, and then the fsync()s would merely ensure that Postgres
waits until the I/O it needs is done. This should be optional since it
would be a clear loser in systems where Postgres isn't the dominant
cause of disk write traffic (since the sync would force much unneeded
I/O). But in a system that's dedicated to one Postgres installation it
seems like it might be a win, compared to doing just fsyncs which might
cause the I/O to be done in a globally non-optimal order.

On the other hand, if the bgwriter's trickle writes are getting the job
done then there shouldn't be all that much work to do at checkpoint
time, and so this might be all just theorizing with not much real-world
effect.

So, before troubling to create this option I'd like to see some
evidence that it'd actually be worthwhile. Could you test it out?
The place to put the sync() call would be at the top of mdsync() in
storage/smgr/md.c.

regards, tom lane
On 6/12/2004 3:44 PM, Jack Orenstein wrote:
> Jan Wieck wrote:
>> On 6/12/2004 1:44 PM, Jack Orenstein wrote:
>>
>>> Any opinions on the stability of 7.5 and the effectiveness of the
>>> background writer in reducing variability in performance due to checkpoints?
>>
>> I didn't save any of the charts done with 7.4, but the response-time
>> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
>> look at the last chart on this page
>>
>>     http://developer.postgresql.org/~wieck/vacuum_cost/
>>
>> there are no spikes at all.
>
> This looks very promising. Our product should be installed by then. Do you know
> what the process of migrating from 7.4.2 to 7.5 will be? Will it be simply a
> software upgrade or will databases have to be modified in some way?

As usual, dump and restore. Or, if you have backup hardware, you can use
the switchover capability of the Slony replication system
(http://gborg.postgresql.org/project/slony1/projdisplay.php) for the
upgrade.


Jan
On Sat, Jun 12, 2004 at 04:00:46PM -0400, Tom Lane wrote:
> There was talk earlier of providing an option to issue sync() before
> starting the loop that issues fsync() against each file we've written
> since the last checkpoint. The idea was that the sync() would cue the
> kernel to schedule I/O for all currently dirty buffers in the most
> efficient order, and then the fsync()s would merely ensure that Postgres
> waits until the I/O it needs is done. This should be optional since it
<snip>

Not a good idea on some systems. From the Linux sync(2) manpage:

  BUGS
         According to the standard specification (e.g., SVID), sync()
         schedules the writes, but may return before the actual writing
         is done. However, since version 1.3.20 Linux does actually
         wait. (This still does not guarantee data integrity: modern
         disks have large caches.)

So your fsyncs become no-ops instead. And I don't think we need a
discussion on whether this behaviour is correct or not; this is the way
it is, and I don't know why. I wonder whether any other systems work
this way...

-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes:
>> [sync before fsync]
> Not a good idea on some systems. From the Linux sync(2) manpage:
> BUGS
>        According to the standard specification (e.g., SVID), sync()
>        schedules the writes, but may return before the actual writing
>        is done. However, since version 1.3.20 Linux does actually
>        wait.

This is another reason why it would have to be optional: the win comes
only if the kernel adheres literally to the SVID specification for
sync(2). I think all the BSDen do, and HPUX seems to, but there are
undoubtedly platforms that don't.

regards, tom lane
Added to TODO:

  * Add an option to sync() before fsync()'ing checkpoint files

---------------------------------------------------------------------------

Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > I didn't save any of the charts done with 7.4, but the response-time
> > spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> > look at the last chart on this page
> > http://developer.postgresql.org/~wieck/vacuum_cost/
> > there are no spikes at all.
>
> I have been meaning to ask you to redo those charts with CVS tip, to see
> how things work now that checkpoints use fsync() instead of sync().
>
> There was talk earlier of providing an option to issue sync() before
> starting the loop that issues fsync() against each file we've written
> since the last checkpoint. The idea was that the sync() would cue the
> kernel to schedule I/O for all currently dirty buffers in the most
> efficient order, and then the fsync()s would merely ensure that Postgres
> waits until the I/O it needs is done. This should be optional since it
> would be a clear loser in systems where Postgres isn't the dominant
> cause of disk write traffic (since the sync would force much unneeded
> I/O). But in a system that's dedicated to one Postgres installation it
> seems like it might be a win, compared to doing just fsyncs which might
> cause the I/O to be done in a globally non-optimal order.
>
> On the other hand, if the bgwriter's trickle writes are getting the job
> done then there shouldn't be all that much work to do at checkpoint
> time, and so this might be all just theorizing with not much real-world
> effect.
>
> So, before troubling to create this option I'd like to see some
> evidence that it'd actually be worthwhile. Could you test it out?
> The place to put the sync() call would be at the top of mdsync() in
> storage/smgr/md.c.
>
> regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073