Thread: Checkpoint Tuning Question
I'm running a steady-state test where I am pushing about 600 queries per second through a Postgres 8.3 system on an 8 CPU Linux machine. It's a mix of inserts, updates, and deletes on a few tables - the two biggest ones probably have about 200,000 rows. The hard drive is just a simple, run-of-the-mill desktop drive.

Here are the parameters that I have changed from the defaults:

shared_buffers = 100MB
synchronous_commit = off

And, after noting complaints in the log file about checkpoint intervals, I set:

checkpoint_segments = 10

Then I turned on slow query logging for queries that take more than 1 second, and checkpoint logging.

Typically, I see no slow queries. The system handles the load just fine. Once in a while, I'll see a query that takes 3 - 5 seconds. However, once the checkpoint process begins, I get a whole flood of queries that take between 1 and 10 seconds to complete. My throughput crashes to near nothing. The checkpoint takes between 45 seconds and a minute to complete. After the checkpoint completes, the system returns to having very few slow queries and keeps up with the load fine.

Is there anything I can do to prevent the occasional slow query? Is there anything I can do to prevent (or minimize) the performance impact of the checkpoint?

Thanks,
Dan
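(The non-default settings described above, plus the slow-query and checkpoint logging, correspond roughly to the following postgresql.conf fragment. The first three lines are the values quoted in this message; the two logging lines are an assumption about how that logging would typically be enabled on 8.3.)

# settings changed from the 8.3 defaults, as described above
shared_buffers = 100MB
synchronous_commit = off
checkpoint_segments = 10

# assumed logging setup: log statements slower than 1 second, and log checkpoints
log_min_duration_statement = 1000   # in milliseconds
log_checkpoints = on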
Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
> However, once the checkpoint process begins, I get a whole flood of
> queries that take between 1 and 10 seconds to complete. My throughput
> crashes to near nothing. The checkpoint takes between 45 seconds and
> a minute to complete.

You sure this is 8.3? It should spread out checkpoints over a couple of
minutes by default. [thinks...] Maybe you need to increase
checkpoint_segments some more. If it's forcing the checkpoint in order
to hold down the number of WAL segments used up, that would explain a
fast checkpoint.

			regards, tom lane
On Wed, Jul 8, 2009 at 12:50 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
>> However, once the checkpoint process begins, I get a whole flood of
>> queries that take between 1 and 10 seconds to complete. My throughput
>> crashes to near nothing. The checkpoint takes between 45 seconds and
>> a minute to complete.
>
> You sure this is 8.3? It should spread out checkpoints over a couple of
> minutes by default. [thinks...] Maybe you need to increase
> checkpoint_segments some more. If it's forcing the checkpoint in order
> to hold down the number of WAL segments used up, that would explain a
> fast checkpoint.
>
> regards, tom lane
>

Just checked - I'm currently running 8.3.4 on the system I'm testing on.

With checkpoint_segments set to 10, the checkpoints appear to be
happening due to checkpoint_timeout - which I've left at the default of
5 minutes. If I double my test load, I end up with checkpoints happening
about every 4 minutes, with a log message that I read as indicating it
used up all 10 segments.

So is there not much I can do to keep the checkpoint process from
causing a burst of slow queries?

Thanks,

Dan
On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
>> With checkpoint_segments set to 10, the checkpoints appear to be
>> happening due to checkpoint_timeout - which I've left at the default
>> of 5 minutes.
>
> Well, you could increase both those settings so as to put the
> checkpoints further apart, and/or increase checkpoint_completion_target
> to spread the checkpoint I/O over a larger fraction of the cycle.
>
> regards, tom lane
>

Sorry, didn't mean to get off list.

Wouldn't increasing the length between checkpoints result in the
checkpoint process taking even longer to complete?

The way my system processes and buffers incoming data, having infrequent
(but long and disruptive) checkpoints is bad, since it causes the
throughput to suffer so badly - my buffers can't hold the flood, and I
have to drop data. If I can reduce the impact of the checkpoints and
have them occur more frequently, then my buffers should be able to hold
the incoming data during the short durations when I have slow queries.

I'll go experiment with checkpoint_completion_target.

Thanks,

Dan
Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
> On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Well, you could increase both those settings so as to put the
>> checkpoints further apart, and/or increase checkpoint_completion_target
>> to spread the checkpoint I/O over a larger fraction of the cycle.

> Wouldn't increasing the length between checkpoints result in the
> checkpoint process taking even longer to complete?

You don't really care how long it takes. What you want is for it not to
be chewing a bigger fraction of your I/O bandwidth than you can spare.
Hence, you want it to take longer. Trying to shorten it is just going
to make the spike worse.

			regards, tom lane
>> Wouldn't increasing the length between checkpoints result in the
>> checkpoint process taking even longer to complete?
>
> You don't really care how long it takes. What you want is for it not to
> be chewing a bigger fraction of your I/O bandwidth than you can spare.
> Hence, you want it to take longer. Trying to shorten it is just going
> to make the spike worse.
>
> regards, tom lane
>

I bumped the segments up to 15, the timeout up to 10 minutes, and
changed the completion target to 0.7.

What I observe now is that I get a short (1-2 second) period where I get
slow queries - I'm running about 30 queries in parallel at any given
time - and it appears that all 30 queries get paused for a couple of
seconds at the moment that a checkpoint begins. However, after the
initial slowdown, I don't get any additional slow queries logged while
the checkpoint process runs.

My takeaway is that starting the checkpoint process is really expensive
- so I don't want to start it very frequently. And the only downside to
longer intervals between checkpoints is a longer recovery time if the
system crashes?

Thanks,

Dan
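(At this point in the thread, the checkpoint-related portion of postgresql.conf being tested looks roughly like the sketch below; the first three values are the ones stated above, and the last two carry over from the original post.)

# checkpoint spacing after this round of tuning
checkpoint_segments = 15
checkpoint_timeout = 10min
checkpoint_completion_target = 0.7

# unchanged from the original post
shared_buffers = 100MB
synchronous_commit = off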
On Wed, 8 Jul 2009, Dan Armbrust wrote:

> With checkpoint_segments set to 10, the checkpoints appear to be
> happening due to checkpoint_timeout - which I've left at the default
> of 5 minutes.

OK, then that's as far upwards as you probably need to tweak that for
your workload, even though most systems tuned for performance need 30+
instead.

If what you're getting nailed with is the pause during the initial
checkpoint processing, the only real option you have on the database
side is to lower shared_buffers. You might get some improvement playing
with operating system tunables too, to make it buffer less data and
write more aggressively instead. The idea is that when the checkpoint
starts, there should be little else buffered already. On Linux this is
best accomplished with tuning dirty_background_ratio downward, for
example.

There's more on this subject than you probably ever wanted to know at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm if you
haven't seen that already.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Wed, 8 Jul 2009, Dan Armbrust wrote:

> My takeaway is that starting the checkpoint process is really
> expensive - so I don't want to start it very frequently. And the only
> downside to longer intervals between checkpoints is a longer recovery
> time if the system crashes?

And additional disk space wasted holding the write-ahead logs. You're
moving in the right direction here; the fewer checkpoints the better, as
long as you can stand the recovery time.

What you'll discover if you bump checkpoint_segments up high enough is
that you have to lengthen the test runs you're doing, because eventually
you'll reach a point where no checkpoints at all happen during some test
runs.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
> On Wed, 8 Jul 2009, Dan Armbrust wrote:
>> What I observe now is that I get a short (1-2 second) period where I
>> get slow queries - I'm running about 30 queries in parallel at any
>> given time - it appears that all 30 queries get paused for a couple of
>> seconds at the moment that a checkpoint begins. However, after the
>> initial slowdown, I don't get any additional slow queries logged while
>> the checkpoint process runs.

> If what you're getting nailed with is the pause during the initial
> checkpoint processing, the only real option you have on the database side
> is to lower shared_buffers.

He's only got 100MB of shared buffers, which doesn't seem like much
considering it's apparently a fairly beefy system. I definitely don't
see how one CPU spinning over the buffer headers in BufferSync is going
to create the sort of hiccup he's describing.

Dan, are you sure that this hiccup is happening at the *start* of a
checkpoint? Do you have log_checkpoints turned on, and if so what does
it show?

			regards, tom lane
Tom Lane wrote:
> He's only got 100MB of shared buffers, which doesn't seem like much
> considering it's apparently a fairly beefy system.

a beefy system with...

> Hard drive is just a simple, run-of-the-mill desktop drive.

which is going to severely limit random write throughput....
John R Pierce <pierce@hogranch.com> writes:
> a beefy system with...
>> Hard drive is just a simple, run-of-the-mill desktop drive.
> which is going to severely limit random write throughput....

True, which is why he's having to flail so hard to keep the checkpoint
from saturating his I/O. However, the latest report says that he
managed that, and yet there's still a one-or-two-second transient of
some sort. I'm wondering what's causing that. If it were at the *end*
of the checkpoint, it might be the disk again (failing to handle a bunch
of fsyncs, perhaps). But if it really is at the *start* then there's
something else odd happening.

			regards, tom lane
On Wed, 8 Jul 2009, Tom Lane wrote:

> He's only got 100MB of shared buffers, which doesn't seem like much
> considering it's apparently a fairly beefy system. I definitely
> don't see how one CPU spinning over the buffer headers in BufferSync
> is going to create the sort of hiccup he's describing.

Agreed, it doesn't seem like a likely cause. If the problem reduces in
magnitude in proportion with the size of the buffer cache, we might have
to accept that it's true regardless; that's why I was curious to see
what impact that had on the test results.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
> However, the latest report says that he
> managed that, and yet there's still a one-or-two-second transient of
> some sort. I'm wondering what's causing that. If it were at the *end*
> of the checkpoint, it might be the disk again (failing to handle a bunch
> of fsyncs, perhaps). But if it really is at the *start* then there's
> something else odd happening.
>
> regards, tom lane
>

Log output during heavy load - checkpoint logging on, and slow query
logging on for queries that take longer than 1 second. Blank space
inserted to show interesting bits.... Almost all of the slow query log
messages are logged within about 3 seconds of the checkpoint starting
message.

LOG:  checkpoint starting: xlog

LOG:  duration: 1101.419 ms  execute <unnamed>: select dynamichos0_.ethernetmacaddr as ethernet1_0_, dynamichos0_.ipaddr as ipaddr0_, dynamichos0_.cpemac as cpemac0_, dynamichos0_.last_updated as last3_0_, dynamichos0_.leasetime as leasetime0_, dynamichos0_.regtime as regtime0_ from iphost dynamichos0_, cpe cpe1_ where dynamichos0_.cpemac=cpe1_.cpemac and 1=1 and dynamichos0_.ethernetmacaddr=$1 and dynamichos0_.cpemac=$2 and cpe1_.regBaseId=$3 and dynamichos0_.ipaddr<>$4
DETAIL:  parameters: $1 = '01:01:01:56:01:7F', $2 = '00:55:00:82', $3 = '01:01:01:56', $4 = '000.000.000.000'
LOG:  duration: 1101.422 ms  execute <unnamed>: insert into iphost (cpemac, last_updated, leasetime, regtime, ethernetmacaddr, ipaddr) values ($1, $2, $3, $4, $5, $6)
DETAIL:  parameters: $1 = '01:AA:00:A0', $2 = '2009-07-08 15:33:20.673', $3 = '2009-07-08 21:06:40.67', $4 = '2009-07-08 15:33:20.67', $5 = '01:01:02:AB:01:9D', $6 = '2.171.156.0'

<snip about 200 lines of similar messages>

LOG:  duration: 1501.905 ms  execute <unnamed>: select dynamichos0_.ethernetmacaddr as ethernet1_0_, dynamichos0_.ipaddr as ipaddr0_, dynamichos0_.cpemac as cpemac0_, dynamichos0_.last_updated as last3_0_, dynamichos0_.leasetime as leasetime0_, dynamichos0_.regtime as regtime0_ from iphost dynamichos0_, cpe cpe1_ where dynamichos0_.cpemac=cpe1_.cpemac and 1=1 and dynamichos0_.ethernetmacaddr=$1 and dynamichos0_.cpemac=$2 and cpe1_.regBaseId=$3 and dynamichos0_.ipaddr<>$4
DETAIL:  parameters: $1 = '01:01:01:C3:01:8B', $2 = '00:C2:00:8E', $3 = '01:01:01:C3', $4 = '000.000.000.000'

LOG:  checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065 s, total=156.662 s

LOG:  checkpoint starting: xlog

LOG:  duration: 1104.780 ms  execute <unnamed>: delete from iphost where ethernetmacaddr=$1 and ipaddr=$2 and last_updated=$3
DETAIL:  parameters: $1 = '01:01:01:33:01:AA', $2 = '1.50.169.0', $3 = '2009-07-08 15:32:57.131'
LOG:  duration: 1106.499 ms  execute <unnamed>: select cpe0_.cpemac as cpemac2_0_, cpe0_.changeTime as changeTime2_0_, cpe0_.comment as comment2_0_, cpe0_.configuration as configur4_2_0_, cpe0_.cpeconfigid as cpeconf17_2_0_, cpe0_.cpefilterid as cpefilt18_2_0_, cpe0_.endTime as endTime2_0_, cpe0_.ispId as ispId2_0_, cpe0_.last_updated as last7_2_0_, cpe0_.reglocationid as regloca19_2_0_, cpe0_.modelId as modelId2_0_, cpe0_.numberOfHosts as numberOf9_2_0_, cpe0_.regBaseId as regBaseId2_0_, cpe0_.regTime as regTime2_0_, cpe0_.roamAllowed as roamAll12_2_0_, cpe0_.serialNumber as serialN13_2_0_, cpe0_.slaid as slaid2_0_, cpe0_.enable as enable2_0_, cpe0_.staticip as staticip2_0_, cpe0_.subscriberid as subscri21_2_0_, cpe0_.swVersion as swVersion2_0_, cpe0_.vlanid as vlanid2_0_, cpe0_.voipid as voipid2_0_ from cpe cpe0_ where cpe0_.cpemac=$1
DETAIL:  parameters: $1 = '00:84:00:37'
<snip about 300 lines of similar messages>

LOG:  duration: 1205.828 ms  execute <unnamed>: insert into iphost (cpemac, last_updated, leasetime, regtime, ethernetmacaddr, ipaddr) values ($1, $2, $3, $4, $5, $6)
DETAIL:  parameters: $1 = '02:31:00:25', $2 = '2009-07-08 15:39:53.718', $3 = '2009-07-08 21:13:13.715', $4 = '2009-07-08 15:39:53.715', $5 = '01:01:03:32:01:22', $6 = '3.51.33.0'
LOG:  duration: 1203.287 ms  execute <unnamed>: insert into iphost (cpemac, last_updated, leasetime, regtime, ethernetmacaddr, ipaddr) values ($1, $2, $3, $4, $5, $6)
DETAIL:  parameters: $1 = '03:47:00:81', $2 = '2009-07-08 15:39:53.72', $3 = '2009-07-08 21:13:13.717', $4 = '2009-07-08 15:39:53.717', $5 = '01:01:04:48:01:7E', $6 = '4.74.125.0'
LOG:  duration: 1201.480 ms  execute <unnamed>: insert into iphost (cpemac, last_updated, leasetime, regtime, ethernetmacaddr, ipaddr) values ($1, $2, $3, $4, $5, $6)
DETAIL:  parameters: $1 = '01:12:00:7F', $2 = '2009-07-08 15:39:53.725', $3 = '2009-07-08 21:13:13.721', $4 = '2009-07-08 15:39:53.721', $5 = '01:01:02:13:01:7C', $6 = '2.19.123.0'

LOG:  checkpoint complete: wrote 9794 buffers (76.5%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=148.084 s, sync=0.062 s, total=148.172 s
Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
> Almost all of the slow query log messages are logged within about 3
> seconds of the checkpoint starting message.

> LOG: checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction
> log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065
> s, total=156.662 s

Huh. And there's just about no daylight between the total checkpoint
time and the write+sync time (barely more than 20msec in both examples).
So that seems to wipe out the thought I had that maybe we'd
underestimated the work involved in one of the other steps of
checkpoint.

As Greg commented upthread, we seem to be getting forced to the
conclusion that the initial buffer scan in BufferSync() is somehow
causing this. There are a couple of things it'd be useful to try here:

* see how the size of the hiccup varies with shared_buffers;

* try inserting a delay into that scan loop, as per attached
  quick-and-dirty patch. (Numbers pulled from the air, but we can worry
  about tuning after we see if this is really where the problem is.)

			regards, tom lane

*** src/backend/storage/buffer/bufmgr.c~	Tue Jan  1 14:45:51 2008
--- src/backend/storage/buffer/bufmgr.c	Wed Jul  8 18:12:49 2009
***************
*** 1037,1042 ****
--- 1037,1045 ----
  		}

  		UnlockBufHdr(bufHdr);
+
+ 		if ((buf_id % 256) == 0)
+ 			pg_usleep(1000L);
  	}

  	if (num_to_write == 0)
> As Greg commented upthread, we seem to be getting forced to the
> conclusion that the initial buffer scan in BufferSync() is somehow
> causing this. There are a couple of things it'd be useful to try
> here:
>
> * see how the size of the hiccup varies with shared_buffers;

I tried decreasing shared_buffers - both 25MB and 50MB were too small
for my load - I had slow queries at all times.

So then I increased it from what I was using - 100MB - to 500MB - and
the hiccup roughly doubles in length. At 100MB, the hiccup is about 2-3
seconds long. At 500MB, the hiccup is about 6 seconds long.

> * try inserting a delay into that scan loop, as per attached
>   quick-and-dirty patch. (Numbers pulled from the air, but
>   we can worry about tuning after we see if this is really
>   where the problem is.)

After finally getting this particular system into a state where I could
build Postgres (I was using a binary install), I built an 8.3.4 using
your patch - but I didn't see any change in the behaviour. I see hiccups
that appear to be the same length as I saw on the binary build of 8.3.4.

Thanks,

Dan
On Wed, 2009-07-08 at 18:22 -0400, Tom Lane wrote:

> As Greg commented upthread, we seem to be getting forced to the
> conclusion that the initial buffer scan in BufferSync() is somehow
> causing this. There are a couple of things it'd be useful to try
> here:

Not sure why you're forced to that conclusion?

ISTM more likely to be a problem with checkpointing clog or subtrans.
That would block everybody and the scale of the problem is about right.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
> ISTM more likely to be a problem with checkpointing clog or subtrans.
> That would block everybody and the scale of the problem is about right.

That's what I had been thinking too, but the log_checkpoints output
conclusively disproves it: those steps are taking less than 20msec.

			regards, tom lane
On Fri, 2009-07-10 at 10:27 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> ISTM more likely to be a problem with checkpointing clog or subtrans.
>> That would block everybody and the scale of the problem is about right.
>
> That's what I had been thinking too, but the log_checkpoints output
> conclusively disproves it: those steps are taking less than 20msec.

OK, I was looking at total - write, not total - write - sync.

I think it's a traffic jam.

After a checkpoint, in XLogInsert() we discover that we now have to back
up a block that we didn't think needed it previously. So we have to drop
the lock and then re-acquire WALInsertLock. So every backend has to go
through the queue twice the first time it tries to write WAL immediately
after a checkpoint. Also, suddenly, every block needs to be copied to
WAL, so the CRC checks make each lock holder take longer than normal, so
the whole queue begins to back up. Then, because of wal_buffers being
small, we find that the increased volume of WAL being written causes
WALInsertLock to be held across I/O.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
> I think it's a traffic jam.

> After a checkpoint, in XLogInsert() we discover that we now have to back
> up a block that we didn't think needed it previously. So we have to drop
> the lock and then re-acquire WALInsertLock. So every backend has to go
> through the queue twice the first time it tries to write WAL immediately
> after a checkpoint. Also, suddenly, every block needs to be copied to
> WAL, so the CRC checks make each lock holder take longer than normal, so
> the whole queue begins to back up. Then, because of wal_buffers being
> small, we find that the increased volume of WAL being written causes
> WALInsertLock to be held across I/O.

Hm, I'm not sure I believe any of that except the last bit, seeing that
he's got plenty of excess CPU capability. But the last bit fits with
the wimpy-I/O problem, and it also offers something we could test.
Dan, please see what happens when you vary the wal_buffers setting.
(Note you need a postmaster restart to change that.)

			regards, tom lane
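(A note on the experiment being requested here: wal_buffers is set in postgresql.conf and is only picked up at postmaster start, so each value tried requires a restart. A sketch, with an arbitrary example value:)

# takes effect only after a postmaster restart
wal_buffers = 1MB    # example value; the next message tries 32kB through 10MB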
> Hm, I'm not sure I believe any of that except the last bit, seeing that
> he's got plenty of excess CPU capability. But the last bit fits with
> the wimpy-I/O problem, and it also offers something we could test.
> Dan, please see what happens when you vary the wal_buffers setting.
> (Note you need a postmaster restart to change that.)
>
> regards, tom lane
>

OK, I tried a few different values - 32kB, 64kB, 512kB, 2MB and 10MB.

I'm not seeing any highly noticeable change in behaviour with any
setting - it wasn't a scientific test, but I seem to have about the
same size hiccup with each setting. The hiccup may be slightly shorter
with the 10MB setting, but barely, if it is.

Thanks,

Dan
On Fri, 2009-07-10 at 14:25 -0500, Dan Armbrust wrote:
>> Hm, I'm not sure I believe any of that except the last bit, seeing that
>> he's got plenty of excess CPU capability. But the last bit fits with
>> the wimpy-I/O problem, and it also offers something we could test.
>> Dan, please see what happens when you vary the wal_buffers setting.
>> (Note you need a postmaster restart to change that.)
>
> OK, I tried a few different values - 32kB, 64kB, 512kB, 2MB and 10MB.
>
> I'm not seeing any highly noticeable change in behaviour with any
> setting - it wasn't a scientific test, but I seem to have about the
> same size hiccup with each setting. The hiccup may be slightly
> shorter with the 10MB setting, but barely, if it is.

OK, so it's not the "last bit". Let's look at the first idea again:

In more detail, the explanation relates to a section of code in
XLogInsert() in xlog.c that has a comment

/*
 * Check to see if my RedoRecPtr is out of date. If so, may have to go
 * back and recompute everything. This can only happen just after a
 * checkpoint, so it's better to be slow in this case and fast otherwise.
 *
 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 * affect the contents of the XLOG record, so we'll update our local copy
 * but not force a recomputation.
 */

This causes us to queue for the WALInsertLock twice at exactly the time
when every caller needs to calculate the CRC for complete blocks. So we
queue twice when the lock-hold-time is consistently high, causing queue
lengths to go ballistic. That also means that XLogFlush never achieves a
piggy-back flush, so WALWriteLock hold times go up also.

I bet that takes a significant amount of time to settle down once
initiated, probably a function of the write transaction ratio and the
number of frequently accessed blocks. We haven't got *any* measurements
that show how long traffic jams take to clear, and I think we need them.

Also, I think piggy-back writes work OK, but stop working when work
begins to queue, which will cause a "stall" catastrophe, like in
aerodynamics.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
> This causes us to queue for the WALInsertLock twice at exactly the time
> when every caller needs to calculate the CRC for complete blocks. So we
> queue twice when the lock-hold-time is consistently high, causing queue
> lengths to go ballistic.

You keep saying that, and it keeps not being true, because the CRC
calculation is *not* done while holding the lock. It is true that the
very first XLogInsert call in each backend after a checkpoint starts
will have to go back and redo its CRC calculation, but that's a one-time
waste of CPU. It's hard to see how it could have continuing effects over
several seconds, especially in a system that has CPU to spare.

What I think might be the cause is that just after a checkpoint starts,
quite a large proportion of XLogInserts will include full-page buffer
copies, thus leading to an overall higher rate of WAL creation. That
means longer hold times for WALInsertLock due to spending more time
copying data into the WAL buffers, and it also means more WAL that has
to be synced to disk before a transaction can commit. I'm still
convinced that Dan's problem ultimately comes down to inadequate disk
bandwidth, so I think the latter point is probably the key.

So this thought leads to a couple of other things Dan could test.
First, see if turning off full_page_writes makes the hiccup go away.
If so, we know the problem is in this area (though still not exactly
which reason); if not we need another idea. That's not a good permanent
fix though, since it reduces crash safety. The other knobs to
experiment with are synchronous_commit and wal_sync_method. If the
stalls are due to commits waiting for additional xlog to get written,
then async commit should stop them. I'm not sure if changing
wal_sync_method can help, but it'd be worth experimenting with.

			regards, tom lane
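(The knobs suggested above map onto postgresql.conf roughly as follows; full_page_writes = off is a diagnostic setting only, since it reduces crash safety, and the wal_sync_method value shown is just one of the platform-dependent choices.)

# diagnostic only - reduces crash safety on most hardware
full_page_writes = off

# other settings to experiment with
synchronous_commit = off            # already off in this test setup
wal_sync_method = fdatasync         # example; available methods vary by platform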
On Sun, 2009-07-12 at 13:10 -0400, Tom Lane wrote:
> It's hard to see how it could have continuing effects over several
> seconds, especially in a system that has CPU to spare.

Any queueing situation takes a while to resolve, and over-damped systems
can take a long time to resolve themselves. We simply don't know
anything at all about how well damped the system is. Regrettably,
physicists spend a lot of time looking at oscillating systems...
http://en.wikipedia.org/wiki/Vibration
so my viewpoint is that the system is much more dynamic than we have yet
considered it to be.

You've not commented on the double-take on the WALInsertLock which seems
to be the initiating factor here. I think multiple CPU systems may
simply exacerbate the queueing problem, and spare CPU is usually an
indication of contention. As ever, this is conjecture to attempt to
explain the problem, not firm black and white knowledge.

Anyway, I agree that turning synchronous_commit = off and setting
full_page_writes = off will get us closer here.

Also, I'm very happy that the delay is only 1 sec. We had much greater
effects two years ago, so we've definitely moved forwards.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
> So this thought leads to a couple of other things Dan could test.
> First, see if turning off full_page_writes makes the hiccup go away.
> If so, we know the problem is in this area (though still not exactly
> which reason); if not we need another idea. That's not a good permanent
> fix though, since it reduces crash safety. The other knobs to
> experiment with are synchronous_commit and wal_sync_method. If the
> stalls are due to commits waiting for additional xlog to get written,
> then async commit should stop them. I'm not sure if changing
> wal_sync_method can help, but it'd be worth experimenting with.
>
> regards, tom lane
>

All of my testing to date has been done with synchronous_commit=off.

I just tried setting full_page_writes=off - and like magic, the entire
hiccup went away.

Thanks,

Dan
On Mon, 2009-07-13 at 15:53 -0500, Dan Armbrust wrote:
>> So this thought leads to a couple of other things Dan could test.
>> First, see if turning off full_page_writes makes the hiccup go away.
>> If so, we know the problem is in this area (though still not exactly
>> which reason); if not we need another idea. That's not a good permanent
>> fix though, since it reduces crash safety. The other knobs to
>> experiment with are synchronous_commit and wal_sync_method. If the
>> stalls are due to commits waiting for additional xlog to get written,
>> then async commit should stop them. I'm not sure if changing
>> wal_sync_method can help, but it'd be worth experimenting with.
>
> All of my testing to date has been done with synchronous_commit=off.
>
> I just tried setting full_page_writes=off - and like magic, the entire
> hiccup went away.

OK, that seems clear.

I mistakenly referred to the CRC calculation happening while the lock
was held, which confused the discussion. The lock *is* held for longer
when we have backup blocks, and the lock does need to be acquired twice
immediately after a checkpoint. Neither of the above two effects
appears, on its own, sufficient to explain the delay. We should
conjecture that a traffic jam exists and go looking for it.

Propose a DTrace probe immediately after the "goto begin" at line 740 of
xlog.c, so we can start tracing from the first backend following a
checkpoint, and turn off tracing when all backends have completed a
transaction.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
> Propose a DTrace probe immediately after the "goto begin" at line 740 of
> xlog.c, so we can start tracing from the first backend following a
> checkpoint, and turn off tracing when all backends have completed a
> transaction.

That's greek to me. But I'm happy to test things if you send me patches
or custom code.

Thanks,

Dan
On Mon, Jul 13, 2009 at 3:53 PM, Dan Armbrust<daniel.armbrust.list@gmail.com> wrote:
>> So this thought leads to a couple of other things Dan could test.
>> First, see if turning off full_page_writes makes the hiccup go away.
>> If so, we know the problem is in this area (though still not exactly
>> which reason); if not we need another idea. That's not a good permanent
>> fix though, since it reduces crash safety. The other knobs to
>> experiment with are synchronous_commit and wal_sync_method. If the
>> stalls are due to commits waiting for additional xlog to get written,
>> then async commit should stop them. I'm not sure if changing
>> wal_sync_method can help, but it'd be worth experimenting with.
>>
>> regards, tom lane
>>
>
> All of my testing to date has been done with synchronous_commit=off.
>
> I just tried setting full_page_writes=off - and like magic, the entire
> hiccup went away.
>
> Thanks,
>
> Dan

I haven't done any other testing for this issue since discovering that
setting "full_page_writes=off" makes my performance hiccup go away.

I'm about to move on to some different tests and reset the setup that I
was using to test out this issue - is there anything else that anyone
would like to have gathered before I axe this system?

I can redo it again later too; it will just take a fair bit longer for
me to reconfigure the test environment.

Thanks,

Dan
Dan Armbrust wrote:
> All of my testing to date has been done with synchronous_commit=off.
>
> I just tried setting full_page_writes=off - and like magic, the entire
> hiccup went away.

Why is the full-page write happening before the commit returns when
synchronous_commit is set to off? Is there a way to fix this?

Thanks,
Tomer Amiaz