Thread: Long Running Commits - Not Checkpoints

Long Running Commits - Not Checkpoints

From
Brad Nicholson
Date:
I'm having a problem with long running commits appearing in my database
logs.  It may be hardware related, as the problem appeared when we moved
the database to a new server connected to a different disk array.  The
disk array is a lower class array, but still more than powerful enough
to handle the IO requirements.  One big difference though is that the
old array had 16 GB of cache, the new one has 4 GB.

Running Postgres 8.1.8 on AIX 5.3

We have enough IO to spare that we have the bgwriter cranked up pretty
high, dirty buffers are getting quickly.  Vmstat indicates 0 io wait
time, no swapping or anything nasty like that going on.

The long running commits do not line up with checkpoint times.

The postgresql.conf config are identical except that wal_buffers was 8
on the old master, and it is set to 16 on the new one.

We have other installations of this product running on the same array
(different servers though) and they are not suffering from this
problem.

The only other thing of note is that the wal files sit on the same disk
as the data directory.  This has not changed between the old and new
config, but the installs that are running fine do have their wal files
on a separate partition.

Any ideas where the problem could lie?  Could having the wal files on
the same data partition cause long running commits when there is plenty
of IO to spare?

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Long Running Commits - Not Checkpoints

From
Brad Nicholson
Date:
On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
> I'm having a problem with long running commits appearing in my database
> logs.  It may be hardware related, as the problem appeared when we moved
> the database to a new server connected to a different disk array.  The
> disk array is a lower class array, but still more than powerful enough
> to handle the IO requirements.  One big difference though is that the
> old array had 16 GB of cache, the new one has 4 GB.
>
> Running Postgres 8.1.8 on AIX 5.3
>
> We have enough IO to spare that we have the bgwriter cranked up pretty
> high, dirty buffers are getting quickly.  Vmstat indicates 0 io wait
> time, no swapping or anything nasty like that going on.
>
> The long running commits do not line up with checkpoint times.
>
> The postgresql.conf config are identical except that wal_buffers was 8
> on the old master, and it is set to 16 on the new one.
>
> We have other installations of this product running on the same array
> (different servers though) and they are not suffering from this
> problem.
>
> The only other thing of note is that the wal files sit on the same disk
> as the data directory.  This has not changed between the old and new
> config, but the installs that are running fine do have their wal files
> on a separate partition.
>
> Any ideas where the problem could lie?  Could having the wal files on
> the same data partition cause long running commits when there is plenty
> of IO to spare?

More on this - we also have long running commits on installations that
do have the wal files on a separate partition.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Long Running Commits - Not Checkpoints

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
>> I'm having a problem with long running commits appearing in my database
>> logs.  It may be hardware related, as the problem appeared when we moved
>> the database to a new server connected to a different disk array.

> More on this - we also have long running commits on installations that
> do have the wal files on a separate partition.

What's your definition of "long running commit" --- seconds? milliseconds?
Exactly what are you measuring?  Can you correlate the problem with what
the transaction was doing?

            regards, tom lane

Re: Long Running Commits - Not Checkpoints

From
Brad Nicholson
Date:
On Thu, 2007-09-13 at 11:10 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
> >> I'm having a problem with long running commits appearing in my database
> >> logs.  It may be hardware related, as the problem appeared when we moved
> >> the database to a new server connected to a different disk array.
>
> > More on this - we also have long running commits on installations that
> > do have the wal files on a separate partition.
>
> What's your definition of "long running commit" --- seconds? milliseconds?
> Exactly what are you measuring?  Can you correlate the problem with what

log_min_duration is set to 150ms

Commits running over that up to 788ms.  Here is what we see in the logs
(with obfuscated dbname, username and IP):

2007-09-13 10:01:49.787 CUT [782426] dbname username 1.2.3.171 LOG:
duration: 224.286 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]
2007-09-13 10:19:16.373 CUT [737404] dbname username 1.2.3.174 LOG:
duration: 372.545 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]
2007-09-13 10:19:24.437 CUT [1806498] dbname username 11.2.3.171 LOG:
duration: 351.544 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]
2007-09-13 10:33:11.204 CUT [962598] dbname username 1.2.3.170 LOG:
duration: 504.057 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]
2007-09-13 10:40:33.735 CUT [1282104] dbname username 1.2.3.174 LOG:
duration: 250.127 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]
2007-09-13 10:49:54.752 CUT [1188032] dbname username 1.2.3.170 LOG:
duration: 382.781 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]
2007-09-13 11:30:43.339 CUT [1589464] dbname username 1.2.3.172 LOG:
duration: 408.463 ms  statement: EXECUTE <unnamed>  [PREPARE:  commit]


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Long Running Commits - Not Checkpoints

From
Greg Smith
Date:
On Thu, 13 Sep 2007, Brad Nicholson wrote:

> One big difference though is that the old array had 16 GB of cache, the
> new one has 4 GB.
>
> We have enough IO to spare that we have the bgwriter cranked up pretty
> high, dirty buffers are getting quickly.

If your system is very active, running the bgwriter very aggressively will
result in the cache on the disk array being almost continuously filled
with pending writes that then trickle their way onto real disk eventually.
If the typical working set on this system results in the amount of
background writer cached writes regularly being >4GB but <16GB, that would
explain what you're seeing.  The resolution is actually unexpected by most
people:  you make the background writer less aggressive so that it's
spewing less redundant writes clogging the array's cache, leaving more
cache to buffer the actual commits so they don't block.  This will
increase the odds that there will be a checkpoint block instead, but if
you're seeing none of them right now you may have some margin there to
reduce the BGW activity without aggrevating checkpoints.

Since you're probably not monitoring I/O waits and similar statistics on
how the disk array's cache is being used, whether this is happening or not
to you won't be obvious from what the operating system is reporting.  One
or two blocked writes every couple of minutes won't even show up on the
gross statistics for operating system I/O waits; on average, they'll still
be 0.  So it's possible you may think you have plenty of I/O to spare,
when in fact you don't quite have enough--what you've got is enough cache
that the OS can't see where the real I/O bottleneck is.

I'd be curious to see how you've got your background writer configured to
see if it matches situations like this I've seen in the past.  The
parameters controlling the all scan are the ones you'd might consider
turning down, definately the percentage and possibly the maxpages as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Long Running Commits - Not Checkpoints

From
Brad Nicholson
Date:
On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> On Thu, 13 Sep 2007, Brad Nicholson wrote:

> I'd be curious to see how you've got your background writer configured to
> see if it matches situations like this I've seen in the past.  The
> parameters controlling the all scan are the ones you'd might consider
> turning down, definately the percentage and possibly the maxpages as well.


bgwriter_delay = 50                     # 10-10000 milliseconds between
rounds
bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers
scanned/round
bgwriter_lru_maxpages = 300             # 0-1000 buffers max
written/round
bgwriter_all_percent = 20               # 0-100% of all buffers
scanned/round
bgwriter_all_maxpages = 600             # 0-1000 buffers max
written/round


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Long Running Commits - Not Checkpoints

From
Brad Nicholson
Date:
On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote:
> On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> > On Thu, 13 Sep 2007, Brad Nicholson wrote:
>
> > I'd be curious to see how you've got your background writer configured to
> > see if it matches situations like this I've seen in the past.  The
> > parameters controlling the all scan are the ones you'd might consider
> > turning down, definately the percentage and possibly the maxpages as well.
>
>
> bgwriter_delay = 50                     # 10-10000 milliseconds between
> rounds
> bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers
> scanned/round
> bgwriter_lru_maxpages = 300             # 0-1000 buffers max
> written/round
> bgwriter_all_percent = 20               # 0-100% of all buffers
> scanned/round
> bgwriter_all_maxpages = 600             # 0-1000 buffers max
> written/round

I should add, there are 6 back ends running on this disk array
(different servers and different data partitions) with these bgwriter
settings.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Long Running Commits - Not Checkpoints

From
Alvaro Herrera
Date:
Brad Nicholson wrote:
> On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote:
> > On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> > > On Thu, 13 Sep 2007, Brad Nicholson wrote:
> >
> > > I'd be curious to see how you've got your background writer configured to
> > > see if it matches situations like this I've seen in the past.  The
> > > parameters controlling the all scan are the ones you'd might consider
> > > turning down, definately the percentage and possibly the maxpages as well.
> >
> >
> > bgwriter_delay = 50                     # 10-10000 milliseconds between
> > rounds
> > bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers
> > scanned/round
> > bgwriter_lru_maxpages = 300             # 0-1000 buffers max
> > written/round
> > bgwriter_all_percent = 20               # 0-100% of all buffers
> > scanned/round
> > bgwriter_all_maxpages = 600             # 0-1000 buffers max
> > written/round
>
> I should add, there are 6 back ends running on this disk array
> (different servers and different data partitions) with these bgwriter
> settings.

Maybe it is running deferred triggers or something?

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)

Re: Long Running Commits - Not Checkpoints

From
Brad Nicholson
Date:
On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> Since you're probably not monitoring I/O waits and similar statistics on
> how the disk array's cache is being used, whether this is happening or not
> to you won't be obvious from what the operating system is reporting.


A sysadmin looked at cache usage on the disk array.  The read cache is
being used heavily, and the write cache is not.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Long Running Commits - Not Checkpoints

From
Greg Smith
Date:
On Thu, 13 Sep 2007, Brad Nicholson wrote:

> A sysadmin looked at cache usage on the disk array.  The read cache is
> being used heavily, and the write cache is not.

Given that information, you can take the below (which I was just about to
send before the above update came in) as something to think about and test
but perhaps not your primary line of attack.  Even if my theory about the
exact mechanism involved isn't correct, the background writer is still
problematic in terms of its impact on the system when run as aggressively
as you're doing it; I'm not sure but I think that's even more true on 8.1
than it is on 8.2 where I did most my testing in this area.

> bgwriter_delay = 50
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 300
> bgwriter_all_percent = 20
> bgwriter_all_maxpages = 600

That was what I was expecting.  Your all scan has the potential to be
writing 600*8K*(1/50 msec)=98MB/sec worth of data to your disk array.
Since some of this data has a random access component to it, your array
cannot be expected to keep with a real peak load; the only thing saving
you if something starts dirtying buffers as far as possible is that the
array cache is buffering things.  And that 4GB worth of cache could be
filling in very little time.

Every time the all scan writes a buffer that is frequently used, that
write has a good chance that it was wasted because the block will be
modified again before checkpoint time.  Your settings are beyond regular
aggressive and into the hyperactive terrority where I'd expect such
redundant writes are happening often.  I'd suggest you try to move toward
dropping bgwriter_all_percent dramatically from its current setting and
see how far down you can go before it starts to introduce blocks at
checkpoint time.  With bgwriter_delay set to 1/4 the default, I would
expect that even 5% would be a high setting for you.  That may be a more
dramatic change than you want to make at once though, so lowering it in
that direction more slowly (perhaps drop 5% each day) and seeing whether
things improve as that happens may make more sense.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Long Running Commits - Not Checkpoints

From
"Peter Childs"
Date:


On 13/09/2007, Greg Smith <gsmith@gregsmith.com> wrote:

Every time the all scan writes a buffer that is frequently used, that
write has a good chance that it was wasted because the block will be
modified again before checkpoint time.  Your settings are beyond regular
aggressive and into the hyperactive terrority where I'd expect such
redundant writes are happening often.  I'd suggest you try to move toward
dropping bgwriter_all_percent dramatically from its current setting and
see how far down you can go before it starts to introduce blocks at
checkpoint time.  With bgwriter_delay set to 1/4 the default, I would
expect that even 5% would be a high setting for you.  That may be a more
dramatic change than you want to make at once though, so lowering it in
that direction more slowly (perhaps drop 5% each day) and seeing whether
things improve as that happens may make more sense.


Are you suggesting that reducing bgwriter_delay and bg_writer_percent would reduce the time spent doing commits?

I get quite a few commits that take over 500ms (the point when i start logging queries). I always thought oh just one of those things but if they can be reduced by changing a few config variables that would be great. I'm just trying to workout what figures are worth trying to see if I can reduce them.

From time to time I get commits that take 6 or 7 seconds but not all the time.

I'm currently working with the defaults.

Peter Childs

Re: Long Running Commits - Not Checkpoints

From
"Peter Childs"
Date:


On 14/09/2007, Peter Childs <peterachilds@gmail.com> wrote:


On 13/09/2007, Greg Smith < gsmith@gregsmith.com> wrote:

Every time the all scan writes a buffer that is frequently used, that
write has a good chance that it was wasted because the block will be
modified again before checkpoint time.  Your settings are beyond regular
aggressive and into the hyperactive terrority where I'd expect such
redundant writes are happening often.  I'd suggest you try to move toward
dropping bgwriter_all_percent dramatically from its current setting and
see how far down you can go before it starts to introduce blocks at
checkpoint time.  With bgwriter_delay set to 1/4 the default, I would
expect that even 5% would be a high setting for you.  That may be a more
dramatic change than you want to make at once though, so lowering it in
that direction more slowly (perhaps drop 5% each day) and seeing whether
things improve as that happens may make more sense.


Are you suggesting that reducing bgwriter_delay and bg_writer_percent would reduce the time spent doing commits?

I get quite a few commits that take over 500ms (the point when i start logging queries). I always thought oh just one of those things but if they can be reduced by changing a few config variables that would be great. I'm just trying to workout what figures are worth trying to see if I can reduce them.

From time to time I get commits that take 6 or 7 seconds but not all the time.

I'm currently working with the defaults.

Peter Childs

Hmm Always read the manual, Increase them from the defaults.......

Peter.

Re: Long Running Commits - Not Checkpoints

From
db@zigo.dhs.org
Date:
> I'm having a problem with long running commits appearing in my database
> logs.  It may be hardware related, as the problem appeared when we moved
> the database to a new server connected to a different disk array.  The
> disk array is a lower class array, but still more than powerful enough
> to handle the IO requirements.  One big difference though is that the
> old array had 16 GB of cache, the new one has 4 GB.

Maybe the old disk array had battery backed up ram that was used as a
write cache and the new only has read cache? Without battery backed up ram
(or flash or whatever) then a commit need to flush data down onto the
pysical disk.

/Dennis


Re: Long Running Commits - Not Checkpoints

From
Greg Smith
Date:
On Fri, 14 Sep 2007, Peter Childs wrote:

> Are you suggesting that reducing bgwriter_delay and bg_writer_percent
> would reduce the time spent doing commits? I get quite a few commits
> that take over 500ms (the point when i start logging queries).

One very common cause for transactions blocking for as much as several
seconds is hitting a checkpoint, which in current versions causes a large
amount of data to be written and synchronized to the physical disk.  If
you're already tracking long transactions and trying to figure out what's
causing them, you should set checkpoint_warning to a high value (the
maximum of 3600 is generally fine).  That will give you a harmless warning
every time a checkpoint occurs.  If the slow transactions line up with
checkpoints, then you might consider tuning or re-tuning the background
writer to elimite the delay.

In this particular case, their system had already been tuned so
aggressively that I wondered if the background writer was being a problem
rather than a solution.  Reducing the percentage would turn it down a bit;
so would *increasing* the delay--they had already decreased it
considerably, making it 4X as active as the default.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD