Thread: random slow query

random slow query

From
Sean Ma
Date:
I have a table with about 50 million rows. There are a few writers pumping
data into the table at a rate of 40,000 rows/hour. Most of the time, the
SELECT is less than 100 ms. However, sometimes it is very slow, from 30
seconds to 500 seconds. The database is vacuumed and analyzed regularly.

One month ago, this type of slow query happened a few times per day. But
recently, the slow queries have been happening more frequently, at a rate
of once every 10 minutes or less. There seems to be no relation to the
database load or the type of query. If I manually execute these queries,
they return in less than 1 second.

I just wonder where I should start to look?

Thanks

Sean.

Re: random slow query

From
Andres Freund
Date:
On 06/29/2009 03:33 PM, Sean Ma wrote:
> I have a table with about 50 million rows. There are a few writers pumping
> data into the table at a rate of 40,000 rows/hour. Most of the time, the
> SELECT is less than 100 ms. However, sometimes it is very slow, from 30
> seconds to 500 seconds. The database is vacuumed and analyzed regularly.
>
> One month ago, this type of slow query happened a few times per day. But
> recently, the slow queries have been happening more frequently, at a rate
> of once every 10 minutes or less. There seems to be no relation to the
> database load or the type of query. If I manually execute these queries,
> they return in less than 1 second.
>
> I just wonder where I should start to look?
The slow queries could be waiting for locks - so you could enable
log_lock_waits to see if that is the issue.
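
For example, a minimal postgresql.conf sketch (log_lock_waits is available in
8.3+; the values here are illustrative, not a recommendation):

    log_lock_waits = on     # log a message when a query waits longer than
                            # deadlock_timeout
    deadlock_timeout = 1s   # the lock-wait logging threshold

After a reload, any lock wait longer than one second shows up in the server
log, which makes lock contention easy to confirm or rule out.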

Andres


Re: random slow query

From
"Kevin Grittner"
Date:
Sean Ma <seanxma@gmail.com> wrote:
> I have a table with about 50 million rows. There are a few writers
> pumping data into the table at a rate of 40,000 rows/hour. Most of
> the time, the SELECT is less than 100 ms. However, sometimes it is
> very slow, from 30 seconds to 500 seconds. The database is vacuumed
> and analyzed regularly.

What version of PostgreSQL is this?  On what OS?  What hardware?

We had similar problems on some of our servers under 8.2 and earlier
due to the tendency of PostgreSQL to build up a very large set of
dirty pages and then throw them all at the drives with an immediate
fsync.  The RAID controller queued up the requests, and fast reads got
stuck in the queue behind all those writes.  You may want to look at
this excellent coverage of the topic by Greg Smith:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

We got around the problem by keeping the checkpoint interval and
shared buffer size fairly small, and making the background writer
fairly aggressive.  What works for you, if this is your problem, may
be different.  I've heard that some have had to tune their OS caching
configuration.
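
To make that concrete, here is a sketch of the sort of 8.2-era settings I
mean (values are illustrative only, not tuned for any real workload):

    shared_buffers = 512MB        # keep modest so fewer dirty pages pile up
    checkpoint_timeout = 5min     # keep the checkpoint interval short
    checkpoint_segments = 3
    bgwriter_all_percent = 5      # 8.2 background-writer knobs; larger
    bgwriter_all_maxpages = 600   # values make it more aggressive

On 8.3 checkpoints and the background writer were reworked (the
bgwriter_all_* settings are gone and checkpoint_completion_target appeared),
so see Greg's article above for the details.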

-Kevin

Re: random slow query

From
Mike Ivanov
Date:
Hi Sean,

Sean Ma wrote:
> One month ago, this type of slow query happened a few times per day. But
> recently, the slow queries have been happening more frequently, at a rate
> of once every 10 minutes or less. There seems to be no relation to the

What is your hardware (memory, CPU type and such)?

This seems like a cache issue to me, but I can't tell for sure without
some additional information on your system:

1) What is the amount of a) available memory b) free memory and c)
memory available to i/o buffers?

2) What is the swap usage if any?

3) What is the CPU load? Any noticeable patterns in CPU load?

You can use /usr/bin/top to obtain most of this information.
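
For example, to capture a snapshot you can paste into a reply:

    top -b -n 1 | head -5

(batch mode, one iteration; the first five lines are the load, task, CPU,
memory and swap summaries).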

Mike



Re: random slow query

From
Sean Ma
Date:
top - 10:18:58 up 224 days, 15:10,  2 users,  load average: 6.27, 7.33, 6
Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.0%us,  0.7%sy,  0.0%ni, 61.5%id, 32.7%wa,  0.0%hi,  0.1%si,  0
Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers
Swap:  8193140k total,   224916k used,  7968224k free, 30829456k cached

Didn't really see a pattern; typically the CPU load is only about 40%.

On Mon, Jun 29, 2009 at 7:26 PM, Mike Ivanov<mikei@activestate.com> wrote:
> Hi Sean,
>
> Sean Ma wrote:
>>
>> One month ago, this type of slow query happened a few times per day. But
>> recently, the slow queries have been happening more frequently, at a rate
>> of once every 10 minutes or less. There seems to be no relation to the
>
> What is your hardware (memory, CPU type and such)?
>
> This seems like a cache issue to me, but I can't tell for sure without some
> additional information on your system:
>
> 1) What is the amount of a) available memory b) free memory and c) memory
> available to i/o buffers?
>
> 2) What is the swap usage if any?
>
> 3) What is the CPU load? Any noticeable patterns in CPU load?
>
> You can use /usr/bin/top to obtain most of this information.
>
> Mike
>
>
>

Re: random slow query

From
Mike Ivanov
Date:
Hi Sean,

Well, the overall impression is that your machine is badly overloaded. Look:

> top - 10:18:58 up 224 days, 15:10,  2 users,  load average: 6.27, 7.33, 6
>
The load average of 6.5 means there are six and a half processes
competing for the same CPU (and this system apparently has only one).
That is approximately a 500% overload.

Recommendation: either add more CPUs or eliminate process competition
by moving them to other boxes.

> Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
>
This supports what I said above. There are only 92 processes running on
my laptop and I think it is too much. Do you have Apache running on the
same machine?

> Cpu(s):  5.0%us,  0.7%sy,  0.0%ni, 61.5%id, 32.7%wa,  0.0%hi,  0.1%si,  0
>
Waiting time (wa) is rather high, which means processes are waiting on
locks or IO; another clue to concurrency issues on this machine.

> Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers
>
Buffers are about 10% of all the memory which is OK, but I tend to give
buffers some more room.

Recommendation: eliminate unneeded processes, decrease (yes, decrease)
the Postgres cache buffers if they are set too high.

> Swap:  8193140k total,   224916k used,  7968224k free, 30829456k cached
>
200M paged out. It should be zero except in an emergency. 3G of cached
swap is a sign of some crazy paging activity in the past. Those
unexplainable slowdowns are very likely caused by that.

> Didn't really see a pattern; typically the CPU load is only about 40%.
>
40% is too much, really. I start worrying when it is above 10%.

Conclusion:

- the system bears more load than it can handle
- the machine needs an upgrade
- Postgres is competing with something (presumably Apache) - separate them.

That should help.

Cheers,
Mike


Re: random slow query

From
Scott Marlowe
Date:
On Tue, Jun 30, 2009 at 11:23 AM, Mike Ivanov<mikei@activestate.com> wrote:
> Hi Sean,
>
> Well, the overall impression is that your machine is badly overloaded. Look:
>
>> top - 10:18:58 up 224 days, 15:10,  2 users,  load average: 6.27, 7.33, 6
>>
>
> The load average of 6.5 means there are six and a half processes competing
> for the same CPU (and this system apparently has only one). That is
> approximately a 500% overload.
>
> Recommendation: either add more CPUs or eliminate process competition by
> moving them to other boxes.

Well, we can't be sure OP's only got one core.  However, given that
the OP's posting shows mostly idle and wait state, the real issue isn't
the number of cores; it's that the IO subsystem is too slow for the load.
More cores wouldn't fix that.

>> Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
>>
>
> This supports what I said above. There are only 92 processes running on my
> laptop and I think it is too much. Do you have Apache running on the same
> machine?

My production PG server that runs ONLY pg has 222 processes on it.
It's no big deal.  Unless they're all trying to get cpu time, which
generally isn't the case.

>> Cpu(s):  5.0%us,  0.7%sy,  0.0%ni, 61.5%id, 32.7%wa,  0.0%hi,  0.1%si,  0
>>
>
> Waiting time (wa) is rather high, which means processes are waiting on
> locks or IO; another clue to concurrency issues on this machine.

More likely just a slow IO subsystem.  Like a single drive or
something.  Adding drives in a RAID-1 or RAID-10 etc usually helps.

>> Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers
>>
>
> Buffers are about 10% of all the memory which is OK, but I tend to give
> buffers some more room.

This is kernel buffers, not pg buffers.  It's set by the OS
semi-automagically.  In this case it's 325M out of 32 Gig, so it's
well under 10%, which is typical.

>> Swap:  8193140k total,   224916k used,  7968224k free, 30829456k cached
>>
>
> 200M paged out. It should be zero except in an emergency.

Not true.  Linux will happily swap out seldom used processes to make
room in memory for more kernel cache etc.  You can adjust this
tendency by setting swappiness.
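
For example (the value here is illustrative; the usual default is 60):

    sysctl -w vm.swappiness=10    # lower = less eager to swap idle processes
    # to persist across reboots, add to /etc/sysctl.conf:
    #   vm.swappiness = 10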


> 3G of cached swap
> is a sign of some crazy paging activity in the past. Those unexplainable
> slowdowns are very likely caused by that.

No, they're not.  It's 30G btw, and it's not swap that's cached, it's
the kernel using extra memory to cache data to / from the hard drives.
It's normal, and shouldn't worry anybody.  In fact it's a good sign
that you're not using way too much memory for any one process.

>> Didn't really see a pattern; typically the CPU load is only about 40%.
>>
>
> 40% is too much, really. I start worrying when it is above 10%.

Really?  I have eight cores on my production servers and many batch
jobs I run put all 8 cores at 90% for extended periods.  Since that
machine is normally doing a lot of smaller cached queries, it hardly
even notices.

> Conclusion:
>
> - the system bears more load than it can handle

Yes, too much IO load.  I agree on that.

> - the machine needs an upgrade

Yes, more hard drives / better caching RAID controller.

Re: random slow query

From
Sean Ma
Date:
Hi Mike,

Thanks for the details. Yes, besides another MySQL server running on
the same server, there is also a homegrown application that frequently
reads and writes the file system.

The postgres shared cache is at 4G, is that too big?

Thanks

Sean

On Tue, Jun 30, 2009 at 1:23 PM, Mike Ivanov<mikei@activestate.com> wrote:
> Hi Sean,
>
> Well, the overall impression is that your machine is badly overloaded. Look:
>
>> top - 10:18:58 up 224 days, 15:10,  2 users,  load average: 6.27, 7.33, 6
>>
>
> The load average of 6.5 means there are six and a half processes competing
> for the same CPU (and this system apparently has only one). That is
> approximately a 500% overload.
>
> Recommendation: either add more CPUs or eliminate process competition by
> moving them to other boxes.
>
>> Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
>>
>
> This supports what I said above. There are only 92 processes running on my
> laptop and I think it is too much. Do you have Apache running on the same
> machine?
>
>> Cpu(s):  5.0%us,  0.7%sy,  0.0%ni, 61.5%id, 32.7%wa,  0.0%hi,  0.1%si,  0
>>
>
> Waiting time (wa) is rather high, which means processes are waiting on
> locks or IO; another clue to concurrency issues on this machine.
>
>> Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers
>>
>
> Buffers are about 10% of all the memory which is OK, but I tend to give
> buffers some more room.
>
> Recommendation: eliminate unneeded processes, decrease (yes, decrease) the
> Postgres cache buffers if they are set too high.
>
>> Swap:  8193140k total,   224916k used,  7968224k free, 30829456k cached
>>
>
> 200M paged out. It should be zero except in an emergency. 3G of cached swap
> is a sign of some crazy paging activity in the past. Those unexplainable
> slowdowns are very likely caused by that.
>
>> Didn't really see a pattern; typically the CPU load is only about 40%.
>>
>
> 40% is too much, really. I start worrying when it is above 10%.
>
> Conclusion:
>
> - the system bears more load than it can handle
> - the machine needs an upgrade
> - Postgres is competing with something (presumably Apache) - separate them.
>
> That should help.
>
> Cheers,
> Mike
>
>

Re: random slow query

From
Scott Marlowe
Date:
On Tue, Jun 30, 2009 at 11:49 AM, Sean Ma<seanxma@gmail.com> wrote:
> Hi Mike,
>
> Thanks for the details. Yes, besides another MySQL server running on
> the same server, there is also a homegrown application that frequently
> reads and writes the file system.
>
> The postgres shared cache is at 4G, is that too big?

Not for a machine with 32Gig of ram.

Re: random slow query

From
Mike Ivanov
Date:
Sean,

> Yes, besides another MySQL server running on
> the same server,
Which is a really bad idea :-)

> The postgres shared cache is at 4G, is that too big?
>
OK, I misread the total memory amount, which was 32G; I thought it
was 3G. Thanks to Scott Marlowe for pointing that out. In this case 4G
for shared buffers is good.

Actually, I take back my words on swap, too. 200M swapped is less
important when you have plenty of memory.

Regards,
Mike


Re: random slow query

From
Mike Ivanov
Date:
Scott Marlowe wrote:
>>
>> The postgres shared cache is at 4G, is that too big?
>>
>
> Not for a machine with 32Gig of ram.
>
>

He could even add some more.

Mike


Re: random slow query

From
Scott Marlowe
Date:
On Tue, Jun 30, 2009 at 12:01 PM, Mike Ivanov<mikei@activestate.com> wrote:
> Scott Marlowe wrote:
>>>
>>> The postgres shared cache is at 4G, is that too big?
>>>
>>
>> Not for a machine with 32Gig of ram.
>>
>>
>
> He could even add some more.

Definitely.  Really depends on how big his data set is, and how good
pgsql is at caching it versus the kernel.  I've found that with a
really big dataset, like 250G to 1T range, the kernel is almost always
better at caching a lot of it, and if you're operating on a few
hundred meg at a time anyway, then smaller shared_buffers helps.

OTOH, if you're working on a 5G data set, it's often helpful to turn
up shared_buffers enough to cover that.

OTOH, if you're running a busy transaction oriented db (lots of small
updates) larger shared_buffers will slow you down quite a bit.
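
Purely as a sketch of those two extremes (illustrative values for a box like
the OP's, with 32G of RAM):

    # huge (250G-1T) data set, small working set per query:
    # stay small and let the kernel cache do the work
    shared_buffers = 1GB

    # ~5G data set you want fully resident in Postgres:
    shared_buffers = 6GB

Either way, effective_cache_size should reflect the OS cache you expect to
have, e.g. effective_cache_size = 24GB here.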

Re: random slow query

From
Mike Ivanov
Date:
Hi Scott,

> Well, we can't be sure OP's only got one core.

In fact, we can, Sean posted what top -b -n 1 says. There was only one
CPU line.

> the number of cores; it's that the IO subsystem is too slow for the load.
> More cores wouldn't fix that.
>

While I agree on the IO, more cores would definitely help to improve
the ~6.5 load average.

> My production PG server that runs ONLY pg has 222 processes on it.
> It's no big deal.  Unless they're all trying to get cpu time, which
> generally isn't the case.
>
222 / 8 cores = a ridiculous 27 processes per core, while the OP has 239.

> More likely just a slow IO subsystem.  Like a single drive or
> something.  Adding drives in a RAID-1 or RAID-10 etc usually helps.
>

Absolutely.

> This is kernel buffers, not pg buffers.  It's set by the OS
> semi-automagically.  In this case it's 325M out of 32 Gig, so it's
> well under 10%, which is typical.
>

You can control the FS buffers indirectly by not allowing running
processes to take too much memory. If you have like 40% free, there is
a good chance the system will use that memory for buffers. If you let
them eat up 90% and swap out some more, there is no room for buffers and
the system will have to swap out something when it really needs it.

> Not true.  Linux will happily swap out seldom used processes to make
> room in memory for more kernel cache etc.  You can adjust this
> tendency by setting swappiness.
>

This is fine until one of those processes wakes up. Then your FS cache
is dumped.

> It's 30G btw,

Yeah, I couldn't believe my eyes :-)

> > 3G of cached swap
> and it's not swap that's cached, it's
> the kernel using extra memory to cache data to / from the hard drives.
>

Oh please.. it *is*:
http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314

>  It's normal, and shouldn't worry anybody.  In fact it's a good sign
> that you're not using way too much memory for any one process.
>

It says exactly the opposite.

> Really?  I have eight cores on my production servers and many batch
> jobs I run put all 8 cores at 90% for extended periods.  Since that
> machine is normally doing a lot of smaller cached queries, it hardly
> even notices.
>

The OP's machine is doing a lot of write ops, which is different.

> Yes, more hard drives / better caching RAID controller.
>
+1

BTW, a nearly full file system can be another source of problems.

Cheers,
Mike



Re: random slow query

From
Scott Marlowe
Date:
On Tue, Jun 30, 2009 at 12:22 PM, Mike Ivanov<mikei@activestate.com> wrote:
> Hi Scott,
>
>> Well, we can't be sure OP's only got one core.
>
> In fact, we can, Sean posted what top -b -n 1 says. There was only one CPU
> line.

Missed that.

>
>> the number of cores; it's that the IO subsystem is too slow for the load.
>> More cores wouldn't fix that.
>>
>
> While I agree on the IO, more cores would definitely help to improve the
> ~6.5 load average.

No, it won't.  You can have 1000 cores, and if they're all waiting on
IO,  you'll have the same load.

>> My production PG server that runs ONLY pg has 222 processes on it.
>> It's no big deal.  Unless they're all trying to get cpu time, which
>> generally isn't the case.
>>
>
> 222 / 8 cores = a ridiculous 27 processes per core, while the OP has 239.

But most of those processes are asleep and doing nothing.  My
production machine is an RHEL 5.2 machine doing only one thing really,
and it's got that many processes on it.  It's fine.


>> More likely just a slow IO subsystem.  Like a single drive or
>> something.  Adding drives in a RAID-1 or RAID-10 etc usually helps.
>>
>
> Absolutely.
>
>> This is kernel buffers, not pg buffers.  It's set by the OS
>> semi-automagically.  In this case it's 325M out of 32 Gig, so it's
>> well under 10%, which is typical.
>>
>
> You can control the FS buffers indirectly by not allowing running processes
> to take too much memory. If you have like 40% free, there is a good chance
> the system will use that memory for buffers. If you let them eat up 90% and
> swap out some more, there is no room for buffers and the system will have to
> swap out something when it really needs it.

Close, but it'll use that memory for cache.  Large buffers are not
typical in Linux, large kernel caches are.

>> Not true.  Linux will happily swap out seldom used processes to make
>> room in memory for more kernel cache etc.  You can adjust this
>> tendency by setting swappiness.
>>
>
> This is fine until one of those processes wakes up. Then your FS cache is
> dumped.

Yep.

>> > 3G of cached swap
>> and it's not swap that's cached, it's
>> the kernel using extra memory to cache data to / from the hard drives.
>
> Oh please.. it *is*:
> http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314

If that tutorial says that, then that tutorial is wrong.  I'm guessing
what that tutorial is talking about and what top is saying are two
very different things, though.

>>  It's normal, and shouldn't worry anybody.  In fact it's a good sign
>> that you're not using way too much memory for any one process.
>>
>
> It says exactly the opposite.

Sorry, but you are wrong here.  Look up a better tutorial on what the
cache entry for top means.  It's most assuredly not about swap cache,
it's kernel cache.

>> Yes, more hard drives / better caching RAID controller.
>>
>
> +1
>
> BTW, a nearly full file system can be another source of problems.

Yeah, I ran into that a while back; it causes lots of fragmentation.

Re: random slow query

From
Scott Marlowe
Date:
On Tue, Jun 30, 2009 at 12:22 PM, Mike Ivanov<mikei@activestate.com> wrote:
>> > 3G of cached swap
>> and it's not swap that's cached, it's
>> the kernel using extra memory to cache data to / from the hard drives.
>>
>
> Oh please.. it *is*:
> http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314

Also, think about it: the OP has 8G of swap and 30Gig cached.  How /
why would you be caching 30Gigs worth of data when there's only 8G to
cache anyway?

Re: random slow query

From
Alan Hodgson
Date:
On Tuesday 30 June 2009, Mike Ivanov <mikei@activestate.com> wrote:
> Hi Scott,
>
> > Well, we can't be sure OP's only got one core.
>
> In fact, we can, Sean posted what top -b -n 1 says. There was only one
> CPU line.
>

Recent versions of top on Linux (on RedHat 5 anyway) may show only one
combined CPU line unless you break them out with an option.

> > the number of cores; it's that the IO subsystem is too slow for the load.
> > More cores wouldn't fix that.
>
> While I agree on the IO, more cores would definitely help to improve
> the ~6.5 load average.

No, I agree with the previous poster. His load is entirely due to IO wait.
Only one of those processes was trying to do anything. IO wait shows up as
high load averages.

Re: random slow query

From
Mike Ivanov
Date:
Scott Marlowe wrote:
> Also, think about it: the OP has 8G of swap and 30Gig cached.  How /
> why would you be caching 30Gigs worth of data when there's only 8G to
> cache anyway?
>

You're right, I misread it again :-)

Cheers,
Mike




Re: random slow query

From
Jean-David Beyer
Date:
Alan Hodgson wrote:
> On Tuesday 30 June 2009, Mike Ivanov <mikei@activestate.com> wrote:
>> Hi Scott,
>>
>>> Well, we can't be sure OP's only got one core.
>> In fact, we can, Sean posted what top -b -n 1 says. There was only one
>> CPU line.
>>
>
> Recent versions of top on Linux (on RedHat 5 anyway) may show only one
> combined CPU line unless you break them out with an option.

I have not noticed that to be the case. I ran RHEL3 from early 2004 until a
little after RHEL5 came out. I now run that (updated whenever updates come
out), and I do not recall ever setting any flag to get it to split the CPU
into 4 pieces.

I know the flag is there, but I do not recall ever setting it.
>
>>> the number of cores; it's that the IO subsystem is too slow for the load.
>>> More cores wouldn't fix that.
>> While I agree on the IO, more cores would definitely help to improve
>> the ~6.5 load average.
>
> No, I agree with the previous poster. His load is entirely due to IO wait.
> Only one of those processes was trying to do anything. IO wait shows up as
> high load averages.
>
If you run xosview, you can see all that stuff broken out, in my case at
one-second intervals. It shows user, nice, system, idle, wait, hardware
interrupt, software interrupt.

It also shows disk read, write, and idle time.

Lots of other stuff too.

--
   .~.  Jean-David Beyer          Registered Linux User 85642.
   /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
  /( )\ Shrewsbury, New Jersey    http://counter.li.org
  ^^-^^ 14:55:01 up 12 days, 1:44, 3 users, load average: 4.34, 4.36, 4.41

Re: random slow query

From
Mike Ivanov
Date:
Scott Marlowe wrote:
> Close, but it'll use that memory for cache. Large buffers are not
> typical in Linux, large kernel caches are.
>
OK, we're talking about different things. You're right.

> If that tutorial says that, then that tutorial is wrong.  I'm guessing
> what that tutorial is talking about and what top is saying are two
> very different things, though.
>
Then it is an amazingly common misconception. I guess it first appeared
in some book and was then reproduced by a zillion blogs. Essentially this
is what Google brings you for a 'swap cache' query.

Thanks for clearing that up.

>>>  It's normal, and shouldn't worry anybody.  In fact it's a good sign
>>> that you're not using way too much memory for any one process
>> It says exactly the opposite.
>>

This time I agree :-)

Cheers,
Mike


Re: random slow query

From
Scott Carey
Date:
Well, this is going to be a bit redundant but:


On 6/30/09 11:22 AM, "Mike Ivanov" <mikei@activestate.com> wrote:

> Hi Scott,
>
>> Well, we can't be sure OP's only got one core.
>
> In fact, we can, Sean posted what top -b -n 1 says. There was only one
> CPU line.

I do not believe that setting means what you think it means.  Here is the
same output for a machine with two quad-core CPUs.

$ top -b -n 1
top - 12:43:06 up 264 days,  1:47,  5 users,  load average: 0.24, 0.25, 0.71
Tasks: 253 total,   1 running, 252 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.1%us,  0.5%sy,  0.0%ni, 93.9%id,  0.5%wa,  0.0%hi,  0.1%si,
0.0%st
Mem:  16432232k total, 13212684k used,  3219548k free,     5992k buffers
Swap:  2040244k total,      180k used,  2040064k free,  7775732k cached

From the man page:

    When you see 'Cpu(s):' in the summary area, the '1' toggle is On
    and all cpu information is gathered in a single line.  Otherwise,
    each cpu is displayed separately as: 'Cpu0, Cpu1, ...'



>
>> the number of cores; it's that the IO subsystem is too slow for the load.
>> More cores wouldn't fix that.
>>
>
> While I agree on the IO, more cores would definitely help to improve
> the ~6.5 load average.
>

Load average is one of the more useless values to look at on a system unless
you are looking at a DELTA of the load average from one condition to
another.  All alone, it doesn't say much.

The CPU was 60% idle, and ~35% in io wait.  If those processes were waiting
on CPU resources to be available, the idle % would be very low.  Or, the OS
scheduler is broken.

>> My production PG server that runs ONLY pg has 222 processes on it.
>> It's no big deal.  Unless they're all trying to get cpu time, which
>> generally isn't the case.
>>
> 222 / 8 cores = a ridiculous 27 processes per core, while the OP has 239.
>

That's not ridiculous at all.  Modern OSes handle thousands of idle
processes just fine.


>> This is kernel buffers, not pg buffers.  It's set by the OS
>> semi-automagically.  In this case it's 325M out of 32 Gig, so it's
>> well under 10%, which is typical.
>>
>
> You can control the FS buffers indirectly by not allowing running
> processes to take too much memory. If you have like 40% free, there is
> a good chance the system will use that memory for buffers. If you let
> them eat up 90% and swap out some more, there is no room for buffers and
> the system will have to swap out something when it really needs it.
>

Or you can control the behavior with the following kernel params:
vm.swappiness
vm.dirty_ratio
vm.dirty_background_ratio
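
A quick way to inspect the current values (harmless to run):

    sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio
    # equivalently: cat /proc/sys/vm/dirty_background_ratio  etc.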


>> Not true.  Linux will happily swap out seldom used processes to make
>> room in memory for more kernel cache etc.  You can adjust this
>> tendency by setting swappiness.
>>
>
> This is fine until one of those processes wakes up. Then your FS cache
> is dumped.

Actually, no.  When a process wakes up only the pages that are needed are
accessed.  For most idle processes that wake up from time to time, a small
bit of work is done, then they go back to sleep.  This initial allocation
does NOT come from the page cache, but from the "buffers" line in top.  The
OS tries to keep some amount of free buffers not allocated to processes or
pages available, so that allocation demands can be met without having to
synchronously decide which buffers from the page cache to eject.

>>> 3G of cached swap
>> and it's not swap that's cached, it's
>> the kernel using extra memory to cache data to / from the hard drives.
>>
>
> Oh please.. it *is*:
> http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314
>

There is no such thing as "cached swap".  What would there be to cache?  A
process' page is either in RAM or swap, and a file is either in buffer cache
or not.
That line entry is the size of the file page cache.
Read about 'free' and compare the values to top.

>>  It's normal, and shouldn't worry anybody.  In fact it's a good sign
>> that you're not using way too much memory for any one process.
>>
>
> It says exactly the opposite.

It says a ton of space is used caching files.

>
>> Really?  I have eight cores on my production servers and many batch
>> jobs I run put all 8 cores at 90% for extended periods.  Since that
>> machine is normally doing a lot of smaller cached queries, it hardly
>> even notices.
>>
>
> The OP's machine is doing a lot of write ops, which is different.
>

Not when it comes to CPU use percentage.  The overlap with disk I/O and CPU
on Linux shows up in time spent by the kernel (system time), and often
kswapd processor time (shows up as system time).  Everything else is I/O
wait.


The OP has an I/O bottleneck.  Suggestions other than new hardware (a
sketch of both follows the list):

* Put the xlogs on a separate partition and, if ext3, mount it with
data=writeback, or use ext2.
* Use the deadline scheduler.
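
Both suggestions sketched, with hypothetical device names and paths (adjust
for your system):

    # switch the data volume's elevator to deadline (sdb is an assumed device)
    echo deadline > /sys/block/sdb/queue/scheduler

    # /etc/fstab entry for a dedicated ext3 xlog partition with metadata-only
    # journaling (partition and mount point are assumptions)
    /dev/sdc1  /var/lib/pgsql/data/pg_xlog  ext3  noatime,data=writeback  0 0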

If queries are intermittently causing problems, it might be due to
checkpoints.  Make sure that the kernel parameter dirty_background_ratio
is 5 or less, and dirty_ratio is 10 or less.
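
For example:

    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10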

Search this group for information about tuning postgres checkpoints.

If using a hardware RAID card with a battery backup, make sure its cache
mode is set to write-back.

A larger shared_buffers size can help if sequential scans are infrequent and
kick out pages from the OS page cache.
Postgres does not let sequential scans kick out index pages or pages
accessed randomly from its buffer cache, but the OS (Linux) is more prone to
that.

Whether larger or smaller shared_buffers will help is HIGHLY load and use
case dependent.


Re: random slow query

From
Scott Carey
Date:
On 6/30/09 12:06 PM, "Jean-David Beyer" <jeandavid8@verizon.net> wrote:

> Alan Hodgson wrote:
>> On Tuesday 30 June 2009, Mike Ivanov <mikei@activestate.com> wrote:
>>> Hi Scott,
>>>
>>>> Well, we can't be sure OP's only got one core.
>>> In fact, we can, Sean posted what top -b -n 1 says. There was only one
>>> CPU line.
>>>
>>
>> Recent versions of top on Linux (on RedHat 5 anyway) may show only one
>> combined CPU line unless you break them out with an option.
>
> I have not noticed that to be the case. I ran RHEL3 from early 2004 until a
> little after RHEL5 came out. I now run that (updated whenever updates come
> out), and I do not recall ever setting any flag to get it to split the CPU
> into 4 pieces.
>
> I know the flag is there, but I do not recall ever setting it.

Top now has storable defaults, so how it behaves depends on what the user
has stored as their defaults.  For example: go to interactive mode by just
typing top.
Now, hit '1'.  Or hit 'c', or try 'M'.  Now, toggle the '1' flag until it
shows one combined line (reporting Cpu(s), not Cpu0, Cpu1, etc.) and hit
shift-W.
Now your defaults are changed and it will spit out one line for all cpus
unless you tell it not to.

The output can be highly customized and your preferences stored.  Hit 'h'
for more info.

Another way to put it is that Linux's top has mostly caught up to the
proprietary command-line interactive tools on Solaris and AIX that used to
be light-years ahead.


Re: random slow query

From
Scott Carey
Date:


On 6/30/09 1:08 PM, "Scott Carey" <scott@richrelevance.com> wrote:
>
> A larger shared_buffers size can help if sequential scans are infrequent and
> kick out pages from the OS page cache.
> Postgres does not let sequential scans kick out index pages or pages
> accessed randomly from its buffer cache, but the OS (Linux) is more prone to
> that.

Let me qualify the above:
Postgres 8.3+ doesn't let full table scans push out pages from its
shared_buffers.  It uses a ring buffer for full table scans and vacuums.


>
> Whether larger or smaller shared_buffers will help is HIGHLY load and use
> case dependent.
>
>


Re: random slow query

From
Mike Ivanov
Date:
Scott Carey wrote:
>> 222 / 8 cores = a ridiculous 27 processes per core, while the OP has 239
> That's not ridiculous at all.  Modern OSes handle thousands of idle
> processes just fine.
>
>
I meant that 27 was a ridiculously small number.

> Or you can control the behavior with the following kernel params:
> vm.swappiness
> vm.dirty_ratio
> vm.dirty_background_ratio
>
Thanks for pointing that out!

> Actually, no.  When a process wakes up only the pages that are needed are
> accessed.  For most idle processes that wake up from time to time, a small
> bit of work is done, then they go back to sleep.  This initial allocation
> does NOT come from the page cache, but from the "buffers" line in top.  The
> OS tries to keep some amount of free buffers not allocated to processes or
> pages available, so that allocation demands can be met without having to
> synchronously decide which buffers from the page cache to eject.
>
Wait a second, I'm trying to understand that :-)
Did you mean that FS cache pages are first allocated from the buffer
pages or that process memory being paged out to swap is first written to
buffers? Could you clarify please?

> If queries are intermittently causing problems, it might be due to
> checkpoints.  Make sure that the kernel parameters for
> dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less.
>
Scott, isn't dirty_ratio supposed to be less than
dirty_background_ratio? I've heard that the system would automatically set
dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also,
how could dirty_ratio be less than 5 if 5 is the minimal value?

Regards,
Mike


Re: random slow query

From
Scott Carey
Date:
On 6/30/09 2:39 PM, "Mike Ivanov" <mikei@activestate.com> wrote:

> Scott Carey wrote:
>>> 222 / 8 cores = a ridiculous 27 processes per core, while the OP has 239
>> That's not ridiculous at all.  Modern OSes handle thousands of idle
>> processes just fine.
>>
>>
> I meant that 27 was a ridiculously small number.
>
>> Or you can control the behavior with the following kernel params:
>> vm.swappiness
>> vm.dirty_ratio
>> vm.dirty_background_ratio
>>
> Thanks for pointing that out!
>
>> Actually, no.  When a process wakes up only the pages that are needed are
>> accessed.  For most idle processes that wake up from time to time, a small
>> bit of work is done, then they go back to sleep.  This initial allocation
>> does NOT come from the page cache, but from the "buffers" line in top.  The
>> OS tries to keep some amount of free buffers not allocated to processes or
>> pages available, so that allocation demands can be met without having to
>> synchronously decide which buffers from the page cache to eject.
>>
> Wait a second, I'm trying to understand that :-)
> Did you mean that FS cache pages are first allocated from the buffer
> pages or that process memory being paged out to swap is first written to
> buffers? Could you clarify please?
>

There are some kernel parameters that control how much RAM the OS tries to
keep in a state that is not allocated to page cache or processes.  I've
forgotten what these are exactly.

But the purpose is to prevent the virtual memory system from having to make
the decision on what memory to kick out of the page cache, or what pages to
swap to disk, when memory is allocated.  Rather, it can do this in the
background most of the time.   So, the first use of this is when a process
allocates memory.  Pulling a swapped page off disk probably uses this too
but I'm not sure.  It would make sense.  Pages being written to swap go
directly to swap and are deallocated.
File pages are either on disk or in the page cache.   Process pages are
either in memory or swap.
But when either of these is first put in memory (process allocation,
page-in, file read), the OS can either quickly allocate to the process or
the page cache from the free buffers, or more slowly take from the page
cache, or even more slowly page out a process page.

>> If queries are intermittently causing problems, it might be due to
>> checkpoints.  Make sure that the kernel parameters for
>> dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less.
>>
> Scott, isn't dirty_ratio supposed to be less than
> dirty_background_ratio? I've heard that the system would automatically set
> dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also,
> how could dirty_ratio be less than 5 if 5 is the minimal value?
>

dirty_ratio is the percentage of RAM that can be in the page cache and not
yet written to disk before all writes in the system block.
dirty_background_ratio is the percentage of RAM that can be filled with
dirty file pages before a background thread is started by the OS to start
flushing to disk.  Flushing to disk also occurs on timed intervals or other
triggers.

By default, Linux 2.6.18 (RHEL5/CentOS5, etc.) has the former at 40 and the
latter at 10, which on a 32GB system means over 13GB can be in memory and
not yet on disk!   Sometime near 2.6.22 or so the defaults became 10 and 5,
respectively.  For some systems, this is still too much.

I like to use the '5 second rule'.  dirty_background_ratio should be sized
so that it takes about 5 seconds to flush to disk in optimal conditions.
dirty_ratio should be 2x to 5x this depending on your application's needs --
for a system with well tuned postgres checkpoints, smaller tends to be
better to limit stalls while waiting for the checkpoint fsync to finish.
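
As a hypothetical worked example: if the disk subsystem can flush around
300 MB/s, 5 seconds is roughly 1.5GB, which is about 5% of a 32GB box; so
dirty_background_ratio = 5 and dirty_ratio = 10 (2x) would fit the rule.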

> Regards,
> Mike
>
>


Re: random slow query

From
Mike Ivanov
Date:
Scott Carey wrote:
> the OS can either quickly allocate to the process or
> the page cache from the free buffers, or more slowly take from the page
> cache, or even more slowly page out a process page.
>

Aha, now it all makes sense.

> I like to use the '5 second rule'.  dirty_background_ratio should be sized
> so that it takes about 5 seconds to flush to disk in optimal conditions.
> dirty_ratio should be 2x to 5x this depending on your application's needs --
> for a system with well tuned postgres checkpoints, smaller tends to be
> better to limit stalls while waiting for the checkpoint fsync to finish.
>

Thanks a lot, this is invaluable information.


Regards,
Mike