Thread: slow queries, possibly disk io

slow queries, possibly disk io

From
Josh Close
Date:
I have some queries that have significantly slowed down in the last
couple days. It's gone from 10 seconds to over 2 mins.

The cpu has never gone over 35% in the server's lifetime, but the load
average is over 8.0 right now. I'm assuming this is probably due to
disk io.

I need some help setting up postgres so that it doesn't need to go to
disk. I think the shared_buffers and effective_cache_size values are
the ones I need to look at.

Would setting shmmax and shmall to 90% or so of available mem and
giving a lot of it to postgres be helpful?

Effective cache size says this:
Sets the planner's assumption about the effective size of the disk
cache (that is, the portion of the kernel's disk cache that will be
used for PostgreSQL data files).

Does that mean the total available ram? Or what's left over from shared_buffers?

I've tried different things and not much has been working. Is there a
good way to ensure that most of the tables accessed in postgres will
be cached in mem and not have to go to disk?

If I'm joining a lot of tables, should the sort_mem be set high also?
Do shared_buffers, effective_cache_size, and sort_mem all use
different mem? Or are they separate?

I've looked for information and haven't found any useful pages about this.

Any help would be greatly appreciated.

Thanks.

-Josh

Re: slow queries, possibly disk io

From
John Arbash Meinel
Date:
Josh Close wrote:

>I have some queries that have significantly slowed down in the last
>couple days. It's gone from 10 seconds to over 2 mins.
>
>The cpu has never gone over 35% in the server's lifetime, but the load
>average is over 8.0 right now. I'm assuming this is probably due to
>disk io.
>
>I need some help setting up postgres so that it doesn't need to go to
>disk. I think the shared_buffers and effective_cache_size values are
>the ones I need to look at.
>
>Would setting shmmax and shmall to 90% or so of available mem and
>giving a lot of it to postgres be helpful?
>
>
Setting shared buffers above something like 10-30% of memory is
counterproductive.

>Effective cache size says this:
>Sets the planner's assumption about the effective size of the disk
>cache (that is, the portion of the kernel's disk cache that will be
>used for PostgreSQL data files).
>
>Does that mean the total available ram? Or what's left over from shared_buffers?
>
>I've tried different things and not much has been working. Is there a
>good way to ensure that most of the tables accessed in postgres will
>be cached in mem and not have to go to disk?
>
>If I'm joining a lot of tables, should the sort_mem be set high also?
>Do shared_buffers, effective_cache_size, and sort_mem all use
>different mem? Or are they separate?
>
>
>
Increasing sort_mem can help with various activities, but increasing it
too much can cause you to swap, which kills performance. The caution is
that you will likely use at least 1 sort_mem per connection, and can
likely use more than one if the query is complicated.

effective_cache_size changes how Postgres plans queries, but given the
same query plan, it doesn't change performance at all.
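
In other words, the three settings are separate: shared_buffers allocates
PostgreSQL's own buffer cache, sort_mem is claimed per sort by each backend,
and effective_cache_size is only a planner hint and allocates nothing. A
minimal way to look at them on a 7.4-era server like the one in this thread:

SHOW shared_buffers;        -- PostgreSQL's own buffer cache; the value is a count of 8 kB pages
SHOW sort_mem;              -- per-sort working memory in kB, used by each backend (possibly several times per query)
SHOW effective_cache_size;  -- a planner hint (also in 8 kB pages) about the OS disk cache; allocates nothing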

>I've looked for information and haven't found any useful pages about this.
>
>Any help would be greatly appreciated.
>
>Thanks.
>
>-Josh
>
>

John
=:->



Re: slow queries, possibly disk io

From
Dawid Kuroczko
Date:
On 5/26/05, Josh Close <narshe@gmail.com> wrote:
> I have some queries that have significantly slowed down in the last
> couple days. It's gone from 10 seconds to over 2 mins.
>
> The cpu has never gone over 35% in the server's lifetime, but the load
> average is over 8.0 right now. I'm assuming this is probably due to
> disk io.
>
> I need some help setting up postgres so that it doesn't need to go to
> disk. I think the shared_buffers and effective_cache_size values are
> the ones I need to look at.

A few "mandatory" questions:

1. Do you vacuum your db on regular basis? :)

2. Perhaps statistics for tables in question are out of date, did you
    try alter table set statistics?

3. explain analyze of the slow query?

4. if you for some reason cannot give explain analyze, please try to
describe the type of query (what kind of join(s)) and amount of data
found in the tables.
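
For reference, 1-3 look roughly like this in SQL (the table and column names
come from later in this thread; the statistics target of 100 is only an example):

VACUUM ANALYZE adaption.tblBatchHistory_1;          -- 1: vacuum and refresh planner statistics

ALTER TABLE adaption.tblBatchHistory_1
    ALTER COLUMN tStamp SET STATISTICS 100;         -- 2: collect finer-grained stats for tStamp
ANALYZE adaption.tblBatchHistory_1;                 --    (takes effect on the next ANALYZE)

EXPLAIN ANALYZE                                     -- 3: show the actual plan and timings
SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_1
WHERE tStamp > now() - interval '5 mins';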

2 minutes from 10 seconds is a huge leap, and it may mean that
PostgreSQL for some reason is not planning as well as it could.
Throwing more RAM at the problem can help, but it would be better
to hint the planner to do the right thing.  It may be a good time to
play with planner variables. :)

   Regards,
     Dawid

Re: slow queries, possibly disk io

From
Christopher Kings-Lynne
Date:
> I have some queries that have significantly slowed down in the last
> couple days. It's gone from 10 seconds to over 2 mins.
>
> The cpu has never gone over 35% in the server's lifetime, but the load
> average is over 8.0 right now. I'm assuming this is probably due to
> disk io.

You sure it's not a severe lack of vacuuming that's the problem?

Chris

Re: slow queries, possibly disk io

From
Josh Close
Date:
> Setting shared buffers above something like 10-30% of memory is
> counterproductive.

What is the reason behind it being counterproductive? If shared
buffers are at 30%, should effective cache size be at 70%? How do
those two relate?

>
> Increasing sort_mem can help with various activities, but increasing it
> too much can cause you to swap, which kills performance. The caution is
> that you will likely use at least 1 sort_mem per connection, and can
> likely use more than one if the query is complicated.

I have a max of 100 connections and 2 gigs of mem. Right now the sort
mem is at 4 megs. How much higher could I put that?
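
As a back-of-envelope sketch only (the 16 MB figure and the x2 factor for
complicated queries are illustrative, not recommendations):

-- 100 connections x  4 MB sort_mem            = ~400 MB worst case if every backend sorts at once
-- 100 connections x  4 MB sort_mem x 2 sorts  = ~800 MB for complicated multi-sort queries
-- 100 connections x 16 MB sort_mem            = ~1.6 GB, already close to 2 GB of RAM
-- One option is to keep the global value modest and raise it per session for
-- the few heavy queries only (sort_mem is set in kB):
SET sort_mem = 16384;   -- 16 MB for this session only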

-Josh

Re: slow queries, possibly disk io

From
Josh Close
Date:
> Few "mandatory" questions:
>
> 1. Do you vacuum your db on regular basis? :)

It's vacuumed once every hour. The table sizes and data are constantly changing.

>
> 2. Perhaps statistics for tables in question are out of date, did you
>     try alter table set statistics?

No I haven't. What would that do for me?

>
> 3. explain analyze of the slow query?

Here is the function that is run:

CREATE OR REPLACE FUNCTION adaption.funmsgspermin()
  RETURNS int4 AS
'
DECLARE
    this_rServerIds  RECORD;
    this_sQuery      TEXT;
    this_iMsgsPerMin INT;
    this_rNumSent    RECORD;

BEGIN
    this_iMsgsPerMin := 0;
    FOR this_rServerIds IN
        SELECT iId
        FROM adaption.tblServers
    LOOP
        this_sQuery := \'
            SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
            FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
            WHERE tStamp > now() - interval \'\'5 mins\'\';
        \';
        FOR this_rNumSent IN EXECUTE this_sQuery LOOP
            this_iMsgsPerMin := this_iMsgsPerMin + this_rNumSent.iNumSent;
        END LOOP;
    END LOOP;

    this_iMsgsPerMin := this_iMsgsPerMin / 5;

    RETURN this_iMsgsPerMin;
END;
'
LANGUAGE 'plpgsql' VOLATILE;

Here is the explain analyze of one loop of the sum:

Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
time=14649.602..14649.604 rows=1 loops=1)
  ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
        Filter: (tstamp > (now() - '00:05:00'::interval))
Total runtime: 14649.709 ms

>
> 4. if you for some reason cannot give explain analyze, please try to
> describe the type of query (what kind of join(s)) and amount of data
> found in the tables.
>
> 2 minutes from 10 seconds is a huge leap, and it may mean that
> PostgreSQL for some reason is not planning as well as it could.
> Throwing more RAM at the problem can help, but it would be better
> to hint the planner to do the right thing.  It may be a good time to
> play with planner variables. :)

Is there any documentation on planner vars? And how would I throw more
ram at it? It has 2 gigs right now. How do I know if postgres is using
that?

-Josh

Re: slow queries, possibly disk io

From
Josh Close
Date:
On 5/26/05, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
> > I have some queries that have significantly slowed down in the last
> > couple days. It's gone from 10 seconds to over 2 mins.
> >
> > The cpu has never gone over 35% in the server's lifetime, but the load
> > average is over 8.0 right now. I'm assuming this is probably due to
> > disk io.
>
> You sure it's not a severe lack of vacuuming that's the problem?
>

It's vacuumed hourly. If it needs to be more than that I could do it I
guess. But from everything I've been told, hourly should be enough.

-Josh

Re: slow queries, possibly disk io

From
Tom Lane
Date:
Josh Close <narshe@gmail.com> writes:
>         this_sQuery := \'
>             SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
>             FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
>             WHERE tStamp > now() - interval \'\'5 mins\'\';
>         \';

> Here is the explain analyze of one loop of the sum:

> Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
> time=14649.602..14649.604 rows=1 loops=1)
>   ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
> width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
>         Filter: (tstamp > (now() - '00:05:00'::interval))
> Total runtime: 14649.709 ms

I think you really want that seqscan to be an indexscan, instead.
I'm betting this is PG 7.4.something?  If so, probably the only
way to make it happen is to simplify the now() expression to a constant:

            SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
            FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
            WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
            \'\\\'\';

because pre-8.0 the planner won't realize that the inequality is
selective enough to favor an indexscan, unless it's comparing to
a simple constant.

(BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
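
A rough sketch of the same function under 8.0's dollar quoting, using
quote_literal() in place of the hand-escaped quotes to fold the cutoff into
the dynamic SQL as a plain literal (schema and names as posted above):

CREATE OR REPLACE FUNCTION adaption.funmsgspermin() RETURNS int4 AS $$
DECLARE
    this_rServerIds  RECORD;
    this_sQuery      TEXT;
    this_iMsgsPerMin INT;
    this_rNumSent    RECORD;
BEGIN
    this_iMsgsPerMin := 0;
    FOR this_rServerIds IN SELECT iId FROM adaption.tblServers LOOP
        -- quote_literal() wraps the computed timestamp in quotes, so it lands
        -- in the statement as a simple constant the planner can work with.
        this_sQuery := '
            SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
            FROM adaption.tblBatchHistory_' || this_rServerIds.iId || '
            WHERE tStamp > ' || quote_literal( ( now() - interval '5 mins' )::text );
        FOR this_rNumSent IN EXECUTE this_sQuery LOOP
            this_iMsgsPerMin := this_iMsgsPerMin + this_rNumSent.iNumSent;
        END LOOP;
    END LOOP;
    this_iMsgsPerMin := this_iMsgsPerMin / 5;
    RETURN this_iMsgsPerMin;
END;
$$ LANGUAGE plpgsql VOLATILE;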

            regards, tom lane

Re: slow queries, possibly disk io

From
Josh Close
Date:
> I think you really want that seqscan to be an indexscan, instead.
> I'm betting this is PG 7.4.something?  If so, probably the only
> way to make it happen is to simplify the now() expression to a constant:
>
>             SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
>             FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
>             WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
>             \'\\\'\';

The dollar sign thing would be a lot easier, but I can't get this to work.
I'm using a db manager where I can just use ' instead of \'. How would
it look for that? In other words, it doesn't have the "create or
replace function as ' --stuff ' language 'plpgsql'" wrapper; it just has the
actual function body. Makes things a little easier. I'm getting an error at
or near "5".

>
> because pre-8.0 the planner won't realize that the inequality is
> selective enough to favor an indexscan, unless it's comparing to
> a simple constant.
>
> (BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
>
>                         regards, tom lane
>


--
-Josh

Re: slow queries, possibly disk io

From
Josh Close
Date:
Doing the query

explain
SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_1
WHERE tStamp > ( now() - interval '5 mins' )::text

gives me this:

Aggregate  (cost=32138.33..32138.33 rows=1 width=4)
  ->  Seq Scan on tblbatchhistory_1  (cost=0.00..31996.10 rows=56891 width=4)
        Filter: ((tstamp)::text > ((now() - '00:05:00'::interval))::text)

Still not an index scan.
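
(The ::text cast in that form turns the filter into a string comparison, as the
Filter line shows, so an index on tStamp still can't help. A rough way to test
Tom's constant idea outside the function is sketched below; the index and the
timestamp value are made up, and the thread doesn't say whether an index on
tStamp already exists:

CREATE INDEX idx_tblbatchhistory_1_tstamp
    ON adaption.tblBatchHistory_1 ( tStamp );

EXPLAIN ANALYZE
SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_1
WHERE tStamp > '2005-05-27 12:00:00';   -- whatever now() - '5 mins' works out to

On 7.4 the planner will only consider that index when the right-hand side is a
plain constant like this.)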

On 5/27/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Close <narshe@gmail.com> writes:
> >         this_sQuery := \'
> >             SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
> >             FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
> >             WHERE tStamp > now() - interval \'\'5 mins\'\';
> >         \';
>
> > Here is the explain analyze of one loop of the sum:
>
> > Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
> > time=14649.602..14649.604 rows=1 loops=1)
> >   ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
> > width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
> >         Filter: (tstamp > (now() - '00:05:00'::interval))
> > Total runtime: 14649.709 ms
>
> I think you really want that seqscan to be an indexscan, instead.
> I'm betting this is PG 7.4.something?  If so, probably the only
> way to make it happen is to simplify the now() expression to a constant:
>
>             SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
>             FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
>             WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
>             \'\\\'\';
>
> because pre-8.0 the planner won't realize that the inequality is
> selective enough to favor an indexscan, unless it's comparing to
> a simple constant.
>
> (BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
>
>                         regards, tom lane
>


--
-Josh

Re: slow queries, possibly disk io

From
Rudi Starcevic
Date:
Hi,

I had some disk io issues recently with NFS; I found the command 'iostat
-x 5' to be a great help when using Linux.

For example here is the output when I do a 10GB file transfer onto hdc
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
hdc          0.00 875.95  0.00 29.66    0.00 7244.89     0.00  3622.44   244.27     3.07  103.52   1.78   5.27

The last field shows the disk is 5.27% busy.

I have seen this value at 100%; adding more servers brought it under 100%.
It seems that if you hit 100%, problems sort of cascade all over the
place. For example, Apache connections went right up and hit their max.

I am not sure how accurate the % is, but it has worked pretty well for me.

Perhaps use this command in another window while you run your SQL and see
what it shows.

HTH.
Kind regards,
Rudi.

Re: slow queries, possibly disk io

From
Josh Close
Date:
I didn't see iostat available to install, but I'm using dstat to see this.

The server has constant disk reads averaging around 50M and quite a
few in the 60M range. This is when selects are being done, which is
almost always. I would think if postgres is grabbing everything from
memory that this wouldn't happen. This is why I think there must be
some way to allocate more mem to postgres.

There is 2 gigs of mem in this server. Here are my current settings.

max_connections = 100
shared_buffers = 50000
sort_mem = 4096
vacuum_mem = 32768
effective_cache_size = 450000

Shared buffers is set to 10% of total mem. Effective cache size is 90% of mem.

Is there anything that can be done to have postgres grab more from
memory rather than disk?


On 5/30/05, Rudi Starcevic <tech@wildcash.com> wrote:
> Hi,
>
> I had some disk io issues recently with NFS; I found the command 'iostat
> -x 5' to be a great help when using Linux.
>
> For example here is the output when I do a 10GB file transfer onto hdc
> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> hdc          0.00 875.95  0.00 29.66    0.00 7244.89     0.00  3622.44   244.27     3.07  103.52   1.78   5.27
>
> The last field shows the disk is 5.27% busy.
>
> I have seen this value at 100%; adding more servers brought it under 100%.
> It seems that if you hit 100%, problems sort of cascade all over the
> place. For example, Apache connections went right up and hit their max.
>
> I am not sure how accurate the % is, but it has worked pretty well for me.
>
> Perhaps use this command in another window while you run your SQL and see
> what it shows.
>
> HTH.
> Kind regards,
> Rudi.
>


--
-Josh

Re: slow queries, possibly disk io

From
Tom Lane
Date:
Josh Close <narshe@gmail.com> writes:
> There is 2 gigs of mem in this server. Here are my current settings.

> max_connections = 100
> shared_buffers = 50000
> sort_mem = 4096
> vacuum_mem = 32768
> effective_cache_size = 450000

> Shared buffers is set to 10% of total mem. Effective cache size is 90% of mem.

Uh, shared_buffers and effective_cache_size are both measured in pages,
which are 8K apiece unless you built with a nondefault BLCKSZ.  So the
above calculations are off ...

> Is there anything that can be done to have postgres grab more from
> memory rather than disk?

It's not so much a matter of what Postgres will do as what the kernel
will do.  Check to see if there is some limit on how much memory the
kernel will set aside for disk buffers.  Plain old "top" will generally
tell you what is going on, though interpreting its output sometimes
requires some wizardry.

            regards, tom lane

Re: slow queries, possibly disk io

From
Josh Close
Date:
On 5/31/05, Martin Fandel <martin.fandel@alphyra-evs.de> wrote:
> In the documentation at
> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
> the shared_buffers is set to 1/3 of the available RAM. You've set
> 50000*8/1024 = 391 MB of SHMEM. The effective_cache_size in your
> configuration is 450000*8/1024 = 3516 MB of SHMEM. That's 3907 MB
> of RAM, but you have less than 2048 MB available.


I wrote that wrong; there is actually 4 gigs of ram available.


>
> What value do you have in /proc/sys/kernel/shmmax?
>
> I'm really new at using postgres and I don't have much experience,
> but maybe you can try to use 1/3 (682MB/87424) for shared_buffers
> and 2/3 (1365MB/174720) for the effective_cache_size? But maybe these
> settings are too high too.
>
> best regards
> Martin

Re: slow queries, possibly disk io

From
Manfred Koizar
Date:
>On 5/31/05, Martin Fandel <martin.fandel@alphyra-evs.de> wrote:
>> In the documentation at
>> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
>> the shared_buffers is set to 1/3 of the available RAM.

Well, it says "you should never use more than 1/3 of your available RAM"
which is not quite the same as "it is set to 1/3."  I'd even say, never
set it higher than 1/10 of your available RAM, unless you know what
you're doing and why you're doing it.

Servus
 Manfred

Re: slow queries, possibly disk io

From
Simon Riggs
Date:
On Fri, 2005-05-27 at 07:52 -0500, Josh Close wrote:
> > Setting shared buffers above something like 10-30% of memory is
> > counterproductive.
>
> What is the reason behind it being counterproductive? If shared
> buffers are at 30%, should effective cache size be at 70%? How do
> those two relate?

They don't relate.

shared_buffers = 50000 is enough. More than that will give bgwriter
issues.

effective_cache_size changes whether indexes are selected or not. Higher
settings favour indexed access.

> >
> > Increasing sort_mem can help with various activities, but increasing it
> > too much can cause you to swap, which kills performance. The caution is
> > that you will likely use at least 1 sort_mem per connection, and can
> > likely use more than one if the query is complicated.
>
> I have a max of 100 connections and 2 gigs of mem. Right now the sort
> mem is at 4 megs. How much higher could I put that?
>

Please post your server hardware config all in one go. You have more
than 2 CPUs, yes?

Also, mention what bgwriter settings are. You may need to turn those
down a bit.

Best Regards, Simon Riggs


TIP 9: the planner will ignore... & datatypes

From
"Marc Mamin"
Date:
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match


But INT2, INT4, INT8 and "SERIAL" are considered to be the same datatype.
Am I right?

Thanks,

Marc


Re: TIP 9: the planner will ignore... & datatypes

From
Alvaro Herrera
Date:
On Wed, Jun 01, 2005 at 11:45:06AM +0200, Marc Mamin wrote:
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>
>
> But INT2, INT4, INT8 and "SERIAL" are considered to be the same datatype.
> Am I right?

No, they weren't when this tip was written.  As of 8.0 however this tip
is no longer the complete truth; we do allow cross-type index scans.
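
A small illustration of what the tip was warning about, on a hypothetical
(populated and analyzed) table:

CREATE TABLE t ( id int2 PRIMARY KEY );

-- Before 8.0 the bare literal 42 is typed as int4, so the int2 index is ignored:
EXPLAIN SELECT * FROM t WHERE id = 42;          -- seq scan on 7.4 and earlier
EXPLAIN SELECT * FROM t WHERE id = 42::int2;    -- matching types, index scan possible
-- As of 8.0 the first form can use the index as well.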

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)