Thread: slow queries, possibly disk io
I have some queries that have significantly slowed down in the last
couple days. They've gone from 10 seconds to over 2 mins.

The cpu has never gone over 35% in the server's lifetime, but the load
average is over 8.0 right now. I'm assuming this is probably due to
disk io.

I need some help setting up postgres so that it doesn't need to go to
disk. I think the shared_buffers and effective_cache_size values are
the ones I need to look at.

Would setting shmmax and shmall to 90% or so of available mem and
giving a lot of it to postgres be helpful?

The docs say this about effective_cache_size:

    Sets the planner's assumption about the effective size of the disk
    cache (that is, the portion of the kernel's disk cache that will be
    used for PostgreSQL data files).

Does that mean the total available ram? Or what's left over from
shared_buffers?

I've tried different things and not much has been working. Is there a
good way to ensure that most of the tables accessed in postgres will
be cached in mem and not have to go to disk?

If I'm joining a lot of tables, should sort_mem be set high also?
Do shared_buffers, effective_cache_size, and sort_mem all use the
same memory? Or are they separate?

I've looked for information and haven't found any useful pages about
this.

Any help would be greatly appreciated.

Thanks.

-Josh
Josh Close wrote:

> I have some queries that have significantly slowed down in the last
> couple days. They've gone from 10 seconds to over 2 mins.
>
> The cpu has never gone over 35% in the server's lifetime, but the load
> average is over 8.0 right now. I'm assuming this is probably due to
> disk io.
>
> I need some help setting up postgres so that it doesn't need to go to
> disk. I think the shared_buffers and effective_cache_size values are
> the ones I need to look at.
>
> Would setting shmmax and shmall to 90% or so of available mem and
> giving a lot of it to postgres be helpful?

Setting shared buffers above something like 10-30% of memory is
counterproductive.

> The docs say this about effective_cache_size:
>
>     Sets the planner's assumption about the effective size of the disk
>     cache (that is, the portion of the kernel's disk cache that will be
>     used for PostgreSQL data files).
>
> Does that mean the total available ram? Or what's left over from
> shared_buffers?
>
> I've tried different things and not much has been working. Is there a
> good way to ensure that most of the tables accessed in postgres will
> be cached in mem and not have to go to disk?
>
> If I'm joining a lot of tables, should sort_mem be set high also?
> Do shared_buffers, effective_cache_size, and sort_mem all use the
> same memory? Or are they separate?

Increasing sort_mem can help with various activities, but increasing it
too much can cause you to swap, which kills performance. The caution is
that you will likely use at least one sort_mem per connection, and can
likely use more than one if the query is complicated.

effective_cache_size changes how Postgres plans queries, but given the
same query plan, it doesn't change performance at all.

> I've looked for information and haven't found any useful pages about
> this.
>
> Any help would be greatly appreciated.
>
> Thanks.
>
> -Josh

John
=:->
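One detail worth adding to John's caution: sort_mem can be raised for a
single session rather than globally, so a known-heavy query can get more
memory without multiplying the risk across every connection. A minimal
sketch (the table, column, and 32 MB figure are placeholders, not from
this thread):

    -- raise sort_mem only for this session's big query; the global
    -- default stays low for all the other connections
    SET sort_mem = 32768;            -- value is in kB, so this is 32 MB
    SELECT * FROM some_table ORDER BY some_col;
    RESET sort_mem;                  -- back to the postgresql.conf default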
On 5/26/05, Josh Close <narshe@gmail.com> wrote:
> I have some queries that have significantly slowed down in the last
> couple days. They've gone from 10 seconds to over 2 mins.
>
> The cpu has never gone over 35% in the server's lifetime, but the load
> average is over 8.0 right now. I'm assuming this is probably due to
> disk io.
>
> I need some help setting up postgres so that it doesn't need to go to
> disk. I think the shared_buffers and effective_cache_size values are
> the ones I need to look at.

A few "mandatory" questions:

1. Do you vacuum your db on a regular basis? :)

2. Perhaps statistics for the tables in question are out of date; did
you try ALTER TABLE ... SET STATISTICS?

3. EXPLAIN ANALYZE of the slow query?

4. If for some reason you cannot give an EXPLAIN ANALYZE, please try to
describe the type of query (what kind of join(s)) and the amount of
data found in the tables.

2 minutes from 10 seconds is a huge leap, and it may mean that
PostgreSQL for some reason is not planning as well as it could.
Throwing more RAM at the problem can help, but it would be better
to hint the planner to do the right thing. It may be a good time to
play with planner variables. :)

Regards,
Dawid
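For reference, a minimal sketch of what points 1-3 might look like in
practice (table and column names here are placeholders, not Josh's
schema):

    -- 1. vacuum and refresh the planner's statistics in one pass
    VACUUM ANALYZE some_table;

    -- 2. collect finer-grained statistics for a column the planner
    --    misestimates, then re-analyze so the change takes effect
    ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 100;
    ANALYZE some_table;

    -- 3. show the plan actually used, with real timings and row counts
    EXPLAIN ANALYZE
    SELECT count(*) FROM some_table WHERE some_col > 42;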
> I have some queries that have significantly slowed down in the last
> couple days. They've gone from 10 seconds to over 2 mins.
>
> The cpu has never gone over 35% in the server's lifetime, but the load
> average is over 8.0 right now. I'm assuming this is probably due to
> disk io.

You sure it's not a severe lack of vacuuming that's the problem?

Chris
> Setting shared buffers above something like 10-30% of memory is
> counterproductive.

What is the reason behind it being counterproductive? If shared_buffers
is at 30%, should effective_cache_size be at 70%? How do those two
relate?

> Increasing sort_mem can help with various activities, but increasing it
> too much can cause you to swap, which kills performance. The caution is
> that you will likely use at least one sort_mem per connection, and can
> likely use more than one if the query is complicated.

I have a max of 100 connections and 2 gigs of mem. Right now sort_mem
is at 4 megs. How much higher could I put that?

-Josh
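A back-of-the-envelope check on those numbers, since sort_mem is
per sort, not per server (simple arithmetic, runnable as SQL):

    -- worst case if all 100 connections each run one 4 MB sort at once;
    -- this memory is in addition to shared_buffers and the kernel cache
    SELECT 100 * 4096          AS total_sort_kb,   -- 409600 kB
           100 * 4096 / 1024   AS total_sort_mb;   -- 400 MB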
> Few "mandatory" questions: > > 1. Do you vacuum your db on regular basis? :) It's vacuumed once every hour. The table sizes and data are constantly changing. > > 2. Perhaps statistics for tables in question are out of date, did you > try alter table set statistics? No I haven't. What would that do for me? > > 3. explain analyze of the slow query? Here is the function that is ran: CREATE OR REPLACE FUNCTION adaption.funmsgspermin() RETURNS int4 AS ' DECLARE this_rServerIds RECORD; this_sQuery TEXT; this_iMsgsPerMin INT; this_rNumSent RECORD; BEGIN this_iMsgsPerMin := 0; FOR this_rServerIds IN SELECT iId FROM adaption.tblServers LOOP this_sQuery := \' SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \' WHERE tStamp > now() - interval \'\'5 mins\'\'; \'; FOR this_rNumSent IN EXECUTE this_sQuery LOOP this_iMsgsPerMin := this_iMsgsPerMin + this_rNumSent.iNumSent; END LOOP; END LOOP; this_iMsgsPerMin := this_iMsgsPerMin / 5; RETURN this_iMsgsPerMin; END; ' LANGUAGE 'plpgsql' VOLATILE; Here is the explain analyze of one loops of the sum: Aggregate (cost=31038.04..31038.04 rows=1 width=4) (actual time=14649.602..14649.604 rows=1 loops=1) -> Seq Scan on tblbatchhistory_1 (cost=0.00..30907.03 rows=52401 width=4) (actual time=6339.223..14648.433 rows=919 loops=1) Filter: (tstamp > (now() - '00:05:00'::interval)) Total runtime: 14649.709 ms > > 4. if you for some reason cannot give explain analyze, please try to > describe the type of query (what kind of join(s)) and amount of data > found in the tables. > > 2 minutes from 10 seconds is a huge leap, and it may mean that > PostgreSQL for some reason is not planning as well as it could. > Throwing more RAM at the problem can help, but it would be better > to hint the planner to do the right thing. It may be a good time to > play with planner variables. :) Is there any documentation on planner vars? And how would I throw more ram at it? It has 2 gigs right now. How do I know if postgres is using that? -Josh
On 5/26/05, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
> > I have some queries that have significantly slowed down in the last
> > couple days. They've gone from 10 seconds to over 2 mins.
> >
> > The cpu has never gone over 35% in the server's lifetime, but the load
> > average is over 8.0 right now. I'm assuming this is probably due to
> > disk io.
>
> You sure it's not a severe lack of vacuuming that's the problem?

It's vacuumed hourly. If it needs to be more than that I could do it I
guess. But from everything I've been told, hourly should be enough.

-Josh
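One way to check whether hourly is actually keeping up, rather than
assuming it (a sketch against one of the tables from the function above):

    -- VERBOSE reports how many dead row versions were found and removed;
    -- consistently large counts mean the vacuum frequency or the free
    -- space map settings need to go up
    VACUUM VERBOSE ANALYZE adaption.tblBatchHistory_1;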
Josh Close <narshe@gmail.com> writes:
>         this_sQuery := \'
>             SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
>             FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
>             WHERE tStamp > now() - interval \'\'5 mins\'\';
>         \';

> Here is the explain analyze of one loop of the sum:

> Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
> time=14649.602..14649.604 rows=1 loops=1)
>   ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
> width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
>         Filter: (tstamp > (now() - '00:05:00'::interval))
> Total runtime: 14649.709 ms

I think you really want that seqscan to be an indexscan, instead.
I'm betting this is PG 7.4.something?  If so, probably the only
way to make it happen is to simplify the now() expression to a constant:

    SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
    FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
    WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text || \'\\\'\';

because pre-8.0 the planner won't realize that the inequality is
selective enough to favor an indexscan, unless it's comparing to
a simple constant.

(BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)

			regards, tom lane
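To expand on that aside: with 8.0 dollar quoting the function body no
longer needs every quote doubled and escaped, so the same assignment can
be written almost as plain SQL. A sketch, assuming the surrounding
CREATE FUNCTION is wrapped in $$ ... $$ rather than single quotes:

    -- quote_literal() supplies the surrounding quotes that Tom's
    -- version builds by hand with backslash escapes
    this_sQuery := '
        SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
        FROM adaption.tblBatchHistory_' || this_rServerIds.iId || '
        WHERE tStamp > ' || quote_literal( (now() - interval '5 mins')::text );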
> I think you really want that seqscan to be an indexscan, instead.
> I'm betting this is PG 7.4.something?  If so, probably the only
> way to make it happen is to simplify the now() expression to a constant:
>
>     SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
>     FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
>     WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
> \'\\\'\';

The dollar sign thing would be a lot easier.

I can't get this to work, though. I'm using a db manager where I can
just use ' instead of \'. How would it look for that? In other words,
it doesn't have the

    create or replace function as '
        -- stuff
    ' language 'plpgsql'

wrapper; it just has the actual function, which makes things a little
easier. I'm getting an error at or near "5".

> because pre-8.0 the planner won't realize that the inequality is
> selective enough to favor an indexscan, unless it's comparing to
> a simple constant.
>
> (BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
>
> 			regards, tom lane

-Josh
Doing the query

explain
SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_1
WHERE tStamp > ( now() - interval '5 mins' )::text

gives me this:

Aggregate  (cost=32138.33..32138.33 rows=1 width=4)
  ->  Seq Scan on tblbatchhistory_1  (cost=0.00..31996.10 rows=56891 width=4)
        Filter: ((tstamp)::text > ((now() - '00:05:00'::interval))::text)

Still not an index scan.

On 5/27/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Close <narshe@gmail.com> writes:
> >         this_sQuery := \'
> >             SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
> >             FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
> >             WHERE tStamp > now() - interval \'\'5 mins\'\';
> >         \';
>
> > Here is the explain analyze of one loop of the sum:
>
> > Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
> > time=14649.602..14649.604 rows=1 loops=1)
> >   ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
> > width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
> >         Filter: (tstamp > (now() - '00:05:00'::interval))
> > Total runtime: 14649.709 ms
>
> I think you really want that seqscan to be an indexscan, instead.
> I'm betting this is PG 7.4.something?  If so, probably the only
> way to make it happen is to simplify the now() expression to a constant:
>
>     SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
>     FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
>     WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
> \'\\\'\';
>
> because pre-8.0 the planner won't realize that the inequality is
> selective enough to favor an indexscan, unless it's comparing to
> a simple constant.
>
> (BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
>
> 			regards, tom lane

-Josh
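The Filter line in that plan shows what went wrong with this attempt:
the ::text cast ended up applying to the comparison as a whole, so
tstamp itself is being compared as text, and no timestamp index can
serve a text comparison. What Tom's dynamic-SQL version actually
produces is a plain timestamp literal on the right-hand side; a static
test of that shape might look like this (the literal value is
arbitrary):

    -- the right-hand side is a simple constant, so the 7.4 planner can
    -- see the predicate is selective and consider an index scan
    EXPLAIN ANALYZE
    SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
    FROM adaption.tblBatchHistory_1
    WHERE tStamp > '2005-05-27 12:00:00';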
Hi,

I had some disk io issues recently with NFS. I found the command
'iostat -x 5' to be a great help when using Linux.

For example, here is the output when I do a 10GB file transfer onto hdc:

Device: rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm %util
hdc       0.00 875.95  0.00 29.66    0.00 7244.89   0.00 3622.44   244.27     3.07 103.52  1.78  5.27

The last field shows the disk is 5.27% busy.

I have seen this value at 100%; adding more servers brought it under
100%. It seems that if you hit 100%, problems sort of cascade all over
the place. For example, Apache connections went right up and hit their
max.

I am not sure how accurate the % is, but it has worked pretty well for
me.

Perhaps use this command in another window while you run your SQL and
see what it shows.

HTH.
Kind regards,
Rudi.
I didn't see iostat as available to install, but I'm using dstat to see
this.

The server has constant disk reads averaging around 50M and quite a few
in the 60M range. This is when selects are being done, which is almost
always. I would think that if postgres were grabbing everything from
memory, this wouldn't happen. This is why I think there must be some way
to allocate more mem to postgres.

There is 2 gigs of mem in this server. Here are my current settings:

max_connections = 100
shared_buffers = 50000
sort_mem = 4096
vacuum_mem = 32768
effective_cache_size = 450000

Shared buffers is set to 10% of total mem. Effective cache size is 90%
of mem.

Is there anything that can be done to have postgres grab more from
memory rather than disk?

On 5/30/05, Rudi Starcevic <tech@wildcash.com> wrote:
> Hi,
>
> I had some disk io issues recently with NFS. I found the command
> 'iostat -x 5' to be a great help when using Linux.
>
> For example, here is the output when I do a 10GB file transfer onto hdc:
>
> Device: rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm %util
> hdc       0.00 875.95  0.00 29.66    0.00 7244.89   0.00 3622.44   244.27     3.07 103.52  1.78  5.27
>
> The last field shows the disk is 5.27% busy.
>
> I have seen this value at 100%; adding more servers brought it under
> 100%. It seems that if you hit 100%, problems sort of cascade all over
> the place. For example, Apache connections went right up and hit their
> max.
>
> I am not sure how accurate the % is, but it has worked pretty well for
> me.
>
> Perhaps use this command in another window while you run your SQL and
> see what it shows.
>
> HTH.
> Kind regards,
> Rudi.

-Josh
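One way to answer the "cache or disk?" question from inside the database
(a sketch; it assumes block-level stats are enabled via
stats_block_level, and note that a "read" here may still be satisfied by
the kernel's cache rather than the physical disk):

    -- per-table buffer hits vs. blocks requested from the OS
    SELECT relname,
           heap_blks_read,   -- block requests that went to the kernel
           heap_blks_hit     -- block requests served from shared_buffers
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC
    LIMIT 10;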
Josh Close <narshe@gmail.com> writes:
> There is 2 gigs of mem in this server. Here are my current settings:

> max_connections = 100
> shared_buffers = 50000
> sort_mem = 4096
> vacuum_mem = 32768
> effective_cache_size = 450000

> Shared buffers is set to 10% of total mem. Effective cache size is 90%
> of mem.

Uh, shared_buffers and effective_cache_size are both measured in pages,
which are 8K apiece unless you built with a nondefault BLCKSZ.  So the
above calculations are off ...

> Is there anything that can be done to have postgres grab more from
> memory rather than disk?

It's not so much a matter of what Postgres will do as what the kernel
will do.  Check to see if there is some limit on how much memory the
kernel will set aside for disk buffers.  Plain old "top" will generally
tell you what is going on, though interpreting its output sometimes
requires some wizardry.

			regards, tom lane
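Spelling out the arithmetic Tom is pointing at, assuming the default
8 kB page size:

    -- 50000 pages  -> ~391 MB for shared_buffers (not 10% of 2 GB)
    -- 450000 pages -> ~3516 MB for effective_cache_size, which is more
    -- RAM than a 2 GB machine even has
    SELECT 50000  * 8 / 1024 AS shared_buffers_mb,
           450000 * 8 / 1024 AS effective_cache_size_mb;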
On 5/31/05, Martin Fandel <martin.fandel@alphyra-evs.de> wrote:
> In the documentation at
> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
> the shared_buffers is set to 1/3 of the available RAM. You've set
> 50000*8/1024=391 MB SHMEM. The effective_cache_size in your
> configuration is 450000*8/1024=3516 MB SHMEM. That's 3907 MB
> of RAM, but you have less than 2048 MB available.

I wrote that wrong; there are actually 4 gigs of ram available.

> What value do you have in /proc/sys/kernel/shmmax?
>
> I'm really new at using postgres and I don't have much experience,
> but maybe you can try using 1/3 (682MB/87424) for shared_buffers
> and 2/3 (1365MB/174720) for effective_cache_size? But maybe these
> settings are too high too.
>
> best regards
> Martin
> On 5/31/05, Martin Fandel <martin.fandel@alphyra-evs.de> wrote:
> > In the documentation at
> > http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
> > the shared_buffers is set to 1/3 of the available RAM.

Well, it says "you should never use more than 1/3 of your available
RAM", which is not quite the same as "it is set to 1/3." I'd even say:
never set it higher than 1/10 of your available RAM, unless you know
what you're doing and why you're doing it.

Servus
Manfred
On Fri, 2005-05-27 at 07:52 -0500, Josh Close wrote:
> > Setting shared buffers above something like 10-30% of memory is
> > counterproductive.
>
> What is the reason behind it being counterproductive? If shared
> buffers are at 30%, should effective cache size be at 70%? How do
> those two relate?

They don't relate.

shared_buffers = 50000 is enough. More than that will give bgwriter
issues.

effective_cache_size changes whether indexes are selected or not.
Higher settings favour indexed access.

> > Increasing sort_mem can help with various activities, but increasing
> > it too much can cause you to swap, which kills performance. The
> > caution is that you will likely use at least one sort_mem per
> > connection, and can likely use more than one if the query is
> > complicated.
>
> I have a max of 100 connections and 2 gigs of mem. Right now sort_mem
> is at 4 megs. How much higher could I put that?

Please post your server hardware config all in one go. You have more
than 2 CPUs, yes?

Also, mention what your bgwriter settings are. You may need to turn
those down a bit.

Best Regards,

Simon Riggs
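Simon's point about index selection is easy to see first-hand, because
effective_cache_size can be changed per session; it is purely a planner
input, and nothing is actually allocated. A sketch with placeholder
names (the plan only flips where the index-versus-seqscan choice is
borderline):

    SET effective_cache_size = 10000;     -- assume ~80 MB of cache
    EXPLAIN SELECT * FROM some_table WHERE some_col > 42;

    SET effective_cache_size = 200000;    -- assume ~1.6 GB of cache
    EXPLAIN SELECT * FROM some_table WHERE some_col > 42;
    -- with the larger assumed cache, repeated index page fetches look
    -- cheaper, so indexed access is favoured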
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match

But are INT2, INT4, INT8, and SERIAL considered to be the same datatype
for this purpose? Am I right?

Thanks,

Marc
On Wed, Jun 01, 2005 at 11:45:06AM +0200, Marc Mamin wrote:
> TIP 9: the planner will ignore your desire to choose an index scan if
> your joining column's datatypes do not match
>
> But are INT2, INT4, INT8, and SERIAL considered to be the same
> datatype for this purpose? Am I right?

No, they weren't when this tip was written. As of 8.0, however, this
tip is no longer the complete truth; we do allow cross-type index
scans.

-- 
Alvaro Herrera (<alvherre[a]surnet.cl>)
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)
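For pre-8.0 servers where the tip still bites, the usual workaround is
an explicit cast so both sides of the comparison have the same type (a
sketch with placeholder tables and columns):

    -- a bare 42 is int4, so an index on an int8 column is ignored;
    -- casting the constant makes the types match
    SELECT * FROM some_table WHERE id8 = 42::int8;

    -- same idea in a join between int4 and int8 key columns
    SELECT a.*
    FROM   small_ids a
    JOIN   big_ids   b ON a.id4::int8 = b.id8;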