Thread: measure database contention
Hi,

We have some bad queries (developers are working on that). Some of them run in 17 secs, and that is the average, but when analyzing the logs I found that from time to time some of them took up to 3 mins (the same query that normally runs in 17 secs).

So my question is: how can I look for contention problems?

--
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
Jaime Casanova wrote:
> we have some bad queries (developers are working on that). Some of
> them run in 17 secs, and that is the average, but when analyzing the
> logs I found that from time to time some of them took up to 3 mins
> (the same query that normally runs in 17 secs).
>
> so my question is: how can I look for contention problems?

A good first step is to identify the bottleneck.

Frequently, but not always, this is I/O. Do you see a lot of I/O wait? Are the disks busy?

I don't know anything about your system, but I once experienced a similar problem with a 2.6 Linux system where things improved considerably after changing the I/O scheduler with "elevator=deadline".

Yours,
Laurenz Albe
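Whether the box is I/O-bound can be checked without extra tools. On Linux, the cumulative iowait counter is the sixth field of the aggregate "cpu" line in /proc/stat (a minimal sketch; iostat or vmstat report the same thing as a percentage, which is easier to read):

```shell
# Print the cumulative iowait time (in jiffies) from /proc/stat.
# A value that grows quickly between two samples means the CPUs are
# sitting idle waiting on disk.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```

Sampling it twice a few seconds apart and diffing gives the iowait accrued in that interval.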
On Tue, Dec 16, 2008 at 2:32 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> we have some bad queries (developers are working on that). Some of
> them run in 17 secs, and that is the average, but when analyzing the
> logs I found that from time to time some of them took up to 3 mins
> (the same query that normally runs in 17 secs).
>
> so my question is: how can I look for contention problems?

Is it the exact same query? Sometimes you might find that the query plan changes depending on the particular values you have in there; it is worth running "EXPLAIN ANALYZE" to look for such cases.

You might also want to look at pg_locks.

...Robert
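Lock contention shows up in pg_locks as requests with granted = false. A self-join can pair each waiter with a session holding a conflicting lock on the same object (a sketch, not exhaustive; on 8.3, join to pg_stat_activity via its procpid column to see the queries involved):

```sql
-- List each backend waiting on a lock together with a pid that
-- currently holds a granted lock on the same object.
SELECT blocked.pid  AS blocked_pid,
       blocker.pid  AS blocking_pid,
       blocked.locktype,
       blocked.relation::regclass AS relation
FROM pg_locks blocked
JOIN pg_locks blocker
  ON  blocker.locktype = blocked.locktype
  AND blocker.database      IS NOT DISTINCT FROM blocked.database
  AND blocker.relation      IS NOT DISTINCT FROM blocked.relation
  AND blocker.transactionid IS NOT DISTINCT FROM blocked.transactionid
  AND blocker.pid <> blocked.pid
WHERE NOT blocked.granted
  AND blocker.granted;
```

An empty result during a slow run means the delay is not lock waits.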
On Tue, Dec 16, 2008 at 2:32 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> Hi,
>
> we have some bad queries (developers are working on that). Some of
> them run in 17 secs, and that is the average, but when analyzing the
> logs I found that from time to time some of them took up to 3 mins
> (the same query that normally runs in 17 secs).
>
> so my question is: how can I look for contention problems?

Sometimes queries can have fluctuating plans. For example, this can happen if you have sorts or hashes that are very near the limit set in work_mem, so you want to catch it both ways via EXPLAIN ANALYZE.

merlin
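The spill is visible directly in EXPLAIN ANALYZE output: on 8.3 the Sort node reports its method, so the same statement can be checked on both sides of the limit (table and column names below are placeholders, and the work_mem values are arbitrary):

```sql
-- "Sort Method: quicksort" means the sort fit in work_mem;
-- "Sort Method: external merge  Disk: ...kB" means it spilled to disk.
SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;  -- hypothetical names

SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;
```

If the plan or the sort method flips between the two settings, intermittent slowness from borderline work_mem is a plausible explanation.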
On Wed, Dec 17, 2008 at 2:34 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> A good first step is to identify the bottleneck.
>
> Frequently, but not always, this is I/O.
> Do you see a lot of I/O wait? Are the disks busy?

The disks are a RAID 10 of 4 SATA disks at 15,000 rpm, and no: iostat reports an average iowait of 0.12.

> I don't know anything about your system, but I once experienced a
> similar problem with a 2.6 Linux system where things improved
> considerably after changing the I/O scheduler with "elevator=deadline".

I don't understand I/O schedulers at all... does anyone know which one is recommended for PostgreSQL?

--
Jaime Casanova
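For what it's worth, on Linux the scheduler can be inspected (and changed at runtime, as root) per block device through sysfs; the active one is shown in brackets. Whether deadline beats cfq for a given database workload really has to be measured (a sketch; device names vary by system):

```shell
# Show the available and active (bracketed) I/O scheduler for every
# block device the kernel exposes; skip silently if none are visible.
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue
    printf '%s: %s\n' "$f" "$(cat "$f")"
done
# To switch one device (as root; not persistent across reboots):
# echo deadline > /sys/block/sda/queue/scheduler
```

The "elevator=deadline" kernel boot parameter mentioned above makes the choice the default for all devices at boot instead.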
On Wed, Dec 17, 2008 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Is it the exact same query?

It is the exact same query... I think it will be removed later today because it is a bad query anyway... but my fear is that something like this happens even with good ones...

Maybe checkpoints could be the problem? I have 8.3.5 and configured checkpoint_timeout to 15 minutes, checkpoint_segments to 6, and checkpoint_completion_target to 0.5.

I'm setting log_checkpoints to on, but it would be good if there were a way to analyze them better than looking through the log.

> Sometimes you might find that the query plan changes depending on the
> particular values you have in there; it is worth running "EXPLAIN
> ANALYZE" to look for such cases.

I don't think that could happen with this query, because there is no way it will choose anything better than a seqscan.

> You might also want to look at pg_locks.

Only shared ones...

PS: more info about my system (sorry for not giving it in the first post):
2 Xeon(R) CPU E5430 @ 2.66GHz processors with 4 cores each
18 GB of RAM (4 GB in shared_buffers, 4 MB in work_mem)
the db size is 2 GB (reported by pg_database_size)
max. concurrent connections seen until now: 256

--
Jaime Casanova
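For reference, the settings described correspond to this postgresql.conf fragment; with log_checkpoints on, 8.3 writes a log line per checkpoint that includes how many buffers were written and how long the write and sync phases took, which is usually enough to correlate checkpoints with slow-query timestamps:

```
# postgresql.conf (values from this thread)
checkpoint_timeout = 15min
checkpoint_segments = 6
checkpoint_completion_target = 0.5
log_checkpoints = on
```

Cross-referencing the checkpoint log lines against pgFouine's slow-query timestamps would confirm or rule out this theory.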
> It is the exact same query... I think it will be removed later today
> because it is a bad query anyway... but my fear is that something like
> this happens even with good ones...
>
> Maybe checkpoints could be the problem? I have 8.3.5 and configured
> checkpoint_timeout to 15 minutes, checkpoint_segments to 6, and
> checkpoint_completion_target to 0.5.

Well, it might help if you could provide the query and the EXPLAIN output.

Unless the query is updating data (rather than just retrieving it), checkpoints shouldn't be involved (I think).

...Robert
On Wed, Dec 17, 2008 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, it might help if you could provide the query and the EXPLAIN output.

OK... remember, I said it's a bad query ;)
Actually, it seems there is a suitable index for that query (I guess it is using it because of the ORDER BY):

mic=# explain analyze
mic-# SELECT * FROM tgen_persona ORDER BY empresa_id, persona_id ASC;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_tgen_persona on tgen_persona  (cost=0.00..8534.09 rows=86547 width=884) (actual time=0.096..129.980 rows=86596 loops=1)
 Total runtime: 175.952 ms
(2 rows)

As you see, EXPLAIN ANALYZE says it executes in 175.952 ms, and because of the network transfer of the data, executing this from pgAdmin on another machine takes 17 s... but from time to time pgFouine shows up to 345.11 sec.

--
Jaime Casanova
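The gap between the two numbers can be measured from the client side: EXPLAIN ANALYZE reports server execution time only, while psql's \timing shows total elapsed time as seen by the client, including transferring all 86k wide (884-byte) rows over the network:

```
\timing on
SELECT * FROM tgen_persona ORDER BY empresa_id, persona_id ASC;
```

If \timing from the same machine as pgAdmin also shows ~17 s, the extra time is transfer and client rendering, not query execution.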
On Wed, Dec 17, 2008 at 11:19 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> As you see, EXPLAIN ANALYZE says it executes in 175.952 ms, and
> because of the network transfer of the data, executing this from
> pgAdmin on another machine takes 17 s... but from time to time
> pgFouine shows up to 345.11 sec.

I know it's a bad query, but did you try clustering on that index? Then a seq scan followed by a sort would likely be cheaper and faster. 85k rows aren't that many, really.
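Clustering physically rewrites the table in index order (taking an exclusive lock on it while it runs), which turns the ordered read into mostly sequential I/O. A sketch using the index from the plan above:

```sql
-- Rewrite tgen_persona in pk_tgen_persona order, then refresh stats.
-- (CLUSTER ... USING is the 8.3 syntax; older releases spelled it
--  CLUSTER pk_tgen_persona ON tgen_persona.)
CLUSTER tgen_persona USING pk_tgen_persona;
ANALYZE tgen_persona;
```

The ordering is not maintained for later inserts and updates, so a table like this would need periodic re-clustering.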
> As you see, EXPLAIN ANALYZE says it executes in 175.952 ms, and
> because of the network transfer of the data, executing this from
> pgAdmin on another machine takes 17 s... but from time to time
> pgFouine shows up to 345.11 sec.

Well, 86,000 rows is not enough to give PostgreSQL a headache, even on mediocre hardware. So I think the most likely culprit is the application that is reading the data: pgAdmin or pgFouine.

...Robert
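If the client really does need all the rows, fetching them in batches through a cursor keeps any single fetch small, so a slow or memory-pressured client can't stretch the query's apparent runtime nearly as much (a sketch; the batch size of 1000 is arbitrary):

```sql
BEGIN;
DECLARE persona_cur CURSOR FOR
    SELECT * FROM tgen_persona ORDER BY empresa_id, persona_id ASC;
FETCH 1000 FROM persona_cur;   -- repeat until no rows are returned
CLOSE persona_cur;
COMMIT;
```

This also avoids the client buffering the entire 70+ MB result set (86k rows at width 884) in memory at once.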