Thread: measure database contention

From: Jaime Casanova
Hi,

We have some bad queries (developers are working on them); some of
them run in 17 secs and that is the average, but when analyzing the logs I
found that from time to time some of them took up to 3 mins (the same
query that normally runs in 17 secs).

So my question is: how can I look for contention problems?

--
Regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cell +59387171157

Re: measure database contention

From: Albe Laurenz
Jaime Casanova wrote:
> We have some bad queries (developers are working on them); some of
> them run in 17 secs and that is the average, but when analyzing the logs I
> found that from time to time some of them took up to 3 mins (the same
> query that normally runs in 17 secs).
>
> So my question is: how can I look for contention problems?

A good first step is to identify the bottleneck.

Frequently, but not always, this is I/O.
Do you see a lot of I/O wait? Are the disks busy?

I don't know anything about your system, but I once experienced a
similar problem with a 2.6 Linux system where things improved considerably
after changing the I/O-scheduler to "elevator=deadline".
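
For example, on Linux you can check both things with something like this
(the device name sda and the sample output are only illustrative):

    $ iostat -x 5          # watch %iowait and the per-disk %util columns
    $ cat /sys/block/sda/queue/scheduler
    noop anticipatory deadline [cfq]
    # echo deadline > /sys/block/sda/queue/scheduler

The sysfs write switches the scheduler for that disk immediately, while
"elevator=deadline" on the kernel command line makes it the default for
all disks at boot.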

Yours,
Laurenz Albe

Re: measure database contention

From: Robert Haas
On Tue, Dec 16, 2008 at 2:32 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> We have some bad queries (developers are working on them); some of
> them run in 17 secs and that is the average, but when analyzing the logs I
> found that from time to time some of them took up to 3 mins (the same
> query that normally runs in 17 secs).
>
> So my question is: how can I look for contention problems?

Is it the exact same query?  Sometimes you might find that the query
plan changes depending on the particular values you have in there; it
is worth running "EXPLAIN ANALYZE" to look for such cases.

You might also want to look at pg_locks.
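
For example, something along these lines shows sessions that are stuck
waiting on a lock (just a sketch; pg_locks has further columns for the
lock type and transaction if you need them):

    SELECT pid, locktype, relation::regclass AS relation, mode
    FROM pg_locks
    WHERE NOT granted;

If that returns rows during the slow episodes, lock contention is a
likely suspect; if it is always empty, look elsewhere.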

...Robert

Re: measure database contention

From: Merlin Moncure
On Tue, Dec 16, 2008 at 2:32 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> Hi,
>
> We have some bad queries (developers are working on them); some of
> them run in 17 secs and that is the average, but when analyzing the logs I
> found that from time to time some of them took up to 3 mins (the same
> query that normally runs in 17 secs).
>
> So my question is: how can I look for contention problems?

Sometimes queries can have fluctuating plans.  For example, this can
happen if you have sorts or hashes that are very near the limit set in
work_mem, so you want to catch it both ways with EXPLAIN ANALYZE.
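
For example, in a psql session (the table and column names are made up,
and the work_mem values are only illustrative):

    SET work_mem = '1MB';
    EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_col;
    -- look for "Sort Method:  external merge  Disk: ..." in the output
    SET work_mem = '64MB';
    EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_col;
    -- versus "Sort Method:  quicksort  Memory: ..."

A sort that flips between in-memory and on-disk depending on the input
size is one common source of wildly varying runtimes for the same query.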

merlin

Re: measure database contention

From: Jaime Casanova
On Wed, Dec 17, 2008 at 2:34 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Jaime Casanova wrote:
>> We have some bad queries (developers are working on them); some of
>> them run in 17 secs and that is the average, but when analyzing the logs I
>> found that from time to time some of them took up to 3 mins (the same
>> query that normally runs in 17 secs).
>>
>> So my question is: how can I look for contention problems?
>
> A good first step is to identify the bottleneck.
>
> Frequently, but not always, this is I/O.
> Do you see a lot of I/O wait? Are the disks busy?
>

The disks are a RAID 10 of 4 SATA disks at 15000 rpm,
and no, iostat reports an average iowait of 0.12.

> I don't know anything about your system, but I once experienced a
> similar problem with a 2.6 Linux system where things improved considerably
> after changing the I/O-scheduler to "elevator=deadline".
>

I don't understand I/O schedulers at all... does anyone know which one
is recommended for Postgres?


Re: measure database contention

From: Jaime Casanova
On Wed, Dec 17, 2008 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 16, 2008 at 2:32 PM, Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
>> We have some bad queries (developers are working on them); some of
>> them run in 17 secs and that is the average, but when analyzing the logs I
>> found that from time to time some of them took up to 3 mins (the same
>> query that normally runs in 17 secs).
>>
>> So my question is: how can I look for contention problems?
>
> Is it the exact same query?

It is the exact same query... I think it will be removed later today
because it's a bad query anyway... but my fear is that something like this
happens even with good ones...

Maybe checkpoints could be the problem?
I'm on 8.3.5 and have configured checkpoint_timeout to 15 minutes,
checkpoint_segments to 6, and checkpoint_completion_target to 0.5.

I'm turning log_checkpoints on, but it would be good if there were a way
to analyze checkpoints beyond reading through the log.
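
For what it's worth, since 8.3 the pg_stat_bgwriter view keeps running
checkpoint counters, which is easier to watch over time than the log:

    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
    FROM pg_stat_bgwriter;

If checkpoints_req climbs quickly relative to checkpoints_timed, the
checkpoint_segments = 6 setting is probably too low for the write load.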

> Sometimes you might find that the query
> plan changes depending on the particular values you have in there; it
> is worth running "EXPLAIN ANALYZE" to look for such cases.
>

I don't think that could happen with this query, because there is no way
it will choose anything better than a seqscan.

> You might also want to look at pg_locks.
>

Only shared ones...


PS: more info about my system (sorry for not giving it in the first post)

2 Xeon(R) E5430 processors @ 2.66GHz, 4 cores each
18 GB of RAM (4 GB in shared_buffers, 4 MB in work_mem)
the DB size is 2 GB (reported by pg_database_size)

max. concurrent connections seen so far: 256



Re: measure database contention

From: Robert Haas
> It is the exact same query... I think it will be removed later today
> because it's a bad query anyway... but my fear is that something like this
> happens even with good ones...
>
> Maybe checkpoints could be the problem?
> I'm on 8.3.5 and have configured checkpoint_timeout to 15 minutes,
> checkpoint_segments to 6, and checkpoint_completion_target to 0.5.

Well, it might help if you could provide the query, and the EXPLAIN output.

Unless the query is updating data (rather than just retrieving it),
checkpoints shouldn't be involved (I think).

...Robert

Re: measure database contention

From: Jaime Casanova
On Wed, Dec 17, 2008 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It is the exact same query... I think it will be removed later today
>> because it's a bad query anyway... but my fear is that something like this
>> happens even with good ones...
>>
>> Maybe checkpoints could be the problem?
>> I'm on 8.3.5 and have configured checkpoint_timeout to 15 minutes,
>> checkpoint_segments to 6, and checkpoint_completion_target to 0.5.
>
> Well, it might help if you could provide the query, and the EXPLAIN output.
>

ok... remember I said it's a bad query ;)
Actually, it seems there's a suitable index for that query (I guess it is
being used because of the ORDER BY).

mic=# explain analyze
mic-# SELECT * FROM tgen_persona ORDER BY empresa_id, persona_id ASC;
                                                                 QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_tgen_persona on tgen_persona  (cost=0.00..8534.09 rows=86547 width=884) (actual time=0.096..129.980 rows=86596 loops=1)
 Total runtime: 175.952 ms
(2 rows)

As you see, EXPLAIN ANALYZE says it executes in 175.952 ms, and
because of the network transfer of data, running this from pgAdmin on
another machine takes 17 s... but from time to time pgFouine reports
up to 345.11 sec.


Re: measure database contention

From: Scott Marlowe
On Wed, Dec 17, 2008 at 11:19 AM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> On Wed, Dec 17, 2008 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> It is the exact same query... I think it will be removed later today
>>> because it's a bad query anyway... but my fear is that something like this
>>> happens even with good ones...
>>>
>>> Maybe checkpoints could be the problem?
>>> I'm on 8.3.5 and have configured checkpoint_timeout to 15 minutes,
>>> checkpoint_segments to 6, and checkpoint_completion_target to 0.5.
>>
>> Well, it might help if you could provide the query, and the EXPLAIN output.
>>
>
> ok... remember I said it's a bad query ;)
> Actually, it seems there's a suitable index for that query (I guess it is
> being used because of the ORDER BY).
>
> mic=# explain analyze
> mic-# SELECT * FROM tgen_persona ORDER BY empresa_id, persona_id ASC;
>                                                                 QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using pk_tgen_persona on tgen_persona  (cost=0.00..8534.09
> rows=86547 width=884) (actual time=0.096..129.980 rows=86596 loops=1)
>  Total runtime: 175.952 ms
> (2 rows)
>
> As you see, EXPLAIN ANALYZE says it executes in 175.952 ms, and
> because of the network transfer of data, running this from pgAdmin on
> another machine takes 17 s... but from time to time pgFouine reports
> up to 345.11 sec.

I know it's a bad query, but did you try clustering on that index?
Then a seq scan followed by a sort would likely be cheaper and faster;
85k rows aren't that many, really.
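
A sketch of that, using the names from the plan above (note that CLUSTER
rewrites the table and holds an exclusive lock while it runs, so do it
in a quiet moment):

    CLUSTER tgen_persona USING pk_tgen_persona;
    ANALYZE tgen_persona;

After that, reading the table in (empresa_id, persona_id) order touches
mostly sequential heap pages instead of jumping around via the index.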

Re: measure database contention

From: Robert Haas
> As you see, EXPLAIN ANALYZE says it executes in 175.952 ms, and
> because of the network transfer of data, running this from pgAdmin on
> another machine takes 17 s... but from time to time pgFouine reports
> up to 345.11 sec.

Well, 86000 rows is not enough to give PostgreSQL a headache, even on
mediocre hardware, so I think the most likely culprit is the
application that is reading the data - pgAdmin or pgFouine.

...Robert