Thread: 4s query want to run faster

4s query want to run faster

From: "Adonias Malosso"
Hi all,

The following query takes about 4s to run on a server with 16GB of RAM. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?

select i.inuid, count(*) as total
from cte.instrumentounidade i
inner join cte.pontuacao p on p.inuid = i.inuid
inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
inner join cte.subacaoindicador si on si.aciid = ai.aciid
where i.itrid = 2 and p.ptostatus = 'A'
group by i.inuid
having count(*) > 0

HashAggregate  (cost=47905.87..47941.01 rows=2008 width=4)
 Filter: (count(*) > 0)
 ->  Hash Join  (cost=16307.79..46511.45 rows=185923 width=4)
       Hash Cond: (si.aciid = ai.aciid)
       ->  Seq Scan on subacaoindicador si  (cost=0.00..22812.17 rows=368817 width=4)
       ->  Hash  (cost=16211.40..16211.40 rows=38556 width=8)
             ->  Hash Join  (cost=9018.20..16211.40 rows=38556 width=8)
                   Hash Cond: (p.inuid = i.inuid)
                   ->  Hash Join  (cost=8908.41..15419.10 rows=39593 width=8)
                         Hash Cond: (ai.ptoid = p.ptoid)
                         ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8)
                         ->  Hash  (cost=8678.33..8678.33 rows=92034 width=8)
                               ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=92034 width=8)
                                     Filter: (ptostatus = 'A'::bpchar)
                   ->  Hash  (cost=104.78..104.78 rows=2008 width=4)
                         ->  Seq Scan on instrumentounidade i  (cost=0.00..104.78 rows=2008 width=4)
                               Filter: (itrid = 2)
 

Re: 4s query want to run faster

From: "Scott Marlowe"
On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <malosso@gmail.com> wrote:
> Hi all,
>
> The following query takes about 4s to run on a server with 16GB of RAM. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?
>
> select i.inuid, count(*) as total
> from cte.instrumentounidade i
>  inner join cte.pontuacao p on p.inuid = i.inuid
> inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
> inner join cte.subacaoindicador si on si.aciid = ai.aciid
> where i.itrid = 2 and p.ptostatus = 'A'
> group by i.inuid
>  having count(*) > 0

What does explain analyze say about that query?
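
(Editor's note, for readers following along: EXPLAIN ANALYZE actually executes the statement and reports real row counts and timings alongside the planner's estimates. It is run by simply prefixing the query from this thread:)

```sql
EXPLAIN ANALYZE
select i.inuid, count(*) as total
from cte.instrumentounidade i
inner join cte.pontuacao p on p.inuid = i.inuid
inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
inner join cte.subacaoindicador si on si.aciid = ai.aciid
where i.itrid = 2 and p.ptostatus = 'A'
group by i.inuid
having count(*) > 0;
```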

Re: 4s query want to run faster

From: "Adonias Malosso"
HashAggregate  (cost=47818.40..47853.12 rows=1984 width=4) (actual time=5738.879..5743.390 rows=1715 loops=1)
 Filter: (count(*) > 0)
 ->  Hash Join  (cost=16255.99..46439.06 rows=183912 width=4) (actual time=1887.974..5154.207 rows=241693 loops=1)
       Hash Cond: (si.aciid = ai.aciid)
       ->  Seq Scan on subacaoindicador si  (cost=0.00..22811.98 rows=368798 width=4) (actual time=0.108..1551.816 rows=368798 loops=1)
       ->  Hash  (cost=16160.64..16160.64 rows=38141 width=8) (actual time=1887.790..1887.790 rows=52236 loops=1)
             ->  Hash Join  (cost=9015.31..16160.64 rows=38141 width=8) (actual time=980.058..1773.530 rows=52236 loops=1)
                   Hash Cond: (p.inuid = i.inuid)
                   ->  Hash Join  (cost=8905.89..15376.11 rows=39160 width=8) (actual time=967.116..1568.028 rows=54225 loops=1)
                         Hash Cond: (ai.ptoid = p.ptoid)
                         ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8) (actual time=0.080..259.412 rows=76484 loops=1)
                         ->  Hash  (cost=8678.33..8678.33 rows=91026 width=8) (actual time=966.841..966.841 rows=92405 loops=1)
                               ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=91026 width=8) (actual time=0.087..746.528 rows=92405 loops=1)
                                     Filter: (ptostatus = 'A'::bpchar)
                   ->  Hash  (cost=104.46..104.46 rows=1984 width=4) (actual time=12.913..12.913 rows=1983 loops=1)
                         ->  Seq Scan on instrumentounidade i  (cost=0.00..104.46 rows=1984 width=4) (actual time=0.091..8.879 rows=1983 loops=1)
                               Filter: (itrid = 2)
Total runtime: 5746.415 ms


On Thu, Feb 21, 2008 at 5:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <malosso@gmail.com> wrote:
> Hi all,
>
> The following query takes about 4s to run on a server with 16GB of RAM. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?
>
> select i.inuid, count(*) as total
> from cte.instrumentounidade i
>  inner join cte.pontuacao p on p.inuid = i.inuid
> inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
> inner join cte.subacaoindicador si on si.aciid = ai.aciid
> where i.itrid = 2 and p.ptostatus = 'A'
> group by i.inuid
>  having count(*) > 0

What does explain analyze say about that query?

Re: 4s query want to run faster

From: "Claus Guttesen"
> The following query takes about 4s to run on a server with 16GB of RAM. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?

Maybe random_page_cost is set too high? What version are you using?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

Re: 4s query want to run faster

From: "Adonias Malosso"


On Thu, Feb 21, 2008 at 6:10 PM, Claus Guttesen <kometen@gmail.com> wrote:
> The following query takes about 4s to run on a server with 16GB of RAM. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?

> Maybe random_page_cost is set too high? What version are you using?

PostgreSQL v. 8.2.1




Re: 4s query want to run faster

From: "Claus Guttesen"
> > > why it doesn't use the indexes on the primary keys in the join conditions?
> >
> > Maybe random_page_cost is set too high? What version are you using?
>
> Postgresql v. 8.2.1

You can try to lower this value. The default (in 8.3) is 4.
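
(Editor's note: this can be tried per-session before touching postgresql.conf, for example:)

```sql
-- experiment in one session first; persist it in postgresql.conf once verified
SET random_page_cost = 2;
-- then re-run EXPLAIN ANALYZE on the query to see whether the plan changes
```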

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

Re: 4s query want to run faster

From: "Adonias Malosso"
Setting random_page_cost = 2 solved the problem. Thanks.

On Thu, Feb 21, 2008 at 6:16 PM, Claus Guttesen <kometen@gmail.com> wrote:
> > > why it doesn't use the indexes on the primary keys in the join conditions?
> >
> > Maybe random_page_cost is set too high? What version are you using?
>
> Postgresql v. 8.2.1

You can try to lower this value. The default (in 8.3) is 4.


Re: 4s query want to run faster

From: "Scott Marlowe"
Well, all the row counts in expected and actual are pretty close.  I'm
guessing it's as optimized as it's likely to get.  You could try
mucking about with random_page_cost to force index usage, but indexes
are not always a win in pgsql, hence the seq scans etc.  If the
number of rows returned represents a large percentage of the total
number of rows in the table, then a seq scan is generally a win.  Note
that almost all the time being spent in this query is on the Hash Join,
not on the seq scans.
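
(Editor's note: a quick way to check whether an index plan would actually be faster is to disable seq scans in one session; this is a diagnostic knob only, not something to leave set:)

```sql
SET enable_seqscan = off;   -- diagnostic only: forces the planner away from seq scans
-- re-run EXPLAIN ANALYZE on the query and compare total runtime
SET enable_seqscan = on;
```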

Also, you should really update to 8.2.6, the latest 8.2 version.  Check
the release notes for the bugs that were fixed between 8.2.1 and 8.2.6.

Re: 4s query want to run faster

From: Mark Kirkwood
The other parameter you might want to look at is effective_cache_size -
increasing it will encourage index use. On a machine with 16GB the
default is probably too small (there are various recommendations about
how to set this; ISTR either Scott M or Greg Smith had a page somewhere
that covered this quite well - guys?).

Obviously, decreasing random_page_cost fixed this query for you, but if
you find yourself needing to tweak it again for other queries, then look
at changing effective_cache_size.
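
(Editor's note: as an illustrative sketch only - the 12GB figure below is hypothetical and should be derived from observing the machine with top/free, not copied blindly:)

```
# postgresql.conf -- hypothetical values for a 16GB server mostly
# dedicated to PostgreSQL; derive the real figure as described above
effective_cache_size = 12GB    # 8.2+ accepts memory units; earlier versions take 8kB pages
random_page_cost = 2           # the value that fixed the query in this thread
```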

Cheers

Mark


Adonias Malosso wrote:
> Set random_page_cost = 2 solved the problem. thanks
>


Re: 4s query want to run faster

From: "Scott Marlowe"
On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> The other parameter you might want to look at is effective_cache_size -
>  increasing it will encourage index use. On a machine with 16GB the
>  default is probably too small (there are various recommendations about
>  how to set this ISTR either Scott M or Greg Smith had a page somewhere
>  that covered this quite well - guys?).
>
>  Obviously, decreasing random_page_cost fixed this query for you, but if
>  find yourself needing to tweak it again for other queries, then look at
>  changing effective_cache_size.

effective_cache_size is pretty easy to set, and it's not very
sensitive to small changes, so guesstimation is fine where it's
concerned.  Basically, let your machine run for a while, then add up
the cache and buffers your unix kernel is holding (top and free will
tell you these things).  If you're running other apps on the server,
make a SWAG (scientific wild-assed guess) at how much of the kernel
cache / buffers the other apps are using, and set effective_cache_size
to the share of the total you think postgresql is getting.

If your data set fits into memory, then setting random_page_cost
closer to 1 makes a lot of sense, as does a larger effective_cache_size.

Re: 4s query want to run faster

From: Dave Cramer
On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:

> On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood
> <markir@paradise.net.nz> wrote:
>> The other parameter you might want to look at is
>> effective_cache_size -
>> increasing it will encourage index use. On a machine with 16GB the
>> default is probably too small (there are various recommendations
>> about
>> how to set this ISTR either Scott M or Greg Smith had a page
>> somewhere
>> that covered this quite well - guys?).
>>
The default is always too small in my experience.

What are the rest of the configuration values?

Dave


Re: 4s query want to run faster

From: Mark Kirkwood
Scott Marlowe wrote:
>
> effective_cache_size is pretty easy to set, and it's not real
> sensitive to small changes, so guesstimation is fine where it's
> concerned.  Basically, let your machine run for a while, then  add the
> cache and buffer your unix kernel has altogether (top and free will
> tell you these things).  If you're running other apps on the server,
> make a SWAG (scientific wild assed guess) how much the other apps are
> pounding on the kernel cache / buffer and set effective_cache_size to
> how much you think postgresql is using of the total and set it to
> that.
>

FWIW - the buffered/cached numbers may well be called something different if
you are not on Linux (I didn't see any platform mentioned - sorry if I
missed it). E.g. on FreeBSD it is "Inactive" that shows what the os is
caching, and "Cached" actually means something slightly different... (yep,
that's caused a lot of confusion in the past...)

Cheers

Mark