Thread: 4s query wants to run faster
Hi all,
The following query takes about 4s to run on a 16GB RAM server. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?
select i.inuid, count(*) as total
from cte.instrumentounidade i
inner join cte.pontuacao p on p.inuid = i.inuid
inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
inner join cte.subacaoindicador si on si.aciid = ai.aciid
where i.itrid = 2 and p.ptostatus = 'A'
group by i.inuid
having count(*) > 0
HashAggregate  (cost=47905.87..47941.01 rows=2008 width=4)
  Filter: (count(*) > 0)
  ->  Hash Join  (cost=16307.79..46511.45 rows=185923 width=4)
        Hash Cond: (si.aciid = ai.aciid)
        ->  Seq Scan on subacaoindicador si  (cost=0.00..22812.17 rows=368817 width=4)
        ->  Hash  (cost=16211.40..16211.40 rows=38556 width=8)
              ->  Hash Join  (cost=9018.20..16211.40 rows=38556 width=8)
                    Hash Cond: (p.inuid = i.inuid)
                    ->  Hash Join  (cost=8908.41..15419.10 rows=39593 width=8)
                          Hash Cond: (ai.ptoid = p.ptoid)
                          ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8)
                          ->  Hash  (cost=8678.33..8678.33 rows=92034 width=8)
                                ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=92034 width=8)
                                      Filter: (ptostatus = 'A'::bpchar)
                    ->  Hash  (cost=104.78..104.78 rows=2008 width=4)
                          ->  Seq Scan on instrumentounidade i  (cost=0.00..104.78 rows=2008 width=4)
                                Filter: (itrid = 2)
On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <malosso@gmail.com> wrote:
> Hi all,
>
> The following query takes about 4s to run on a 16GB RAM server. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?

What does explain analyze say about that query?
On Thu, Feb 21, 2008 at 5:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> What does explain analyze say about that query?

HashAggregate  (cost=47818.40..47853.12 rows=1984 width=4) (actual time=5738.879..5743.390 rows=1715 loops=1)
  Filter: (count(*) > 0)
  ->  Hash Join  (cost=16255.99..46439.06 rows=183912 width=4) (actual time=1887.974..5154.207 rows=241693 loops=1)
        Hash Cond: (si.aciid = ai.aciid)
        ->  Seq Scan on subacaoindicador si  (cost=0.00..22811.98 rows=368798 width=4) (actual time=0.108..1551.816 rows=368798 loops=1)
        ->  Hash  (cost=16160.64..16160.64 rows=38141 width=8) (actual time=1887.790..1887.790 rows=52236 loops=1)
              ->  Hash Join  (cost=9015.31..16160.64 rows=38141 width=8) (actual time=980.058..1773.530 rows=52236 loops=1)
                    Hash Cond: (p.inuid = i.inuid)
                    ->  Hash Join  (cost=8905.89..15376.11 rows=39160 width=8) (actual time=967.116..1568.028 rows=54225 loops=1)
                          Hash Cond: (ai.ptoid = p.ptoid)
                          ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8) (actual time=0.080..259.412 rows=76484 loops=1)
                          ->  Hash  (cost=8678.33..8678.33 rows=91026 width=8) (actual time=966.841..966.841 rows=92405 loops=1)
                                ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=91026 width=8) (actual time=0.087..746.528 rows=92405 loops=1)
                                      Filter: (ptostatus = 'A'::bpchar)
                    ->  Hash  (cost=104.46..104.46 rows=1984 width=4) (actual time=12.913..12.913 rows=1983 loops=1)
                          ->  Seq Scan on instrumentounidade i  (cost=0.00..104.46 rows=1984 width=4) (actual time=0.091..8.879 rows=1983 loops=1)
                                Filter: (itrid = 2)
Total runtime: 5746.415 ms
> The following query takes about 4s to run on a 16GB RAM server. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?

Maybe random_page_cost is set too high? What version are you using?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
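[For reference, the current value is easy to check from any session - standard PostgreSQL syntax, shown here as an illustrative aside rather than quoted from the thread:]

  -- random_page_cost is the planner's cost estimate for a non-sequential
  -- page fetch, relative to a sequential one; the shipped default is 4.
  SHOW random_page_cost;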
On Thu, Feb 21, 2008 at 6:10 PM, Claus Guttesen <kometen@gmail.com> wrote:
> Maybe random_page_cost is set too high? What version are you using?

Postgresql v. 8.2.1
> > Maybe random_page_cost is set too high? What version are you using?
>
> Postgresql v. 8.2.1

You can try to lower this value. The default (in 8.3) is 4.

--
regards
Claus
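[As an illustrative sketch - nothing below is from the original posts, and the value 2 is simply one commonly tried setting - the change can be tested per-session before touching postgresql.conf:]

  -- Lower the planner's random-access cost estimate for this session only.
  SET random_page_cost = 2;

  -- Re-run the problem query and compare the plan and timings.
  explain analyze
  select i.inuid, count(*) as total
  from cte.instrumentounidade i
  inner join cte.pontuacao p on p.inuid = i.inuid
  inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
  inner join cte.subacaoindicador si on si.aciid = ai.aciid
  where i.itrid = 2 and p.ptostatus = 'A'
  group by i.inuid
  having count(*) > 0;

  -- If the new plan wins, make it permanent with
  -- random_page_cost = 2 in postgresql.conf.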
Setting random_page_cost = 2 solved the problem. Thanks.
On Thu, Feb 21, 2008 at 6:16 PM, Claus Guttesen <kometen@gmail.com> wrote:
> You can try to lower this value. The default (in 8.3) is 4.
Well, all the row counts in expected and actual are pretty close, so I'm guessing it's as optimized as it's likely to get. You could try mucking about with random_page_cost to force index usage, but indexes are not always a win in pgsql, hence the seq scans etc... If the number of rows returned represents a large percentage of the total number of rows in the table, then a seq scan is generally a win.

Note that almost all of the time being spent in this query is on the Hash Join, not on the seq scans.

Also, you should really update to 8.2.6, the latest 8.2 version. Check the release notes for the bugs that were fixed between 8.2.1 and 8.2.6.
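[As a quick, standard check - illustrative, not part of the original message - the exact point release can be confirmed from psql:]

  -- Report the full server version string; after upgrading it should
  -- mention 8.2.6 instead of 8.2.1.
  SELECT version();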
Adonias Malosso wrote:
> Setting random_page_cost = 2 solved the problem. Thanks.

The other parameter you might want to look at is effective_cache_size - increasing it will encourage index use. On a machine with 16GB the default is probably too small (there are various recommendations about how to set this; ISTR either Scott M or Greg Smith had a page somewhere that covered this quite well - guys?).

Obviously, decreasing random_page_cost fixed this query for you, but if you find yourself needing to tweak it again for other queries, then look at changing effective_cache_size.

Cheers

Mark
On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> The other parameter you might want to look at is effective_cache_size -
> increasing it will encourage index use. On a machine with 16GB the
> default is probably too small.
>
> Obviously, decreasing random_page_cost fixed this query for you, but if
> you find yourself needing to tweak it again for other queries, then look
> at changing effective_cache_size.

effective_cache_size is pretty easy to set, and it's not real sensitive to small changes, so guesstimation is fine where it's concerned. Basically, let your machine run for a while, then add together the cache and buffers your unix kernel is holding (top and free will tell you these things). If you're running other apps on the server, make a SWAG (scientific wild assed guess) about how much of the kernel cache / buffers the other apps account for, subtract that, and set effective_cache_size to how much you think postgresql is using of the total.

If your data set fits into memory, then setting random_page_cost closer to 1 makes a lot of sense, as does a larger effective_cache_size.
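[A minimal sketch of that guesstimate, with hypothetical numbers for a 16GB box running little besides PostgreSQL:]

  -- Suppose top/free show roughly 13GB sitting in the kernel's
  -- buffers + cache, and you guess other apps account for ~1GB of it.
  SHOW effective_cache_size;           -- check the current value first
  SET effective_cache_size = '12GB';   -- session-level test of the guess
  -- For a permanent change, put effective_cache_size = 12GB in
  -- postgresql.conf. This is only a planner hint; no memory is allocated.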
On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:
> On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
>> The other parameter you might want to look at is effective_cache_size -
>> increasing it will encourage index use. On a machine with 16GB the
>> default is probably too small (there are various recommendations about
>> how to set this; ISTR either Scott M or Greg Smith had a page somewhere
>> that covered this quite well - guys?).

The default is always too small in my experience.

What are the rest of the configuration values?

Dave
Scott Marlowe wrote:
> effective_cache_size is pretty easy to set, and it's not real sensitive
> to small changes, so guesstimation is fine where it's concerned.
> Basically, let your machine run for a while, then add together the cache
> and buffers your unix kernel is holding (top and free will tell you
> these things).

FWIW - the buffered|cached fields may well be called something different if you are not on Linux (I didn't see any platform mentioned - sorry if I missed it). E.g. for FreeBSD it is "Inactive" that shows what the os is caching, and "Cached" actually means something slightly different... (yep, that's caused a lot of confusion in the past...)

Cheers

Mark