Re: Performance of query - Mailing list pgsql-performance
From | Misa Simic |
---|---|
Subject | Re: Performance of query |
Date | |
Msg-id | CAH3i69mnD9ztA5Thjo2YvneyDnKKm5KCAyA-gAGGd_X44hPZ3Q@mail.gmail.com Whole thread Raw |
In response to | Re: Performance of query (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: Performance of query
Re: Performance of query |
List | pgsql-performance |
Hi Jeff,
It seems my previous mail has not showed up in the list... copied/pasted again belloew
However, you said something important:
"The join to the "state" table is not necessary. Between the foreign key and the primary key, you know that every state exists, and that every state exists only once. But, that will not solve your problem, as the join to the state table is not where the time goes."
I think it is something what planner could/should be "aware off"... and discard the join
" Merge Join (cost=0.00..2310285.02 rows=60057056 width=3) (actual time=38.424..41992.070 rows=60057057 loops=1)"
" Merge Cond: (state.state = busbase.state)"
this part from bellow plan would save significant time if planner didn't decide to take this step at all ....
Kind regards,
Misa
"
Hi Cindy
TBH - I don't know...
I have added this to list so maybe someone else can help...
To recap:
from start situation (table structure and indexes are in the first mail in this thread)
EXPLAIN ANALYZE
SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER JOIN state USING (state)
GROUP BY busbase.state
says:
"HashAggregate (cost=7416975.58..7416976.09 rows=51 width=7) (actual time=285339.465..285339.473 rows=51 loops=1)"
" -> Hash Join (cost=2.15..7139961.94 rows=55402728 width=7) (actual time=0.066..269527.934 rows=60057057 loops=1)"
" Hash Cond: (busbase.state = state.state)"
" -> Seq Scan on busbase (cost=0.00..6378172.28 rows=55402728 width=7) (actual time=0.022..251029.307 rows=60057057 loops=1)"
" -> Hash (cost=1.51..1.51 rows=51 width=3) (actual time=0.028..0.028 rows=51 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 2kB"
" -> Seq Scan on state (cost=0.00..1.51 rows=51 width=3) (actual time=0.003..0.019 rows=51 loops=1)"
"Total runtime: 285339.516 ms"
on created composite index
CREATE INDEX comp_statidx2
ON busbase
USING btree
(state, id );
we got:
"GroupAggregate (cost=0.00..2610570.81 rows=51 width=3) (actual time=98.923..51033.888 rows=51 loops=1)"
" -> Merge Join (cost=0.00..2310285.02 rows=60057056 width=3) (actual time=38.424..41992.070 rows=60057057 loops=1)"
" Merge Cond: (state.state = busbase.state)"
" -> Index Only Scan using state_pkey on state (cost=0.00..13.02 rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)"
" Heap Fetches: 51"
" -> Index Only Scan using comp_statidx2 on busbase (cost=0.00..1559558.68 rows=60057056 width=3) (actual time=38.408..12883.575 rows=60057057 loops=1)"
" Heap Fetches: 0"
"Total runtime: 51045.648 ms"
Question is - is it possible to improve it more?
"
pgsql-performance by date: