Re: Performance of query - Mailing list pgsql-performance

From Misa Simic
Subject Re: Performance of query
Date
Msg-id CAH3i69mnD9ztA5Thjo2YvneyDnKKm5KCAyA-gAGGd_X44hPZ3Q@mail.gmail.com
Whole thread Raw
In response to Re: Performance of query  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Performance of query
Re: Performance of query
List pgsql-performance
Hi Jeff,

It seems my previous mail has not showed up in the list... copied/pasted again belloew

However, you said something important:

"The join to the "state" table is not necessary.  Between the foreign key and the primary key, you know that every state exists, and that every state exists only once.  But, that will not solve your problem, as the join to the state table is not where the time goes."

I think it is something what planner could/should be "aware off"... and discard the join 

" Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual time=38.424..41992.070 rows=60057057 loops=1)"
"        Merge Cond: (state.state = busbase.state)"

this part from bellow plan  would save significant time if planner didn't decide to take this step at all ....

Kind regards,

Misa




"
Hi Cindy

TBH - I don't know...

I have added this to list so maybe someone else can help...

To recap:

from start situation (table structure and indexes are in the first mail in this thread)

EXPLAIN ANALYZE
SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER JOIN state USING (state)
GROUP BY busbase.state

says:
"HashAggregate  (cost=7416975.58..7416976.09 rows=51 width=7) (actual time=285339.465..285339.473 rows=51 loops=1)"
"  ->  Hash Join  (cost=2.15..7139961.94 rows=55402728 width=7) (actual time=0.066..269527.934 rows=60057057 loops=1)"
"        Hash Cond: (busbase.state = state.state)"
"        ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728 width=7) (actual time=0.022..251029.307 rows=60057057 loops=1)"
"        ->  Hash  (cost=1.51..1.51 rows=51 width=3) (actual time=0.028..0.028 rows=51 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 2kB"
"              ->  Seq Scan on state  (cost=0.00..1.51 rows=51 width=3) (actual time=0.003..0.019 rows=51 loops=1)"
"Total runtime: 285339.516 ms"

on created composite index 
CREATE INDEX comp_statidx2
  ON busbase
  USING btree
  (state, id );


we got:

"GroupAggregate  (cost=0.00..2610570.81 rows=51 width=3) (actual time=98.923..51033.888 rows=51 loops=1)"
"  ->  Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual time=38.424..41992.070 rows=60057057 loops=1)"
"        Merge Cond: (state.state = busbase.state)"
"        ->  Index Only Scan using state_pkey on state  (cost=0.00..13.02 rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)"
"              Heap Fetches: 51"
"        ->  Index Only Scan using comp_statidx2 on busbase  (cost=0.00..1559558.68 rows=60057056 width=3) (actual time=38.408..12883.575 rows=60057057 loops=1)"
"              Heap Fetches: 0"
"Total runtime: 51045.648 ms"


Question is - is it possible to improve it more?
"

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Performance of query
Next
From: Misa Simic
Date:
Subject: PostgreSQL planner