Re: index vs seqscan question - Mailing list pgsql-general
From | Frank Bax |
---|---|
Subject | Re: index vs seqscan question |
Date | |
Msg-id | 5.1.1.6.0.20021122163324.03acdc20@pop6.sympatico.ca Whole thread Raw |
In response to | index vs seqscan question (Frank Bax <fbax@sympatico.ca>) |
Responses |
Re: index vs seqscan question
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-general |
OK, I should have finished testing my changes before posting - the new/faster query should have brackets around (typ=' ' OR typ='A'), but its still fast as lightning! I don't think it made a big difference to explain results, but it appears seqscan is cheaper than it was before? Sort (cost=17171.83..17171.83 rows=488 width=56) -> Merge Join (cost=17023.63..17150.05 rows=488 width=56) -> Sort (cost=17002.73..17002.73 rows=10075 width=28) -> Subquery Scan ts (cost=15325.25..16332.79 rows=10075 width=28) -> Aggregate (cost=15325.25..16332.79 rows=10075 width=28) -> Group (cost=15325.25..15577.14 rows=100754 width=28) -> Sort (cost=15325.25..15325.25 rows=100754 width=28) -> Seq Scan on timesheet (cost=0.00..5410.22 rows=100754 width=28) -> Sort (cost=20.89..20.89 rows=38 width=36) -> Seq Scan on employee (cost=0.00..19.89 rows=38 width=36) At 04:15 PM 11/22/02, Frank Bax wrote: >I was looking and one of my SQL statements today and decided to try a >slight variation to my query - and what a difference it made!! The query >went from over 10 minutes down to under 30 seconds. I curious though; if >I read the explain output correctly... >-> Index Scan using timesheet_index_emp on >timesheet (cost=0.00..19056.43 rows=122207 width=40) >-> Seq Scan on timesheet (cost=0.00..7244.02 rows=122207 width=28) > >These statements imply the planner knows a seqscan is quicker than an >index scan (only 3 times faster by its estimate, actually much more), why >does it still use an index scan? > >Additional information: >- pgsql version 7.1 >- vacuum analyse is run nightly. >- the "earncode in..." clause by itself will select 85% of rows. >- earncode = ' ' selects 80% of rows in table. >- typ ' ' or 'A' selects 99% of rows in table. >- final result is just approx 100 rows. > >Frank > > >The original (slow) version: >gwacl=> explain SELECT * FROM (SELECT emp, SUM (CASE WHEN earncode IN (' >','A','O','P','Q','X','Z') THEN >date_part('epoch',age(hi_shift,lo_shift))/(60*60) ELSE 0 END) AS hours >FROM timesheet WHERE typ=' ' OR typ='A' GROUP BY emp) AS ts INNER JOIN >(SELECT emp, first, last FROM employee WHERE status='A') AS emp ON (ts.emp >= emp.emp) WHERE hours BETWEEN 0 AND 1250 ORDER BY hours DESC; >NOTICE: QUERY PLAN: > >Sort (cost=21441.77..21441.77 rows=592 width=56) > -> Merge Join (cost=21261.28..21414.52 rows=592 width=56) > -> Sort (cost=21300.99..21300.99 rows=12221 width=40) > -> Subquery Scan ts (cost=0.00..20278.50 rows=12221 width=40) > -> Aggregate (cost=0.00..20278.50 rows=12221 width=40) > -> Group (cost=0.00..19361.94 rows=122207 > width=40) > -> Index Scan using timesheet_index_emp > on timesheet (cost=0.00..19056.43 rows=122207 width=40) > -> Sort (cost=20.89..20.89 rows=38 width=36) > -> Seq Scan on employee (cost=0.00..19.89 rows=38 width=36) >EXPLAIN > >The revised (much improved) version: >gwacl=> explain SELECT * FROM (SELECT emp, >SUM(date_part('epoch',age(hi_shift,lo_shift))/(60*60)) as hours FROM >timesheet WHERE typ=' ' OR typ='A' AND earncode IN (' >','A','O','P','Q','X','Z') GROUP BY emp) AS ts INNER JOIN (SELECT emp, >first, last FROM employee WHERE status='A') AS emp ON (ts.emp = emp.emp) >WHERE hours BETWEEN 0 AND 1250 ORDER BY hours DESC; >NOTICE: QUERY PLAN: > >Sort (cost=23993.79..23993.79 rows=592 width=56) > -> Merge Join (cost=23813.31..23966.55 rows=592 width=56) > -> Sort (cost=23816.60..23816.60 rows=12221 width=28) > -> Subquery Scan ts (cost=21608.46..22830.53 rows=12221 > width=28) > -> Aggregate (cost=21608.46..22830.53 rows=12221 > width=28) > -> Group (cost=21608.46..21913.97 rows=122207 > width=28) > -> Sort (cost=21608.46..21608.46 > rows=122207 width=28) > -> Seq Scan on > timesheet (cost=0.00..7244.02 rows=122207 width=28) > -> Sort (cost=20.89..20.89 rows=38 width=36) > -> Seq Scan on employee (cost=0.00..19.89 rows=38 width=36) >EXPLAIN > >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? > >http://archives.postgresql.org
pgsql-general by date: