Re: index vs seqscan question - Mailing list pgsql-general

From Frank Bax
Subject Re: index vs seqscan question
Date
Msg-id 5.1.1.6.0.20021122163324.03acdc20@pop6.sympatico.ca
Whole thread Raw
In response to index vs seqscan question  (Frank Bax <fbax@sympatico.ca>)
Responses Re: index vs seqscan question  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
OK, I should have finished testing my changes before posting - the
new/faster query should have brackets around (typ=' ' OR typ='A'), but its
still fast as lightning!  I don't think it made a big difference to explain
results, but it appears seqscan is cheaper than it was before?

Sort  (cost=17171.83..17171.83 rows=488 width=56)
   ->  Merge Join  (cost=17023.63..17150.05 rows=488 width=56)
         ->  Sort  (cost=17002.73..17002.73 rows=10075 width=28)
               ->  Subquery Scan ts  (cost=15325.25..16332.79 rows=10075
width=28)
                     ->  Aggregate  (cost=15325.25..16332.79 rows=10075
width=28)
                           ->  Group  (cost=15325.25..15577.14 rows=100754
width=28)
                                 ->  Sort  (cost=15325.25..15325.25
rows=100754 width=28)
                                       ->  Seq Scan on
timesheet  (cost=0.00..5410.22 rows=100754 width=28)
         ->  Sort  (cost=20.89..20.89 rows=38 width=36)
               ->  Seq Scan on employee  (cost=0.00..19.89 rows=38 width=36)



At 04:15 PM 11/22/02, Frank Bax wrote:
>I was looking and one of my SQL statements today and decided to try a
>slight variation to my query - and what a difference it made!!  The query
>went from over 10 minutes down to under 30 seconds.  I curious though; if
>I read the explain output correctly...
>->  Index Scan using timesheet_index_emp on
>timesheet  (cost=0.00..19056.43 rows=122207 width=40)
>->  Seq Scan on timesheet  (cost=0.00..7244.02 rows=122207 width=28)
>
>These statements imply the planner knows a seqscan is quicker than an
>index scan (only 3 times faster by its estimate, actually much more), why
>does it still use an index scan?
>
>Additional information:
>- pgsql version 7.1
>- vacuum analyse is run nightly.
>- the "earncode in..." clause by itself will select 85% of rows.
>- earncode = ' ' selects 80% of rows in table.
>- typ ' ' or 'A' selects 99% of rows in table.
>- final result is just approx 100 rows.
>
>Frank
>
>
>The original (slow) version:
>gwacl=> explain SELECT * FROM (SELECT emp, SUM (CASE WHEN earncode IN ('
>','A','O','P','Q','X','Z') THEN
>date_part('epoch',age(hi_shift,lo_shift))/(60*60) ELSE 0 END) AS hours
>FROM timesheet WHERE typ=' ' OR typ='A' GROUP BY emp) AS ts INNER JOIN
>(SELECT emp, first, last FROM employee WHERE status='A') AS emp ON (ts.emp
>= emp.emp) WHERE hours BETWEEN 0 AND 1250 ORDER BY hours DESC;
>NOTICE:  QUERY PLAN:
>
>Sort  (cost=21441.77..21441.77 rows=592 width=56)
>   ->  Merge Join  (cost=21261.28..21414.52 rows=592 width=56)
>         ->  Sort  (cost=21300.99..21300.99 rows=12221 width=40)
>               ->  Subquery Scan ts  (cost=0.00..20278.50 rows=12221 width=40)
>                     ->  Aggregate  (cost=0.00..20278.50 rows=12221 width=40)
>                           ->  Group  (cost=0.00..19361.94 rows=122207
> width=40)
>                                 ->  Index Scan using timesheet_index_emp
> on timesheet  (cost=0.00..19056.43 rows=122207 width=40)
>         ->  Sort  (cost=20.89..20.89 rows=38 width=36)
>               ->  Seq Scan on employee  (cost=0.00..19.89 rows=38 width=36)
>EXPLAIN
>
>The revised (much improved) version:
>gwacl=> explain SELECT * FROM (SELECT emp,
>SUM(date_part('epoch',age(hi_shift,lo_shift))/(60*60)) as hours FROM
>timesheet WHERE typ=' ' OR typ='A' AND earncode IN ('
>','A','O','P','Q','X','Z') GROUP BY emp) AS ts INNER JOIN (SELECT emp,
>first, last FROM employee WHERE status='A') AS emp ON (ts.emp = emp.emp)
>WHERE hours BETWEEN 0 AND 1250 ORDER BY hours DESC;
>NOTICE:  QUERY PLAN:
>
>Sort  (cost=23993.79..23993.79 rows=592 width=56)
>   ->  Merge Join  (cost=23813.31..23966.55 rows=592 width=56)
>         ->  Sort  (cost=23816.60..23816.60 rows=12221 width=28)
>               ->  Subquery Scan ts  (cost=21608.46..22830.53 rows=12221
> width=28)
>                     ->  Aggregate  (cost=21608.46..22830.53 rows=12221
> width=28)
>                           ->  Group  (cost=21608.46..21913.97 rows=122207
> width=28)
>                                 ->  Sort  (cost=21608.46..21608.46
> rows=122207 width=28)
>                                      ->  Seq Scan on
> timesheet  (cost=0.00..7244.02 rows=122207 width=28)
>         ->  Sort  (cost=20.89..20.89 rows=38 width=36)
>               ->  Seq Scan on employee  (cost=0.00..19.89 rows=38 width=36)
>EXPLAIN
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://archives.postgresql.org

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: UNION and array types
Next
From: Tom Lane
Date:
Subject: Re: index vs seqscan question