Re: Seq scan vs. Index scan with different query - Mailing list pgsql-performance

From Andrew McMillan
Subject Re: Seq scan vs. Index scan with different query
Date
Msg-id 1089027854.6664.197.camel@lamb.mcmillan.net.nz
Whole thread Raw
In response to Seq scan vs. Index scan with different query conditions  (eleven@ludojad.itpp.pl)
Responses Re: Seq scan vs. Index scan with different query conditions  (eleven@ludojad.itpp.pl)
List pgsql-performance
On Mon, 2004-07-05 at 12:15 +0200, eleven@ludojad.itpp.pl wrote:
> Hello,
>
> Can anybody suggest any hint on this:
>
> temp=> EXPLAIN SELECT DISTINCT "number" FROM "tablex" WHERE "Date" BETWEEN '2004-06-28'::date AND '2004-07-04'::date
AND"Time" BETWEEN '00:00:00'::time AND '18:01:00'::time; 
>
> Unique  (cost=305669.92..306119.43 rows=89 width=8)
>     ->  Sort  (cost=305669.92..305894.67 rows=89903 width=8)
>         Sort Key: "number"
>             ->  Index Scan using "DateTimeIndex" on "tablex"  (cost=0.00..298272.66 rows=89903 width=8)
>                 Index Cond: (("Date" >= '2004-06-28'::date) AND ("Date" <= '2004-07-04'::date) AND ("Time" >=
'00:00:00'::timewithout time zone) AND ("Time" <= '18:01:00'::time without time zone)) 
>
>
> temp=> EXPLAIN SELECT DISTINCT "number" FROM "tablex" WHERE "Date" BETWEEN '2004-06-28'::date AND '2004-07-04'::date
AND"Time" BETWEEN '00:00:00'::time AND '19:01:00'::time; 
>
> Unique  (cost=315252.77..315742.27 rows=97 width=8)
>     ->  Sort  (cost=315252.77..315497.52 rows=97900 width=8)
>         Sort Key: "number"
>             ->  Seq Scan on "tablex"  (cost=0.00..307137.34 rows=97900 width=8)
>             Filter: (("Date" >= '2004-06-28'::date) AND ("Date" <= '2004-07-04'::date) AND ("Time" >=
'00:00:00'::timewithout time zone) AND ("Time" <= '19:01:00'::time without time zone)) 
>
> Basically, the difference is in upper "Time" value (as you can see, it's
> 18:01:00 in the first query and 19:01:00 in the other one).
> The question is - why does it use index in first case and
> it tries to do full sequential scan when the upper "Time" value
> is different?
>
> DateTimeIndex was created on both columns (Date/Time):
> CREATE INDEX "DateTimeIndex" ON "tablex" USING btree ("Date", "Time");

PostgreSQL is always going to switch at some point, where the number of
rows that have to be read from the table exceed some percentage of the
total rows in the table.

We can possibly be more helpful if you send EXPLAIN ANALYZE, rather than
just EXPLAIN.

A few things to be careful of:

- Is this supposed to be a slice of midnight to 6pm, for each day
between 28 June and 4 July?  If you want a continuous period from
Midnight 28 June -> 6pm 4 July you're better to have a single timestamp
field.

- It is unlikely that the , "Time" on your index is adding much to your
selectivity, and it may be that you would be better off without it.

- the DISTINCT can screw up your results, and it usually means that the
SQL is not really the best it could be.  A _real_ need for DISTINCT is
quite rare in my experience, and from what I have seen it adds overhead
and tends to encourage bad query plans when used unnecessarily.

Hope this is some help.

Regards,
                    Andrew McMillan

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
      Make things as simple as possible, but no simpler -- Einstein
-------------------------------------------------------------------------

Attachment

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Seq scan vs. Index scan with different query conditions
Next
From: eleven@ludojad.itpp.pl
Date:
Subject: Re: Seq scan vs. Index scan with different query conditions