Thread: Optimizing >= and <= for numbers and dates

Optimizing >= and <= for numbers and dates

From
"Dimitri Nagiev"
Date:
Hi all,

I haven't found any official documentation about the postgres sql optimizer
on the web, so please forgive me if there is such a document and point me to
the right direction.

I've got the following problem: I cannot make the postgres SQL Optimizer use
an index on a date field to filter out a date range, e.g.

select * from mytable where mydate >= '2003-10-01';

 Seq Scan on mytable  (cost=0.00..2138.11 rows=12203 width=543)
   Filter: (mydate >= '2003-09-01'::date)


the index is created as follows:

create index query on mytable(mydate);

Testing for equality gives me the index optimization:

select * from mytable where mydate = '2003-10-01';

Index Scan using query on mytable  (cost=0.00..54.93 rows=44 width=543)
   Index Cond: (mydate = '2003-09-01'::date)


I have run vacuum analyze on the table. Also the table contains 25.000
records, so the index should be used in my opinion. Am I missing something ?
The
same seems to apply to integers.

Thank you very much in advance
Dimi

PS The postgres version is as follows:

 PostgreSQL 7.3.2 on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 3.2.2 20030213 (Red Hat Linux 8.0 3.2.2-1)



--
NEU FÜR ALLE - GMX MediaCenter - für Fotos, Musik, Dateien...
Fotoalbum, File Sharing, MMS, Multimedia-Gruß, GMX FotoService

Jetzt kostenlos anmelden unter http://www.gmx.net

+++ GMX - die erste Adresse für Mail, Message, More! +++


Re: Optimizing >= and <= for numbers and dates

From
Rod Taylor
Date:
On Wed, 2003-10-01 at 13:30, Dimitri Nagiev wrote:
> Hi all,
>
> I haven't found any official documentation about the postgres sql optimizer
> on the web, so please forgive me if there is such a document and point me to
> the right direction.
>
> I've got the following problem: I cannot make the postgres SQL Optimizer use
> an index on a date field to filter out a date range, e.g.
>
> select * from mytable where mydate >= '2003-10-01';
>
>  Seq Scan on mytable  (cost=0.00..2138.11 rows=12203 width=543)
>    Filter: (mydate >= '2003-09-01'::date)

EXPLAIN ANALYZE output please.

Attachment

Re: Optimizing >= and <= for numbers and dates

From
"Dimitri Nagiev"
Date:
here goes the EXPLAIN ANALYZE output:


template1=# VACUUM analyze mytable;
VACUUM
template1=# explain analyze select * from mytable where
mydate>='2003-09-01';
                                                  QUERY PLAN


---------------------------------------------------------------------------------------------------------------
 Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual
time=0.06..267.30 rows=22677 loops=1)
   Filter: (mydate >= '2003-09-01'::date)
 Total runtime: 307.71 msec
(3 rows)


template1=# explain analyze select * from mytable where mydate='2003-09-01';
                                                          QUERY PLAN



------------------------------------------------------------------------------------------------------------------------------
 Index Scan using mytable_query on mytable  (cost=0.00..148.56 rows=43
width=562) (actual time=41.22..41.27 rows=4 loops=1)
   Index Cond: (mydate = '2003-09-01'::date)
 Total runtime: 41.34 msec
(3 rows)



> On Wed, 2003-10-01 at 13:30, Dimitri Nagiev wrote:
> > Hi all,
> >
> > I haven't found any official documentation about the postgres sql
> optimiz
> er
> > on the web, so please forgive me if there is such a document and point
> me
>  to
> > the right direction.
> >
> > I've got the following problem: I cannot make the postgres SQL Optimizer
> use
> > an index on a date field to filter out a date range, e.g.
> >
> > select * from mytable where mydate >= '2003-10-01';
> >
> >  Seq Scan on mytable  (cost=0.00..2138.11 rows=12203 width=543)
> >    Filter: (mydate >= '2003-09-01'::date)
>
> EXPLAIN ANALYZE output please.
>

--
NEU FÜR ALLE - GMX MediaCenter - für Fotos, Musik, Dateien...
Fotoalbum, File Sharing, MMS, Multimedia-Gruß, GMX FotoService

Jetzt kostenlos anmelden unter http://www.gmx.net

+++ GMX - die erste Adresse für Mail, Message, More! +++


Re: Optimizing >= and <= for numbers and dates

From
Neil Conway
Date:
On Wed, 2003-10-01 at 13:45, Dimitri Nagiev wrote:
> template1=# explain analyze select * from mytable where
> mydate>='2003-09-01';
>                                                   QUERY PLAN
>
>
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual
> time=0.06..267.30 rows=22677 loops=1)
>    Filter: (mydate >= '2003-09-01'::date)
>  Total runtime: 307.71 msec
> (3 rows)

It may well be the case that a seqscan is faster than an index scan for
this query. Try disabling sequential scans (SET enable_seqscan = false)
and re-running EXPLAIN ANALYZE: see if the total runtime is smaller or
larger.

-Neil



Re: Optimizing >= and <= for numbers and dates

From
Manfred Koizar
Date:
On Wed, 1 Oct 2003 19:45:29 +0200 (MEST), "Dimitri Nagiev"
<dnagiev@gmx.de> wrote:
>template1=# explain analyze select * from mytable where
>mydate>='2003-09-01';
> Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual time=0.06..267.30 rows=22677 loops=1)
>   Filter: (mydate >= '2003-09-01'::date)
> Total runtime: 307.71 msec

Didn't you say that there are 25000 rows in the table?  I can't
believe that for selecting 90% of all rows an index scan would be
faster.  Try

    SET enable_seqscan = 0;
    explain analyze
     select * from mytable where mydate>='2003-09-01';

If you find the index scan to be faster, there might be lots of dead
tuples in which case you should

    VACUUM FULL mytable;

Servus
 Manfred

Re: Optimizing >= and <= for numbers and dates

From
"scott.marlowe"
Date:
On Wed, 1 Oct 2003, Dimitri Nagiev wrote:

> here goes the EXPLAIN ANALYZE output:
>
>
> template1=# VACUUM analyze mytable;
> VACUUM
> template1=# explain analyze select * from mytable where
> mydate>='2003-09-01';
>                                                   QUERY PLAN
>
>
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual
> time=0.06..267.30 rows=22677 loops=1)
>    Filter: (mydate >= '2003-09-01'::date)
>  Total runtime: 307.71 msec
> (3 rows)

How many rows are there in this table?  If the number is only two or three
times as many as the number of rows returned (22677) then a seq scan is
preferable.

The way to tune your random_page_cost is to keep making your range more
selective until you get an index scan.  Then, see what the difference is
in speed between the two queries that sit on either side of that number,
i.e. if a query that returns 1000 rows switches to index scan, and takes
100 msec, while one that returns 1050 uses seq scan and takes 200 msec,
then you might want to lower your random page cost.

Ideally, what should happen is that as the query returns more and more
rows, the switch to seq scan should happen so that it's taking about the
same amount of time as the index scan, maybe just a little more.


Re: Optimizing >= and <= for numbers and dates

From
"scott.marlowe"
Date:
Oh, to followup on my previously sent post, make sure you've got
effective_cache_size set right BEFORE you go trying to set
random_page_cost, and you might wanna run a select * from table to load
the table into kernel buffer cache before testing, then also test it with
the cache cleared out (select * from a_different_really_huge_table will
usually do that.)