Thread: Optimizing >= and <= for numbers and dates
Hi all,

I haven't found any official documentation about the Postgres SQL optimizer on the web, so please forgive me if such a document exists, and point me in the right direction.

I've got the following problem: I cannot make the Postgres SQL optimizer use an index on a date field to filter out a date range, e.g.

    select * from mytable where mydate >= '2003-10-01';

    Seq Scan on mytable  (cost=0.00..2138.11 rows=12203 width=543)
      Filter: (mydate >= '2003-09-01'::date)

The index is created as follows:

    create index query on mytable(mydate);

Testing for equality gives me the index optimization:

    select * from mytable where mydate = '2003-10-01';

    Index Scan using query on mytable  (cost=0.00..54.93 rows=44 width=543)
      Index Cond: (mydate = '2003-09-01'::date)

I have run VACUUM ANALYZE on the table. The table contains 25,000 records, so the index should be used in my opinion. Am I missing something? The same seems to apply to integers.

Thank you very much in advance,
Dimi

PS: The Postgres version is as follows: PostgreSQL 7.3.2 on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.2.2 20030213 (Red Hat Linux 8.0 3.2.2-1)
On Wed, 2003-10-01 at 13:30, Dimitri Nagiev wrote:
> Hi all,
>
> I haven't found any official documentation about the postgres sql optimizer
> on the web, so please forgive me if there is such a document and point me
> in the right direction.
>
> I've got the following problem: I cannot make the postgres SQL Optimizer use
> an index on a date field to filter out a date range, e.g.
>
>     select * from mytable where mydate >= '2003-10-01';
>
>     Seq Scan on mytable  (cost=0.00..2138.11 rows=12203 width=543)
>       Filter: (mydate >= '2003-09-01'::date)

EXPLAIN ANALYZE output please.
here goes the EXPLAIN ANALYZE output:

    template1=# VACUUM ANALYZE mytable;
    VACUUM
    template1=# explain analyze select * from mytable where mydate>='2003-09-01';
                                                       QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------
     Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual time=0.06..267.30 rows=22677 loops=1)
       Filter: (mydate >= '2003-09-01'::date)
     Total runtime: 307.71 msec
    (3 rows)

    template1=# explain analyze select * from mytable where mydate='2003-09-01';
                                                                 QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------
     Index Scan using mytable_query on mytable  (cost=0.00..148.56 rows=43 width=562) (actual time=41.22..41.27 rows=4 loops=1)
       Index Cond: (mydate = '2003-09-01'::date)
     Total runtime: 41.34 msec
    (3 rows)

> On Wed, 2003-10-01 at 13:30, Dimitri Nagiev wrote:
> > Hi all,
> >
> > I haven't found any official documentation about the postgres sql optimizer
> > on the web, so please forgive me if there is such a document and point me
> > to the right direction.
> >
> > I've got the following problem: I cannot make the postgres SQL Optimizer use
> > an index on a date field to filter out a date range, e.g.
> >
> > select * from mytable where mydate >= '2003-10-01';
> >
> > Seq Scan on mytable  (cost=0.00..2138.11 rows=12203 width=543)
> >   Filter: (mydate >= '2003-09-01'::date)
>
> EXPLAIN ANALYZE output please.
On Wed, 2003-10-01 at 13:45, Dimitri Nagiev wrote:
> template1=# explain analyze select * from mytable where mydate>='2003-09-01';
>  Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual
>  time=0.06..267.30 rows=22677 loops=1)
>    Filter: (mydate >= '2003-09-01'::date)
>  Total runtime: 307.71 msec
> (3 rows)

It may well be the case that a seqscan is faster than an index scan for this query. Try disabling sequential scans (SET enable_seqscan = false) and re-running EXPLAIN ANALYZE: see if the total runtime is smaller or larger.

-Neil
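[Editor's note: Neil's experiment can be reproduced in a psql session as follows; this is a sketch, with the table name and date literal taken from the thread.]

```sql
-- Discourage sequential scans for the current session only
SET enable_seqscan = false;
EXPLAIN ANALYZE SELECT * FROM mytable WHERE mydate >= '2003-09-01';

-- Restore the default afterwards
SET enable_seqscan = true;
```

Compare the "Total runtime" lines of the two EXPLAIN ANALYZE runs; if the forced index scan comes out slower, the planner's original preference for a seq scan was correct.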
On Wed, 1 Oct 2003 19:45:29 +0200 (MEST), "Dimitri Nagiev" <dnagiev@gmx.de> wrote:
> template1=# explain analyze select * from mytable where mydate>='2003-09-01';
>  Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual
>  time=0.06..267.30 rows=22677 loops=1)
>    Filter: (mydate >= '2003-09-01'::date)
>  Total runtime: 307.71 msec

Didn't you say that there are 25000 rows in the table? I can't believe that an index scan would be faster for selecting 90% of all rows. Try

    SET enable_seqscan = 0;
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE mydate >= '2003-09-01';

If you find the index scan to be faster, there might be lots of dead tuples, in which case you should

    VACUUM FULL mytable;

Servus
 Manfred
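[Editor's note: before resorting to VACUUM FULL, which takes an exclusive lock, the amount of dead space can be inspected non-destructively; a sketch, and the exact report format varies between PostgreSQL versions.]

```sql
-- Plain VACUUM with VERBOSE reports, per table and index, how many
-- dead row versions it found and removed, and how many pages remain
VACUUM VERBOSE ANALYZE mytable;
```

If the report shows far more pages than the live rows should occupy, the table is bloated and VACUUM FULL (or a dump/reload) will shrink it.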
On Wed, 1 Oct 2003, Dimitri Nagiev wrote:
> here goes the EXPLAIN ANALYZE output:
>
>     template1=# VACUUM ANALYZE mytable;
>     VACUUM
>     template1=# explain analyze select * from mytable where mydate>='2003-09-01';
>      Seq Scan on mytable  (cost=0.00..2209.11 rows=22274 width=562) (actual
>      time=0.06..267.30 rows=22677 loops=1)
>        Filter: (mydate >= '2003-09-01'::date)
>      Total runtime: 307.71 msec
>     (3 rows)

How many rows are there in this table? If that number is only two or three times the number of rows returned (22677), then a seq scan is preferable.

The way to tune your random_page_cost is to keep making your range more selective until you get an index scan, then see what the difference in speed is between the two queries that sit on either side of that crossover point. For example, if a query that returns 1000 rows switches to an index scan and takes 100 msec, while one that returns 1050 rows uses a seq scan and takes 200 msec, then you might want to lower your random_page_cost. Ideally, as the query returns more and more rows, the switch to seq scan should happen at the point where the seq scan takes about the same amount of time as the index scan, maybe just a little more.
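[Editor's note: the bisection described above might look like this in practice; the dates and the random_page_cost value here are purely illustrative, not taken from the thread.]

```sql
-- Tighten the range until the plan flips from Seq Scan to Index Scan
EXPLAIN ANALYZE SELECT * FROM mytable WHERE mydate >= '2003-09-25';
EXPLAIN ANALYZE SELECT * FROM mytable WHERE mydate >= '2003-09-28';

-- If the index-scan side of the crossover is much faster than the
-- seq-scan side, try lowering random_page_cost (the default is 4.0)
-- and re-check where the plan flips
SET random_page_cost = 2.5;
```

Lower random_page_cost tells the planner that random I/O is relatively cheap on this machine, which moves the crossover point toward larger result sets.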
Oh, to follow up on my previous post: make sure you've got effective_cache_size set right BEFORE you go trying to set random_page_cost. You might also want to run a select * from the table to load it into the kernel buffer cache before testing, and then also test with the cache cleared out (a select * from a different, really huge table will usually do that).
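[Editor's note: a sketch of setting both knobs in the order recommended above; the values are placeholders to adapt to the actual machine. In PostgreSQL of this era, effective_cache_size is expressed in 8 kB disk pages rather than memory units.]

```sql
-- Tell the planner roughly how much of the database the kernel caches:
-- 10000 pages * 8 kB = ~80 MB (illustrative; size to your real cache)
SET effective_cache_size = 10000;

-- Only then tune random_page_cost against measured timings
SET random_page_cost = 2.5;
```

Both can also be set permanently in postgresql.conf once good values are found.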