Thread: index usage
Hi, everyone!

I have a simple query that takes almost 3 seconds to complete, but disabling sequential scans leads to a new plan that uses an index. This second plan takes less than 1 millisecond to run. So, I'd like to hear any comments and suggestions.

Details:

CREATE TABLE MediumStats (
    year     SMALLINT NOT NULL,
    month    SMALLINT NOT NULL,
    day      SMALLINT NOT NULL,
    hour     SMALLINT NOT NULL,
    -- and then go a few data fields
    figureId INTEGER  NOT NULL,
    typeId   INTEGER  NOT NULL,
    PRIMARY KEY (figureId, typeId, year, month, day, hour)
);

CREATE FUNCTION indexHelper (INT2, INT2, INT2, INT2)
RETURNS CHARACTER(10) AS '
    return sprintf("%d%02d%02d%02d", @_);
' LANGUAGE 'plperl' WITH (isCachable);

CREATE INDEX timeIndex ON MediumStats (indexHelper(year, month, day, hour));

And this is the query:

SELECT * FROM MediumStats
WHERE indexHelper(year, month, day, hour) < '2002121500'
LIMIT 1;

First, the original plan:

Limit  (cost=0.00..0.09 rows=1 width=22) (actual time=2969.30..2969.30 rows=0 loops=1)
  ->  Seq Scan on mediumstats  (cost=0.00..1332.33 rows=15185 width=22) (actual time=2969.29..2969.29 rows=0 loops=1)
Total runtime: 2969.39 msec

Second plan, with sequential scans disabled:

Limit  (cost=0.00..0.19 rows=1 width=6) (actual time=0.43..0.43 rows=0 loops=1)
  ->  Index Scan using timeindex on mediumstats  (cost=0.00..2898.96 rows=15185 width=6) (actual time=0.42..0.42 rows=0 loops=1)
Total runtime: 0.54 msec

Table MediumStats currently has 45,000 rows, all belonging to this month.
Timur Irmatov <thor@sarkor.com> writes:
> Limit  (cost=0.00..0.19 rows=1 width=6) (actual time=0.43..0.43 rows=0 loops=1)
>   ->  Index Scan using timeindex on mediumstats  (cost=0.00..2898.96 rows=15185 width=6) (actual time=0.42..0.42 rows=0 loops=1)

The planner has absolutely no clue about the behavior of your function, and so its estimate of the number of rows matched is way off, leading to a poor estimate of the cost of an indexscan.  There is not much to be done about this in the current system (though I've speculated about the possibility of computing statistics for functional indexes).

Just out of curiosity, why don't you lose all this year/month/day stuff and use a timestamp column?  Less space, more functionality.

			regards, tom lane
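For what it's worth, a minimal sketch of the timestamp-based layout Tom suggests might look like this (the ts column name and the reworked primary key are illustrative, not from the original post):

```sql
-- Hypothetical rework of MediumStats: the four year/month/day/hour
-- fields collapse into a single timestamp column.
CREATE TABLE MediumStats (
    ts       TIMESTAMP NOT NULL,  -- replaces year, month, day, hour
    figureId INTEGER   NOT NULL,
    typeId   INTEGER   NOT NULL,
    PRIMARY KEY (figureId, typeId, ts)
);

-- A plain btree index on ts supports range scans directly, with no
-- helper function whose selectivity the planner has to guess at.
CREATE INDEX timeIndex ON MediumStats (ts);

-- The original query becomes an ordinary column comparison:
SELECT * FROM MediumStats
WHERE ts < '2002-12-15 00:00:00'
LIMIT 1;
```

Because the comparison is now on a plain column, the planner can use its ordinary column statistics to estimate how many rows match, instead of the blind guess it makes for a function result.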
TL> Timur Irmatov <thor@sarkor.com> writes:
>> Limit  (cost=0.00..0.19 rows=1 width=6) (actual time=0.43..0.43 rows=0 loops=1)
>>   ->  Index Scan using timeindex on mediumstats  (cost=0.00..2898.96 rows=15185 width=6) (actual time=0.42..0.42 rows=0 loops=1)

TL> The planner has absolutely no clue about the behavior of your function,
TL> and so its estimate of the number of rows matched is way off, leading to
TL> a poor estimate of the cost of an indexscan.  There is not much to be
TL> done about this in the current system (though I've speculated about the
TL> possibility of computing statistics for functional indexes).

You're absolutely right.  Thanks.

TL> Just out of curiosity, why don't you lose all this year/month/day stuff
TL> and use a timestamp column?  Less space, more functionality.

:-)  Well, I've seen a lot of people on the pgsql-general mailing list with problems with dates and timestamps, and I was just scared of using the PostgreSQL date and time types and functions.  Maybe I should just try it myself before doing it some other way...
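If the worry is losing easy access to the individual year/month/day/hour fields, the standard EXTRACT and date_trunc functions recover them from a timestamp column. A small sketch, assuming the four fields have been replaced by a single ts TIMESTAMP column (an assumption, not the poster's actual schema):

```sql
-- Pull individual parts back out of a timestamp:
SELECT EXTRACT(YEAR  FROM ts) AS year,
       EXTRACT(MONTH FROM ts) AS month,
       EXTRACT(HOUR  FROM ts) AS hour
FROM MediumStats;

-- date_trunc buckets rows by hour, day, month, etc.,
-- which covers the usual per-period aggregation:
SELECT date_trunc('day', ts) AS day, count(*)
FROM MediumStats
GROUP BY day;
```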
On Fri, Jan 17, 2003 at 08:08:14PM +0500, Timur Irmatov wrote:
> Well, I've seen a lot of people on the pgsql-general mailing list
> with problems with dates and timestamps, and I was just scared of
> using the PostgreSQL date and time types and functions.

What problems?  The only problems I know of with the datetime stuff are on those machines with the utterly silly glibc hobbling, and even that has been worked around in recent releases.

I think the date and time handling in PostgreSQL beats most systems.  It just works, and handles all the time-zone conversions for you and everything.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
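As a small illustration of the time-zone handling mentioned above, here is a hedged sketch using TIMESTAMP WITH TIME ZONE (the events table and zone names are invented for the example):

```sql
-- TIMESTAMP WITH TIME ZONE stores an absolute instant; input and
-- output are converted according to the session's TimeZone setting.
CREATE TABLE events (
    happened TIMESTAMP WITH TIME ZONE NOT NULL
);

-- Insert with an explicit UTC offset:
INSERT INTO events VALUES ('2003-01-17 20:08:14+05');

-- Render the same instant in a different zone using AT TIME ZONE:
SELECT happened AT TIME ZONE 'America/Toronto' FROM events;
```

The point is that the conversion arithmetic lives in the database rather than in application code or hand-rolled helper functions.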