Thread: index usage
Hi, everyone!

I have a simple query that takes almost 3 seconds to complete, but disabling sequential scans leads to a new plan that uses an index. This second plan takes less than 1 millisecond to run. So, I'd like to hear any comments and suggestions.

Details:

CREATE TABLE MediumStats (
    year     SMALLINT NOT NULL,
    month    SMALLINT NOT NULL,
    day      SMALLINT NOT NULL,
    hour     SMALLINT NOT NULL,
    -- and then go a few data fields
    figureId INTEGER  NOT NULL,
    typeId   INTEGER  NOT NULL,
    PRIMARY KEY (figureId, typeId, year, month, day, hour)
);

CREATE FUNCTION indexHelper (INT2, INT2, INT2, INT2)
RETURNS CHARACTER(10) AS '
    return sprintf("%d%02d%02d%02d", @_);
' LANGUAGE 'plperl' WITH (isCachable);

CREATE INDEX timeIndex ON MediumStats (indexHelper(year, month, day, hour));

And this is the query:

SELECT * FROM MediumStats
WHERE indexHelper(year, month, day, hour) < '2002121500'
LIMIT 1;

First, the original plan:

Limit  (cost=0.00..0.09 rows=1 width=22) (actual time=2969.30..2969.30 rows=0 loops=1)
  ->  Seq Scan on mediumstats  (cost=0.00..1332.33 rows=15185 width=22) (actual time=2969.29..2969.29 rows=0 loops=1)
Total runtime: 2969.39 msec

Second plan, with sequential scans disabled:

Limit  (cost=0.00..0.19 rows=1 width=6) (actual time=0.43..0.43 rows=0 loops=1)
  ->  Index Scan using timeindex on mediumstats  (cost=0.00..2898.96 rows=15185 width=6) (actual time=0.42..0.42 rows=0 loops=1)
Total runtime: 0.54 msec

Table MediumStats currently has 45,000 rows, all belonging to this month.
Timur Irmatov <thor@sarkor.com> writes:
> Limit  (cost=0.00..0.19 rows=1 width=6) (actual time=0.43..0.43 rows=0 loops=1)
>   ->  Index Scan using timeindex on mediumstats  (cost=0.00..2898.96 rows=15185 width=6) (actual time=0.42..0.42 rows=0 loops=1)

The planner has absolutely no clue about the behavior of your function, and so its estimate of the number of rows matched is way off, leading to a poor estimate of the cost of an indexscan.  There is not much to be done about this in the current system (though I've speculated about the possibility of computing statistics for functional indexes).

Just out of curiosity, why don't you lose all this year/month/day stuff and use a timestamp column?  Less space, more functionality.

			regards, tom lane
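For what it's worth, a minimal sketch of the timestamp-based layout Tom suggests might look like this (the ts column name and the reworked primary key are illustrative, not from the original post):

```sql
-- Hypothetical rework of MediumStats: the four year/month/day/hour
-- fields collapse into a single timestamp column.
CREATE TABLE MediumStats (
    ts       TIMESTAMP NOT NULL,  -- replaces year, month, day, hour
    figureId INTEGER   NOT NULL,
    typeId   INTEGER   NOT NULL,
    PRIMARY KEY (figureId, typeId, ts)
);

-- A plain btree index on ts supports range scans directly, with no
-- helper function whose selectivity the planner has to guess at.
CREATE INDEX timeIndex ON MediumStats (ts);

-- The original query becomes an ordinary column comparison:
SELECT * FROM MediumStats
WHERE ts < '2002-12-15 00:00:00'
LIMIT 1;
```

Because the comparison is now on a plain column, the planner can use its ordinary column statistics to estimate how many rows match, instead of the blind guess it makes for a function result.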
TL> Timur Irmatov <thor@sarkor.com> writes:
>> Limit  (cost=0.00..0.19 rows=1 width=6) (actual time=0.43..0.43 rows=0 loops=1)
>>   ->  Index Scan using timeindex on mediumstats  (cost=0.00..2898.96 rows=15185 width=6) (actual time=0.42..0.42 rows=0 loops=1)

TL> The planner has absolutely no clue about the behavior of your function,
TL> and so its estimate of the number of rows matched is way off, leading to
TL> a poor estimate of the cost of an indexscan.  There is not much to be
TL> done about this in the current system (though I've speculated about the
TL> possibility of computing statistics for functional indexes).

You're absolutely right.  Thanks.

TL> Just out of curiosity, why don't you lose all this year/month/day stuff
TL> and use a timestamp column?  Less space, more functionality.

:-)  Well, I've seen a lot of people on the pgsql-general mailing list with problems with dates and timestamps, and I was just scared of using the PostgreSQL date and time types and functions.  Maybe I should just try it myself before doing it some other way...
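If the worry is losing easy access to the individual year/month/day/hour fields, the standard EXTRACT and date_trunc functions recover them from a timestamp column. A small sketch, assuming the four fields have been replaced by a single ts TIMESTAMP column (an assumption, not the poster's actual schema):

```sql
-- Pull individual parts back out of a timestamp:
SELECT EXTRACT(YEAR  FROM ts) AS year,
       EXTRACT(MONTH FROM ts) AS month,
       EXTRACT(HOUR  FROM ts) AS hour
FROM MediumStats;

-- date_trunc buckets rows by hour, day, month, etc.,
-- which covers the usual per-period aggregation:
SELECT date_trunc('day', ts) AS day, count(*)
FROM MediumStats
GROUP BY day;
```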
On Fri, Jan 17, 2003 at 08:08:14PM +0500, Timur Irmatov wrote:
> Well, I've seen a lot of people on the pgsql-general mailing list
> with problems with dates and timestamps, and I was just scared of
> using the PostgreSQL date and time types and functions.

What problems?  The only problems I know of with the datetime stuff are on those machines with the utterly silly glibc hobbling, and even that has been worked around in recent releases.

I think the date and time handling in PostgreSQL beats most systems.  It just works, and handles all the time-zone conversions for you and everything.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
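As a small illustration of the time-zone handling mentioned above, here is a hedged sketch using TIMESTAMP WITH TIME ZONE (the events table and zone names are invented for the example):

```sql
-- TIMESTAMP WITH TIME ZONE stores an absolute instant; input and
-- output are converted according to the session's TimeZone setting.
CREATE TABLE events (
    happened TIMESTAMP WITH TIME ZONE NOT NULL
);

-- Insert with an explicit UTC offset:
INSERT INTO events VALUES ('2003-01-17 20:08:14+05');

-- Render the same instant in a different zone using AT TIME ZONE:
SELECT happened AT TIME ZONE 'America/Toronto' FROM events;
```

The point is that the conversion arithmetic lives in the database rather than in application code or hand-rolled helper functions.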