Help with query: indexes on timestamps - Mailing list pgsql-general
From | Keith C. Perry
Subject | Help with query: indexes on timestamps
Date |
Msg-id | 1089219845.40ec2d0503fb7@webmail.vcsn.com
Responses | Re: Help with query: indexes on timestamps
List | pgsql-general |
Ok, I've tried a number of things here and I know I'm missing something, but at this point my head is spinning (i.e. lack of sleep, too much coffee, etc...).

My environment is PG 7.4.3 on Linux with 512Mb of RAM and swap. This was just upgraded from 7.4 (just to make sure I'm current). Some of my postgresql settings are giving fatal errors but I don't think my issue is related to my query problems. I also have a laptop running with the same basic specs (no RAID, slower processor). I use a recent pgAdmin III as my client. We're also running this query in MS-SQL.

I have a table with 1 million records in it. Here is the definition:

CREATE TABLE report
(
  match int4,
  action varchar(16),
  stamp timestamptz,
  account varchar(32),
  ipaddress inet,
  profile varchar(16),
  rating text,
  url text
) WITHOUT OIDS;

There is one index:

CREATE INDEX stamp_idx
  ON report
  USING btree
  (stamp);

The query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
  FROM report
 GROUP BY date_part('hour'::text, report.stamp)
 ORDER BY date_part('hour'::text, report.stamp);

Here is the plan I get:

                                   QUERY PLAN
----------------------------------------------------------------------------
 Sort  (cost=47420.64..47421.14 rows=200 width=8)
   Sort Key: date_part('hour'::text, stamp)
   ->  HashAggregate  (cost=47412.00..47413.00 rows=200 width=8)
         ->  Seq Scan on report  (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)

Now from what I understand, the index I created would not be used since I would be looking at every row to do the date_part. The query under 7.4 ran in about 8 seconds. In 7.4.3, it's taking 37 seconds for the same plan (which is fine, since the system is not tuned yet). On my laptop it's taking 6 seconds. MS-SQL is taking 8 seconds. These runs are after I do a vacuum full, vacuum analyze and reindex on the database and table respectively.

My question: how can I get this query to use an index built on the date_part function? On the MS-SQL side, creating a computed column with the date part and then putting an index on that column brings the query down to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
  RETURNS int4 AS
'begin
  return date_part(\'hour\',$1);
end;'
  LANGUAGE 'plpgsql' IMMUTABLE;

and then an index:

CREATE INDEX hour_idx
  ON report
  USING btree
  (stamp)
  WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan - which makes sense to me because I'm again inspecting quite a few rows. I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like MS-SQL's computed column), but to me it seems like I should not have to do something like that since it really is wasting space in the table. I'm hoping a partial index or a functional index will solve this and be just as efficient. However, that method **does** work. Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being hosted by:
VCSN, Inc : http://vcsn.com
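As a rough, untested sketch of what I have in mind (assuming the whathour() wrapper above; the index name below is just an example), the index would go on the function's result itself rather than on stamp with a WHERE clause, and the query would have to use the identical expression so the planner can match it:

-- Hypothetical name; a functional index on the expression itself
CREATE INDEX report_hour_idx ON report (whathour(stamp));

-- The query rewritten against the same expression
SELECT whathour(stamp) AS "hour", count(*) AS count
  FROM report
 GROUP BY whathour(stamp)
 ORDER BY whathour(stamp);

-- Where such an index really pays off is when a single hour is selected, e.g.:
SELECT count(*) FROM report WHERE whathour(stamp) = 12;

Even with that index, I suspect a GROUP BY over all one million rows still comes out as a seq scan, since every row has to be read either way. Also, because the hour of a timestamptz depends on the TimeZone setting, marking whathour() IMMUTABLE is only safe if the server's time zone never changes.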