Help with query: indexes on timestamps - Mailing list pgsql-general

From Keith C. Perry
Subject Help with query: indexes on timestamps
Date
Msg-id 1089219845.40ec2d0503fb7@webmail.vcsn.com
Whole thread Raw
Responses Re: Help with query: indexes on timestamps  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Ok, I've tried a number of things here and I know I'm missing something but at
this point my head is spinning (i.e. lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512Mb of ram and swap.  This was just
upgraded from 7.4 (just to make sure I'm current).  Some of my settings in
postgresql are giving fatal errors but I don't think my issue is related to my
query problems.  I also have a laptop running with the same basic specs (no
raid, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with with 1 million records in it.  Here is the definition

CREATE TABLE report
(
  match int4,
  action varchar(16),
  stamp timestamptz,
  account varchar(32),
  ipaddress inet,
  profile varchar(16),
  rating text,
  url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
  ON report
  USING btree
  (stamp);

That query I'm running is:

 SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
   FROM report
  GROUP BY date_part('hour'::text, report.stamp)
  ORDER BY date_part('hour'::text, report.stamp);



Here is the plan I get:

                                 QUERY PLAN
----------------------------------------------------------------------------
 Sort  (cost=47420.64..47421.14 rows=200 width=8)
   Sort Key: date_part('hour'::text, stamp)
   ->  HashAggregate  (cost=47412.00..47413.00 rows=200 width=8)
         ->  Seq Scan on report  (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)


Now from from I understand that, the index I created would not be used since I
would be looking at every row to do the date part.  The query under 7.4 ran in
about 8 seconds.  In 7.4.3, its taking 37 seconds for the same plan (which is
fine for the system not be tuned yet).  On my laptop its taking 6 seconds.
MS-SQL is taking 8 seconds.  These runs are after I do vacuum full, vacuum
analyse and reindex on the database and table respectively

My question:  How can I get this query to use an index build on the date_part
function.  On the MS-SQL side, creating a computed column with the date part and
then don't an index on that column bring the query done to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
  RETURNS int4 AS
'begin
  return date_part(\'hour\',$1);
end;'
  LANGUAGE 'plpgsql' IMMUTABLE;

and then and index:

CREATE INDEX hour_idx
  ON report
  USING btree
  (stamp)
  WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan- which makes sense to me because I'm again inspect
quiet a few row.  I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column) but to me it seems like I should not have to do
something like that since it really is wasting space in the table.  I hoping a
partial index or a function index will solve this and be just as efficient.
However, that method **does** work.  Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: Grant Select privileges for all tables in schema
Next
From: Richard Huxton
Date:
Subject: Re: Help with query: indexes on timestamps