Quoting Richard Huxton <dev@archonet.com>:
> Keith C. Perry wrote:
> >
> > I have a table with with 1 million records in it. Here is the definition
> >
> > CREATE TABLE report
> > (
> > match int4,
> > action varchar(16),
> > stamp timestamptz,
> > account varchar(32),
> > ipaddress inet,
> > profile varchar(16),
> > rating text,
> > url text
> > )
> > WITHOUT OIDS;
> >
> > There is one index:
> >
> > CREATE INDEX stamp_idx
> > ON report
> > USING btree
> > (stamp);
> >
> > The query I'm running is:
> >
> > SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
> > FROM report
> > GROUP BY date_part('hour'::text, report.stamp)
> > ORDER BY date_part('hour'::text, report.stamp);
>
> You will always get a sequential scan with this query - there is no
> other way to count the rows.
>
> With PostgreSQL being MVCC based, you can't know whether a row is
> visible to you without checking it - visiting the index won't help. Even
> if it could, you'd still have to visit every row in the index.
>
> Assuming the table is a log, with always increasing timestamps, I'd
> create a summary table and query that.
Yeah, actually it's a proxy server log; each month adds about 500k records to the
database. I have two months loaded just to put some stress on the server. Every
month I load the data so I can do some analysis. The optimization question came
up when one of the other database folks wanted to play with the database in
MS-SQL Server.
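The summary table idea makes sense. I assume it would look something like this
(the table and column names here are just my guesses):

CREATE TABLE report_hourly
(
hour int4,
cnt int8
);

-- populated once per monthly load from the raw data:
INSERT INTO report_hourly (hour, cnt)
SELECT date_part('hour'::text, stamp), count(*)
FROM report
GROUP BY date_part('hour'::text, stamp);

-- the report query then reads the small table; summing over cnt
-- handles the duplicate hour rows that each monthly load adds:
SELECT hour, sum(cnt) AS count
FROM report_hourly
GROUP BY hour
ORDER BY hour;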
How can I add a column that represents a function returning just the
date_part? I'm wondering if that will speed up the query in the same fashion it
did for MS-SQL.
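What I have in mind is something like this, though I'm not sure it's the right
way to do it in PG (the function and trigger names are mine, and this assumes
plpgsql is installed in the database):

ALTER TABLE report ADD COLUMN stamp_hour int4;
UPDATE report SET stamp_hour = date_part('hour'::text, stamp);

-- keep the new column current for future loads:
CREATE FUNCTION set_stamp_hour() RETURNS trigger AS '
BEGIN
    NEW.stamp_hour := date_part(''hour''::text, NEW.stamp);
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER report_stamp_hour
BEFORE INSERT OR UPDATE ON report
FOR EACH ROW EXECUTE PROCEDURE set_stamp_hour();

The query could then GROUP BY stamp_hour instead of computing date_part on
every row.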
I hadn't thought about the MVCC vs. file locking issue. The MS-SQL server
doesn't have any load on it, and I'm sure that if other users were hitting the
same table with the same query, PG would perform better.
--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com
____________________________________
This email account is being hosted by:
VCSN, Inc : http://vcsn.com