Thread: why doesn't an index help my simple query?

why doesn't an index help my simple query?

From
Peter Bierman
Date:
I am indeed a novice, but I've done plenty of googling for an answer,
and haven't had any new ideas in a while, so I thought I'd ask...


I have what I would think is a very simple database. I use it to log
temperature probe readings. 8 different probes are recorded each
minute. There are two tables, one of which maps probe-id's to more
info about that probe, and another that holds all of the logged data,
as time, probe-id, and temp.

There is an index on the time column. Vacuum Analyze is run every night.

As you can see below, using an index doesn't seem to have any
significant impact on the query speed. Why not? I would expect a
b-tree index to be amazingly fast for this sort of query. Why doesn't
the index-scan stop once the filter threshold is crossed, since the
index is sorted (right?)?

-pmb



pooldb=> set enable_indexscan=false;
SET
pooldb=> explain analyze
pooldb-> select * from events where time > now() - '2
minutes'::reltime order by time desc;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
  Sort  (cost=8142.85..8286.17 rows=71659 width=20) (actual
time=15086.27..15086.32 rows=16 loops=1)
    Sort Key: "time"
    ->  Seq Scan on events  (cost=0.00..3519.76 rows=71659 width=20)
(actual time=15030.85..15032.23 rows=16 loops=1)
          Filter: (("time")::timestamp with time zone > (now() -
'00:02'::interval))
  Total runtime: 15086.55 msec
(5 rows)

pooldb=> set enable_indexscan=true;
SET
pooldb=> explain analyze
pooldb-> select * from events where time > now() - '2
minutes'::reltime order by time desc;
                                                                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan Backward using events_time_key on events
(cost=0.00..4556.74 rows=71659 width=20) (actual time=0.84..16495.36
rows=16 loops=1)
    Filter: (("time")::timestamp with time zone > (now() - '00:02'::interval))
  Total runtime: 16495.62 msec
(3 rows)

pooldb=> select count(time) from events;
  count
--------
  221974
(1 row)

Re: why doesn't an index help my simple query?

From
Tom Lane
Date:
Peter Bierman <bierman@apple.com> writes:
> As you can see below, using an index doesn't seem to have any
> significant impact on the query speed. Why not?

>   Index Scan Backward using events_time_key on events
> (cost=0.00..4556.74 rows=71659 width=20) (actual time=0.84..16495.36
> rows=16 loops=1)
>     Filter: (("time")::timestamp with time zone > (now() - '00:02'::interval))

Hm, why is that shown as a "filter" and not an "index condition"?  And
why is there an explicit conversion to timestamp with time zone in
there?  Better tell us about the exact data types involved here ...

            regards, tom lane

Re: why doesn't an index help my simple query?

From
Peter Bierman
Date:
At 10:29 PM -0400 5/30/03, Tom Lane wrote:
>Peter Bierman <bierman@apple.com> writes:
>>  As you can see below, using an index doesn't seem to have any
>>  significant impact on the query speed. Why not?
>
>>    Index Scan Backward using events_time_key on events
>>  (cost=0.00..4556.74 rows=71659 width=20) (actual time=0.84..16495.36
>>  rows=16 loops=1)
>>      Filter: (("time")::timestamp with time zone > (now() -
>>'00:02'::interval))
>
>Hm, why is that shown as a "filter" and not an "index condition"?  And
>why is there an explicit conversion to timestamp with time zone in
>there?  Better tell us about the exact data types involved here ...


I was hoping you'd say 'hm'. :-)

The pg_dump says (reordered with sample data at end)

CREATE TABLE therms (
     sn character(16) NOT NULL,
     tid integer NOT NULL,
     name character varying(100) NOT NULL,
     CONSTRAINT therms_sn CHECK ((length(btrim((sn)::text)) = 16))
);

CREATE TABLE events (
     "time" timestamp without time zone DEFAULT
('now'::text)::timestamp(6) with time zone NOT NULL,
     tid integer NOT NULL,
     "temp" double precision NOT NULL
);

CREATE INDEX events_tid_idx ON events USING btree (tid);

ALTER TABLE ONLY therms
     ADD CONSTRAINT therms_pkey PRIMARY KEY (sn);

ALTER TABLE ONLY therms
     ADD CONSTRAINT therms_tid_key UNIQUE (tid);

ALTER TABLE ONLY events
     ADD CONSTRAINT events_time_key UNIQUE ("time");

ALTER TABLE ONLY events
     ADD CONSTRAINT "$1" FOREIGN KEY (tid) REFERENCES therms(tid) ON
UPDATE CASCADE ON DELETE NO ACTION;

COPY therms (sn, tid, name) FROM stdin;
1037c84800080005        1       Hot Tub
10e6a448000800c6        3       Shed 2
1010bb4800080015        4       Shed 3
105fe1480008006c        6       Outside
1083c24800080063        2       Roof
10d8d948000800b9        5       Shed 1
1047ec480008007a        7       Pool
10cdb448000800f4        8       Shed 4
\.

COPY events ("time", tid, "temp") FROM stdin;
2003-05-06 00:25:52.261602      2       55.17
2003-05-06 00:25:53.462081      4       55.27
2003-05-06 00:25:54.463235      5       55.06
2003-05-06 00:25:55.665572      1       63.16
2003-05-06 00:25:56.666579      3       55.17
2003-05-06 00:26:58.275967      2       55.17
\.


I created these tables using the following SQL:

CREATE TABLE therms
(
  sn      CHAR(16) PRIMARY KEY CHECK (length(trim(sn))=16),
  tid     INTEGER UNIQUE NOT NULL,
  name    VARCHAR(100) NOT NULL
);

CREATE TABLE events
(
  time   TIMESTAMP UNIQUE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  tid    INTEGER   NOT NULL,
                   FOREIGN KEY(tid) REFERENCES therms(tid)
                   ON UPDATE CASCADE,
  temp   FLOAT     NOT NULL
);

CREATE INDEX events_tid_idx ON events (tid);

-pmb

Re: why doesn't an index help my simple query?

From
Tom Lane
Date:
Peter Bierman <bierman@apple.com> writes:
> At 10:29 PM -0400 5/30/03, Tom Lane wrote:
>> Hm, why is that shown as a "filter" and not an "index condition"?  And
>> why is there an explicit conversion to timestamp with time zone in
>> there?  Better tell us about the exact data types involved here ...

> I was hoping you'd say 'hm'. :-)

> CREATE TABLE events (
>      "time" timestamp without time zone DEFAULT
> ('now'::text)::timestamp(6) with time zone NOT NULL,

Right.  You're getting bit by ye same olde problem of datatype mismatch:
the planner does not realize that there is any connection between the
types "timestamp without time zone" and "timestamp with time zone", so
the presence of a WHERE condition expressed in terms of a timestamp-with-
tz operator doesn't induce it to do anything that a timestamp-without-tz
index could recognize.

Short answer is you probably ought to declare events.time as timestamp
with time zone; or if you have a *really good* reason why it should not
be declared that way, you ought to cast what you are comparing it to
to timestamp without tz.  ("now()" yields timestamp with tz, which is
considered the preferred type in this category, so the default
assumption is to cast to timestamp with tz not vice versa.)

I'm starting to wonder if we shouldn't devise some way to allow these
sorts of cross-datatype comparisons to be more easily indexable.  No
immediate ideas about how to do it without breaking stuff, though...

            regards, tom lane

Re: why doesn't an index help my simple query?

From
Peter Bierman
Date:
At 11:23 PM -0400 5/30/03, Tom Lane wrote:
>Peter Bierman <bierman@apple.com> writes:
>>  At 10:29 PM -0400 5/30/03, Tom Lane wrote:
>>>  Hm, why is that shown as a "filter" and not an "index condition"?  And
>>>  why is there an explicit conversion to timestamp with time zone in
>>>  there?  Better tell us about the exact data types involved here ...
>
>>  I was hoping you'd say 'hm'. :-)
>
>>  CREATE TABLE events (
>>       "time" timestamp without time zone DEFAULT
>>  ('now'::text)::timestamp(6) with time zone NOT NULL,
>
>Right.  You're getting bit by ye same olde problem of datatype mismatch:
>the planner does not realize that there is any connection between the
>types "timestamp without time zone" and "timestamp with time zone", so
>the presence of a WHERE condition expressed in terms of a timestamp-with-
>tz operator doesn't induce it to do anything that a timestamp-without-tz
>index could recognize.
>
>Short answer is you probably ought to declare events.time as timestamp
>with time zone; or if you have a *really good* reason why it should not
>be declared that way, you ought to cast what you are comparing it to
>to timestamp without tz.


Ok, two more questions then:

1) is there an easy way for me to change the type in place?

2) I created the table initially with:

CREATE TABLE events
(
  time   TIMESTAMP UNIQUE NOT NULL DEFAULT CURRENT_TIMESTAMP,

Should the plain 'timestamp' type really default to a different type
than what all the rest of the timestamp operators prefer?

Thanks!

-pmb

Re: why doesn't an index help my simple query?

From
Tom Lane
Date:
Peter Bierman <bierman@apple.com> writes:
> 1) is there an easy way for me to change the type in place?

Not in this case --- the stored representations of timestamp and
timestamp-with-tz differ.  (The format is the same but the idea
of what is the zero reference is not.)  I'm afraid you have to
dump and reload, or at least use INSERT...SELECT to build a new
table.

> Should the plain 'timestamp' type really default to a different type
> than what all the rest of the timestamp operators prefer?

Tell me about it :-(.  SQL92 specifies that "timestamp" means "timestamp
without time zone", but this is an utterly brain-dead default IMHO.
You can check the PG list archives if you want to see past flamewars
about the issue.  As a member of the losing side I will gracefully
retire...

            regards, tom lane

Re: why doesn't an index help my simple query?

From
Date:
On Fri, 30 May 2003, Peter Bierman wrote:

> I am indeed a novice, but I've done plenty of googling for an answer,
> and haven't had any new ideas in a while, so I thought I'd ask...
>
> I have what I would think is a very simple database. I use it to log
> temperature probe readings. 8 different probes are recorded each
> minute. There are two tables, one of which maps probe-id's to more
> info about that probe, and another that holds all of the logged data,
> as time, probe-id, and temp.
>
> There is an index on the time column. Vacuum Analyze is run every night.
>
> As you can see below, using an index doesn't seem to have any
> significant impact on the query speed. Why not? I would expect a
> b-tree index to be amazingly fast for this sort of query. Why doesn't
> the index-scan stop once the filter threshold is crossed, since the
> index is sorted (right?)?

I haven't gotten much into dbase issues, so I can't help you
on that.  But, maybe you aren't really looking for the right
things.  What sorts of things are you hoping to find in your
temperature data?

If your probes are all at the same temperature, then indexing
on temperature really isn't going to do any good.  There is
only a single value which is trying to map to everything.

Now, if you collect data every N (minutes/hours/days/...),
what might be more useful, would be to every hour (or day,
some convenient time period) gather all the data for the
last (hour/day/week/..., whatever is useful) time period
of interest and process that data in some way.  In the
old days, I suppose a Fourier transform (probably a FFT)
would be the main choice.  These days, there might be
other things like wavelets for how to transform the data.
In any event, what you might be looking for is changes
happening at specific frequencies.  So, making another
table to store some of the results from Fourier analysis
would allow you to construct queries as to how the magnitude
of a certain frequency changed with time.

For example:
I get temperature readings from one of my disk drives.
And every hour, I get a message in my logs telling me
about the temperature.  I haven't done anything yet,
but what I should do is to keep track of this log data,
as well as visit a local weather station and download
local temperature outside my house.  Then I have a chance
to partition the observed disk drive temperature into
some amount due to external temperature and some due
to things due to disk operation.  I might then be able
to construct a query which looks at the DC (wavelength
of infinity) component to tell me about how the drive
temperature increases with time.  I would expect this
value to be constant, until the end of drive life
was approaching due to things like bearings.  I might
add in things like the number of disk accesses per hour,
to try and correlate temperature increases to things
like updating the locate database (filenames present on
system).

There's lots of things one can do.  But it may be that
you are looking for something involving temperature,
and not temperature by itself.  And sitting down and
figuring out what might be happening to change
temperature values, might help you come up with
fields in this, or other tables, which would allow
the use of things like indexes to work better.

Gord
--
Matter Realisations     http://www.materialisations.com/
Gordon Haverland, B.Sc. M.Eng. President
101  9504 182 St. NW    Edmonton, AB, CA  T5T 3A7
780/481-8019            ghaverla @ freenet.edmonton.ab.ca
780/993-1274 (alt.)


Re: why doesn't an index help my simple query?

From
Dima Tkach
Date:
>
>>Should the plain 'timestamp' type really default to a different type
>>than what all the rest of the timestamp operators prefer?
>
>
> Tell me about it :-(.  SQL92 specifies that "timestamp" means "timestamp
> without time zone", but this is an utterly brain-dead default IMHO.
> You can check the PG list archives if you want to see past flamewars
> about the issue.  As a member of the losing side I will gracefully
> retire...
>

But is *this* really the issue here?

It seems to me, that everything would work as expected if the planner
tried to cast the *constant* side of the operator to the column type,
and not the other way around - i.e., in this case, it coverts the
original condition into something like
'time::timestamp with timezone < now ()'
if instead it did
'time < now()::timestamp', it would work, (and be able to use the
index), wouldn't it?

Is there some technical difficulty here, making it too complicated to
implement?

Dima