Thread: why doesn't an index help my simple query?
I am indeed a novice, but I've done plenty of googling for an answer, and haven't had any new ideas in a while, so I thought I'd ask... I have what I would think is a very simple database. I use it to log temperature probe readings. 8 different probes are recorded each minute. There are two tables, one of which maps probe-id's to more info about that probe, and another that holds all of the logged data, as time, probe-id, and temp. There is an index on the time column. Vacuum Analyze is run every night. As you can see below, using an index doesn't seem to have any significant impact on the query speed. Why not? I would expect a b-tree index to be amazingly fast for this sort of query. Why doesn't the index-scan stop once the filter threshold is crossed, since the index is sorted (right?)? -pmb pooldb=> set enable_indexscan=false; SET pooldb=> explain analyze pooldb-> select * from events where time > now() - '2 minutes'::reltime order by time desc; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Sort (cost=8142.85..8286.17 rows=71659 width=20) (actual time=15086.27..15086.32 rows=16 loops=1) Sort Key: "time" -> Seq Scan on events (cost=0.00..3519.76 rows=71659 width=20) (actual time=15030.85..15032.23 rows=16 loops=1) Filter: (("time")::timestamp with time zone > (now() - '00:02'::interval)) Total runtime: 15086.55 msec (5 rows) pooldb=> set enable_indexscan=true; SET pooldb=> explain analyze pooldb-> select * from events where time > now() - '2 minutes'::reltime order by time desc; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Index Scan Backward using events_time_key on events (cost=0.00..4556.74 rows=71659 width=20) (actual time=0.84..16495.36 rows=16 loops=1) Filter: (("time")::timestamp with time zone > (now() - '00:02'::interval)) Total runtime: 16495.62 msec (3 rows) pooldb=> select count(time) from events; count -------- 221974 (1 row)
Peter Bierman <bierman@apple.com> writes: > As you can see below, using an index doesn't seem to have any > significant impact on the query speed. Why not? > Index Scan Backward using events_time_key on events > (cost=0.00..4556.74 rows=71659 width=20) (actual time=0.84..16495.36 > rows=16 loops=1) > Filter: (("time")::timestamp with time zone > (now() - '00:02'::interval)) Hm, why is that shown as a "filter" and not an "index condition"? And why is there an explicit conversion to timestamp with time zone in there? Better tell us about the exact data types involved here ... regards, tom lane
At 10:29 PM -0400 5/30/03, Tom Lane wrote: >Peter Bierman <bierman@apple.com> writes: >> As you can see below, using an index doesn't seem to have any >> significant impact on the query speed. Why not? > >> Index Scan Backward using events_time_key on events >> (cost=0.00..4556.74 rows=71659 width=20) (actual time=0.84..16495.36 >> rows=16 loops=1) >> Filter: (("time")::timestamp with time zone > (now() - >>'00:02'::interval)) > >Hm, why is that shown as a "filter" and not an "index condition"? And >why is there an explicit conversion to timestamp with time zone in >there? Better tell us about the exact data types involved here ... I was hoping you'd say 'hm'. :-) The pg_dump says (reordered with sample data at end) CREATE TABLE therms ( sn character(16) NOT NULL, tid integer NOT NULL, name character varying(100) NOT NULL, CONSTRAINT therms_sn CHECK ((length(btrim((sn)::text)) = 16)) ); CREATE TABLE events ( "time" timestamp without time zone DEFAULT ('now'::text)::timestamp(6) with time zone NOT NULL, tid integer NOT NULL, "temp" double precision NOT NULL ); CREATE INDEX events_tid_idx ON events USING btree (tid); ALTER TABLE ONLY therms ADD CONSTRAINT therms_pkey PRIMARY KEY (sn); ALTER TABLE ONLY therms ADD CONSTRAINT therms_tid_key UNIQUE (tid); ALTER TABLE ONLY events ADD CONSTRAINT events_time_key UNIQUE ("time"); ALTER TABLE ONLY events ADD CONSTRAINT "$1" FOREIGN KEY (tid) REFERENCES therms(tid) ON UPDATE CASCADE ON DELETE NO ACTION; COPY therms (sn, tid, name) FROM stdin; 1037c84800080005 1 Hot Tub 10e6a448000800c6 3 Shed 2 1010bb4800080015 4 Shed 3 105fe1480008006c 6 Outside 1083c24800080063 2 Roof 10d8d948000800b9 5 Shed 1 1047ec480008007a 7 Pool 10cdb448000800f4 8 Shed 4 \. COPY events ("time", tid, "temp") FROM stdin; 2003-05-06 00:25:52.261602 2 55.17 2003-05-06 00:25:53.462081 4 55.27 2003-05-06 00:25:54.463235 5 55.06 2003-05-06 00:25:55.665572 1 63.16 2003-05-06 00:25:56.666579 3 55.17 2003-05-06 00:26:58.275967 2 55.17 \. I created these tables using the following SQL: CREATE TABLE therms ( sn CHAR(16) PRIMARY KEY CHECK (length(trim(sn))=16), tid INTEGER UNIQUE NOT NULL, name VARCHAR(100) NOT NULL ); CREATE TABLE events ( time TIMESTAMP UNIQUE NOT NULL DEFAULT CURRENT_TIMESTAMP, tid INTEGER NOT NULL, FOREIGN KEY(tid) REFERENCES therms(tid) ON UPDATE CASCADE, temp FLOAT NOT NULL ); CREATE INDEX events_tid_idx ON events (tid); -pmb
Peter Bierman <bierman@apple.com> writes: > At 10:29 PM -0400 5/30/03, Tom Lane wrote: >> Hm, why is that shown as a "filter" and not an "index condition"? And >> why is there an explicit conversion to timestamp with time zone in >> there? Better tell us about the exact data types involved here ... > I was hoping you'd say 'hm'. :-) > CREATE TABLE events ( > "time" timestamp without time zone DEFAULT > ('now'::text)::timestamp(6) with time zone NOT NULL, Right. You're getting bit by ye same olde problem of datatype mismatch: the planner does not realize that there is any connection between the types "timestamp without time zone" and "timestamp with time zone", so the presence of a WHERE condition expressed in terms of a timestamp-with- tz operator doesn't induce it to do anything that a timestamp-without-tz index could recognize. Short answer is you probably ought to declare events.time as timestamp with time zone; or if you have a *really good* reason why it should not be declared that way, you ought to cast what you are comparing it to to timestamp without tz. ("now()" yields timestamp with tz, which is considered the preferred type in this category, so the default assumption is to cast to timestamp with tz not vice versa.) I'm starting to wonder if we shouldn't devise some way to allow these sorts of cross-datatype comparisons to be more easily indexable. No immediate ideas about how to do it without breaking stuff, though... regards, tom lane
At 11:23 PM -0400 5/30/03, Tom Lane wrote: >Peter Bierman <bierman@apple.com> writes: >> At 10:29 PM -0400 5/30/03, Tom Lane wrote: >>> Hm, why is that shown as a "filter" and not an "index condition"? And >>> why is there an explicit conversion to timestamp with time zone in >>> there? Better tell us about the exact data types involved here ... > >> I was hoping you'd say 'hm'. :-) > >> CREATE TABLE events ( >> "time" timestamp without time zone DEFAULT >> ('now'::text)::timestamp(6) with time zone NOT NULL, > >Right. You're getting bit by ye same olde problem of datatype mismatch: >the planner does not realize that there is any connection between the >types "timestamp without time zone" and "timestamp with time zone", so >the presence of a WHERE condition expressed in terms of a timestamp-with- >tz operator doesn't induce it to do anything that a timestamp-without-tz >index could recognize. > >Short answer is you probably ought to declare events.time as timestamp >with time zone; or if you have a *really good* reason why it should not >be declared that way, you ought to cast what you are comparing it to >to timestamp without tz. Ok, two more questions then: 1) is there an easy way for me to change the type in place? 2) I created the table initially with: CREATE TABLE events ( time TIMESTAMP UNIQUE NOT NULL DEFAULT CURRENT_TIMESTAMP, Should the plain 'timestamp' type really default to a different type than what all the rest of the timestamp operators prefer? Thanks! -pmb
Peter Bierman <bierman@apple.com> writes: > 1) is there an easy way for me to change the type in place? Not in this case --- the stored representations of timestamp and timestamp-with-tz differ. (The format is the same but the idea of what is the zero reference is not.) I'm afraid you have to dump and reload, or at least use INSERT...SELECT to build a new table. > Should the plain 'timestamp' type really default to a different type > than what all the rest of the timestamp operators prefer? Tell me about it :-(. SQL92 specifies that "timestamp" means "timestamp without time zone", but this is an utterly brain-dead default IMHO. You can check the PG list archives if you want to see past flamewars about the issue. As a member of the losing side I will gracefully retire... regards, tom lane
On Fri, 30 May 2003, Peter Bierman wrote: > I am indeed a novice, but I've done plenty of googling for an answer, > and haven't had any new ideas in a while, so I thought I'd ask... > > I have what I would think is a very simple database. I use it to log > temperature probe readings. 8 different probes are recorded each > minute. There are two tables, one of which maps probe-id's to more > info about that probe, and another that holds all of the logged data, > as time, probe-id, and temp. > > There is an index on the time column. Vacuum Analyze is run every night. > > As you can see below, using an index doesn't seem to have any > significant impact on the query speed. Why not? I would expect a > b-tree index to be amazingly fast for this sort of query. Why doesn't > the index-scan stop once the filter threshold is crossed, since the > index is sorted (right?)? I haven't gotten much into dbase issues, so I can't help you on that. But, maybe you aren't really looking for the right things. What sorts of things are you hoping to find in your temperature data? If your probes are all at the same temperature, then indexing on temperature really isn't going to do any good. There is only a single value which is trying to map to everything. Now, if you collect data every N (minutes/hours/days/...), what might be more useful, would be to every hour (or day, some convenient time period) gather all the data for the last (hour/day/week/..., whatever is useful) time period of interest and process that data in some way. In the old days, I suppose a Fourier transform (probably a FFT) would be the main choice. These days, there might be other things like wavelets for how to transform the data. In any event, what you might be looking for is changes happening at specific frequencies. So, making another table to store some of the results from Fourier analysis would allow you to construct queries as to how the magnitude of a certain frequency changed with time. For example: I get temperature readings from one of my disk drives. And every hour, I get a message in my logs telling me about the temperature. I haven't done anything yet, but what I should do is to keep track of this log data, as well as visit a local weather station and download local temperature outside my house. Then I have a chance to partition the observed disk drive temperature into some amount due to external temperature and some due to things due to disk operation. I might then be able to construct a query which looks at the DC (wavelength of infinity) component to tell me about how the drive temperature increases with time. I would expect this value to be constant, until the end of drive life was approaching due to things like bearings. I might add in things like the number of disk accesses per hour, to try and correlate temperature increases to things like updating the locate database (filenames present on system). There's lots of things one can do. But it may be that you are looking for something involving temperature, and not temperature by itself. And sitting down and figuring out what might be happening to change temperature values, might help you come up with fields in this, or other tables, which would allow the use of things like indexes to work better. Gord -- Matter Realisations http://www.materialisations.com/ Gordon Haverland, B.Sc. M.Eng. President 101 9504 182 St. NW Edmonton, AB, CA T5T 3A7 780/481-8019 ghaverla @ freenet.edmonton.ab.ca 780/993-1274 (alt.)
> >>Should the plain 'timestamp' type really default to a different type >>than what all the rest of the timestamp operators prefer? > > > Tell me about it :-(. SQL92 specifies that "timestamp" means "timestamp > without time zone", but this is an utterly brain-dead default IMHO. > You can check the PG list archives if you want to see past flamewars > about the issue. As a member of the losing side I will gracefully > retire... > But is *this* really the issue here? It seems to me, that everything would work as expected if the planner tried to cast the *constant* side of the operator to the column type, and not the other way around - i.e., in this case, it coverts the original condition into something like 'time::timestamp with timezone < now ()' if instead it did 'time < now()::timestamp', it would work, (and be able to use the index), wouldn't it? Is there some technical difficulty here, making it too complicated to implement? Dima