Thread: inconsistent/weird index usage
To save some time, let me start by saying:

PostgreSQL 7.4.3 on powerpc-apple-darwin7.4.0, compiled by GCC gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1640)

OK, now on to details...

I'm trying to implement oracle style ``partitions'' in postgres. I've run into my first snag on what should be a fairly quick query. Basically, I started with the following schema and split the ``samples'' table into one table for each year (1999-2004).

-- BEGIN SCHEMA

create table sensor_types (
    sensor_type_id serial,
    sensor_type text not null,
    units varchar(10) not null,
    primary key(sensor_type_id)
);

create table sensors (
    sensor_id serial,
    sensor_type_id integer not null,
    serial char(16) not null,
    name text not null,
    low smallint not null,
    high smallint not null,
    active boolean default true,
    primary key(sensor_id),
    foreign key(sensor_type_id) references sensor_types(sensor_type_id)
);

create unique index sensors_byserial on sensors(serial);

create table samples (
    ts datetime not null,
    sensor_id integer not null,
    sample float not null,
    foreign key(sensor_id) references sensors(sensor_id)
);

create index samples_bytime on samples(ts);
create unique index samples_bytimeid on samples(ts, sensor_id);

-- END SCHEMA

Each samples_[year] table looks, and is indexed, exactly as the above samples table was, using the following commands:

create index samples_1999_bytime on samples_1999(ts);
create index samples_2000_bytime on samples_2000(ts);
create index samples_2001_bytime on samples_2001(ts);
create index samples_2002_bytime on samples_2002(ts);
create index samples_2003_bytime on samples_2003(ts);
create index samples_2004_bytime on samples_2004(ts);

create unique index samples_1999_bytimeid on samples_1999(ts, sensor_id);
create unique index samples_2000_bytimeid on samples_2000(ts, sensor_id);
create unique index samples_2001_bytimeid on samples_2001(ts, sensor_id);
create unique index samples_2002_bytimeid on samples_2002(ts, sensor_id);
create unique index samples_2003_bytimeid on samples_2003(ts, sensor_id);
create unique index samples_2004_bytimeid on samples_2004(ts, sensor_id);

The tables contain the following number of rows:

    samples_1999      311030
    samples_2000     2142245
    samples_2001     2706571
    samples_2002     3111602
    samples_2003     3149316
    samples_2004     2375972

The following view creates the illusion of the old ``single-table'' model:

create view samples as
    select * from samples_1999
    union select * from samples_2000
    union select * from samples_2001
    union select * from samples_2002
    union select * from samples_2003
    union select * from samples_2004;

...along with the following rule on the view for the applications performing inserts:

create rule sample_rule as on insert to samples
    do instead
    insert into samples_2004 (ts, sensor_id, sample)
        values (new.ts, new.sensor_id, new.sample);

OK, now that that's over with, I have this one particular query that I attempt to run for a report from my phone that no longer works because it tries to do a table scan on *some* of the tables. Why it chooses this table scan, I can't imagine.
The query is as follows:

select
    s.serial as serial_num,
    s.name as name,
    date(ts) as day,
    min(sample) as min_temp,
    avg(sample) as avg_temp,
    stddev(sample) as stddev_temp,
    max(sample) as max_temp
from
    samples inner join sensors s using (sensor_id)
where
    ts > current_date - 7
group by
    serial_num, name, day
order by
    serial_num, day desc

explain analyze reports the following (sorry for the horrible wrapping):

 Sort  (cost=1185281.45..1185285.95 rows=1800 width=50) (actual time=82832.106..82832.147 rows=56 loops=1)
   Sort Key: s.serial, date(samples.ts)
   ->  HashAggregate  (cost=1185161.62..1185184.12 rows=1800 width=50) (actual time=82830.624..82831.601 rows=56 loops=1)
     ->  Hash Join  (cost=1063980.21..1181539.96 rows=206952 width=50) (actual time=80408.123..81688.590 rows=66389 loops=1)
       Hash Cond: ("outer".sensor_id = "inner".sensor_id)
       ->  Subquery Scan samples  (cost=1063979.10..1155957.38 rows=4598914 width=20) (actual time=80392.477..80922.764 rows=66389 loops=1)
         ->  Unique  (cost=1063979.10..1109968.24 rows=4598914 width=20) (actual time=80392.451..80646.761 rows=66389 loops=1)
           ->  Sort  (cost=1063979.10..1075476.39 rows=4598914 width=20) (actual time=80392.437..80442.787 rows=66389 loops=1)
             Sort Key: ts, sensor_id, sample
             ->  Append  (cost=0.00..312023.46 rows=4598914 width=20) (actual time=79014.428..80148.396 rows=66389 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..9239.37 rows=103677 width=20) (actual time=4010.181..4010.181 rows=0 loops=1)
                 ->  Seq Scan on samples_1999  (cost=0.00..8202.60 rows=103677 width=20) (actual time=4010.165..4010.165 rows=0 loops=1)
                   Filter: (ts > ((('now'::text)::date - 7))::timestamp without time zone)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..28646.17 rows=714082 width=20) (actual time=44.827..44.827 rows=0 loops=1)
                 ->  Index Scan using samples_2000_bytime on samples_2000  (cost=0.00..21505.35 rows=714082 width=20) (actual time=44.818..44.818 rows=0 loops=1)
                   Index Cond: (ts > ((('now'::text)::date - 7))::timestamp without time zone)
               ->  Subquery Scan "*SELECT* 3"  (cost=0.00..80393.33 rows=902191 width=20) (actual time=34772.377..34772.377 rows=0 loops=1)
                 ->  Seq Scan on samples_2001  (cost=0.00..71371.42 rows=902191 width=20) (actual time=34772.366..34772.366 rows=0 loops=1)
                   Filter: (ts > ((('now'::text)::date - 7))::timestamp without time zone)
               ->  Subquery Scan "*SELECT* 4"  (cost=0.00..92424.05 rows=1037201 width=20) (actual time=40072.103..40072.103 rows=0 loops=1)
                 ->  Seq Scan on samples_2002  (cost=0.00..82052.04 rows=1037201 width=20) (actual time=40072.090..40072.090 rows=0 loops=1)
                   Filter: (ts > ((('now'::text)::date - 7))::timestamp without time zone)
               ->  Subquery Scan "*SELECT* 5"  (cost=0.00..42380.58 rows=1049772 width=20) (actual time=49.455..49.455 rows=0 loops=1)
                 ->  Index Scan using samples_2003_bytime on samples_2003  (cost=0.00..31882.86 rows=1049772 width=20) (actual time=49.448..49.448 rows=0 loops=1)
                   Index Cond: (ts > ((('now'::text)::date - 7))::timestamp without time zone)
               ->  Subquery Scan "*SELECT* 6"  (cost=0.00..58939.96 rows=791991 width=20) (actual time=65.458..1124.363 rows=66389 loops=1)
                 ->  Index Scan using samples_2004_bytime on samples_2004  (cost=0.00..51020.05 rows=791991 width=20) (actual time=65.430..750.336 rows=66389 loops=1)
                   Index Cond: (ts > ((('now'::text)::date - 7))::timestamp without time zone)
       ->  Hash  (cost=1.09..1.09 rows=9 width=38) (actual time=15.295..15.295 rows=0 loops=1)
         ->  Seq Scan on sensors s  (cost=0.00..1.09 rows=9 width=38) (actual time=15.122..15.187 rows=9 loops=1)
 Total runtime: 82865.119 ms

Essentially, what you can see here is that it's doing an index scan on samples_2000, samples_2003, and samples_2004, but a sequential scan on samples_1999, samples_2001, and samples_2002. It's very strange to me that it would make these choices. If I disable sequential scans altogether for this session, the query runs in under 4 seconds.

This is a very cool solution for long-term storage, and isn't terribly hard to manage. I actually have other report queries that seem to be making pretty good index selection currently...but I always want more! :) Does anyone have any suggestions as to how to get this to do what I want?

Of course, ideally, it would ignore five of the tables altogether. :)

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
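For reference, the "disable sequential scans altogether for this session" workaround mentioned above is just the standard session-level planner toggle; a minimal sketch:

    -- Discourage sequential scans for the current session only
    -- (a planner cost hint, not a hard rule):
    SET enable_seqscan = off;

    -- ... run the report query here ...

    -- Put the planner back to its normal behaviour afterwards:
    SET enable_seqscan = on;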
Dustin Sallings wrote:

> [...]
>
>     OK, now that that's over with, I have this one particular query that
> I attempt to run for a report from my phone that no longer works because
> it tries to do a table scan on *some* of the tables.  Why it chooses
> this table scan, I can't imagine.  The query is as follows:
>
> select
>     s.serial as serial_num,
>     s.name as name,
>     date(ts) as day,
>     min(sample) as min_temp,
>     avg(sample) as avg_temp,
>     stddev(sample) as stddev_temp,
>     max(sample) as max_temp
> from
>     samples inner join sensors s using (sensor_id)
> where
>     ts > current_date - 7
> group by
>     serial_num, name, day
> order by
>     serial_num, day desc
>

[ next section heavily clipped for clarity ]

  ->  Seq Scan on samples_1999  (cost rows=103677)  (actual rows=0 loops=1)
  ->  Index Scan using samples_2000_bytime on samples_2000  (cost rows=714082)  (actual rows=0 loops=1)
  ->  Seq Scan on samples_2001  (cost rows=902191)  (actual rows=0 loops=1)
  ->  Seq Scan on samples_2002  (cost rows=1037201)  (actual rows=0 loops=1)
  ->  Index Scan using samples_2003_bytime on samples_2003  (cost rows=1049772)  (actual rows=0 loops=1)
  ->  Index Scan using samples_2004_bytime on samples_2004  (cost rows=791991)  (actual rows=66389 loops=1)

[...]

>     Essentially, what you can see here is that it's doing an index scan
> on samples_2000, samples_2003, and samples_2004, but a sequential scan
> on samples_1999, samples_2001, and samples_2002.  It's very strange to
> me that it would make these choices.  If I disable sequential scans
> altogether for this session, the query runs in under 4 seconds.
>
>     This is a very cool solution for long-term storage, and isn't
> terribly hard to manage.  I actually have other report queries that seem
> to be making pretty good index selection currently...but I always want
> more! :)  Does anyone have any suggestions as to how to get this to do
> what I want?
>
>     Of course, ideally, it would ignore five of the tables altogether. :)
>
> --
> SPY                      My girlfriend asked me which one I like better.
> pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
> |    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
> L_______________________ I hope the answer won't upset her. ____________
>

Just as a heads up: you have run VACUUM ANALYZE before running this query, correct? You'll notice that the query planner thinks it will get 103,677 rows from 1999, ~714,000 rows from 2000, ~902,000 rows from 2001, etc., when in reality there are only about 66,000 matching rows in 2004 and none anywhere else. The planner is clearly not planning well; it seems like it hasn't updated its statistics to reflect what time ranges are actually stored in most of the tables.

(By the way, an index scan returning 0 entries is *really* fast, so I wouldn't worry about ignoring the extra tables. :)

I suppose the other question is whether this is a prepared or stored query, because sometimes the query planner cannot do enough optimization in a stored query. (I ran into this problem where I had one column with 500,000+ entries referencing one particular value. If I ran the query manually, the time was much better because I wasn't using *that* value. With a stored query, the planner had to take into account that I *might* use that value, and didn't want to risk 500,000+ indexed lookups.)

The only other thing I can think of is that there might be some collision between datetime and date -- that is, the planner might be treating the comparison as a time-of-day match when it plans the query (hence the huge row estimates), when really it is comparing against a date. Perhaps a cast is in order to make it work right. I don't really know. Interesting problem, though.

John
=:->
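To make the two suggestions above concrete, a rough sketch (the explicit cast shown is only an assumption about what would help; the statistics refresh is the standard VACUUM ANALYZE):

    -- Refresh planner statistics on the per-year tables:
    VACUUM ANALYZE samples_1999;
    VACUUM ANALYZE samples_2000;
    VACUUM ANALYZE samples_2001;
    VACUUM ANALYZE samples_2002;
    VACUUM ANALYZE samples_2003;
    VACUUM ANALYZE samples_2004;

    -- Make the comparison value's type explicit instead of relying on the
    -- implicit date -> timestamp promotion in "ts > current_date - 7":
    select count(*)
      from samples
     where ts > (current_date - 7)::timestamp without time zone;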
Dustin Sallings <dustin@spy.net> writes:
>     The following view creates the illusion of the old ``single-table''
> model:
>
> create view samples as
>     select * from samples_1999
>     union select * from samples_2000
>     union select * from samples_2001
>     union select * from samples_2002
>     union select * from samples_2003
>     union select * from samples_2004

You really, really, really want to use UNION ALL not UNION here.

>     OK, now that that's over with, I have this one particular query that I
> attempt to run for a report from my phone that no longer works because
> it tries to do a table scan on *some* of the tables.  Why it chooses
> this table scan, I can't imagine.

Most of the problem here comes from the fact that "current_date - 7" isn't reducible to a constant, and so the planner is making bad guesses about how much of each table will be scanned. If possible, do the date arithmetic on the client side and send over a simple literal constant. If that's not practical, you can fake it with a mislabeled IMMUTABLE function --- see the list archives for previous discussions of the same issue.

regards, tom lane
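For concreteness, a minimal sketch of the "mislabeled IMMUTABLE function" trick Tom mentions (the function name is made up for illustration; marking it IMMUTABLE is a deliberate lie to the planner so the value gets folded to a constant at plan time, which is why the archives recommend it only with care):

    -- Deliberately (mis)labeled IMMUTABLE so the planner folds it to a constant:
    create function report_cutoff() returns timestamp without time zone as
        'select (current_date - 7)::timestamp without time zone'
        language sql immutable;

    -- The report query then filters on the function instead of the expression:
    --   ... where ts > report_cutoff() ...

The trade-off is that any plan built from this (e.g. inside a long-lived prepared statement) keeps the value it was planned with, so it is best used for ad-hoc reporting queries like the one in this thread.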
Dustin Sallings wrote:
>     The following view creates the illusion of the old ``single-table''
> model:
>
> create view samples as
>     select * from samples_1999
>     union select * from samples_2000
>     union select * from samples_2001
>     union select * from samples_2002
>     union select * from samples_2003
>     union select * from samples_2004

Try this with UNION ALL (you know there won't be any duplicates) and possibly with some limits too:

SELECT * FROM samples_1999
  WHERE ts BETWEEN '1999-01-01 00:00:00+00' AND '1999-12-31 11:59:59+00'
UNION ALL
...

> select
>     s.serial as serial_num,
>     s.name as name,
>     date(ts) as day,
>     min(sample) as min_temp,
>     avg(sample) as avg_temp,
>     stddev(sample) as stddev_temp,
>     max(sample) as max_temp
> from
>     samples inner join sensors s using (sensor_id)
> where
>     ts > current_date - 7
> group by
>     serial_num, name, day
> order by
>     serial_num, day desc

Try restricting the timestamp too:

  WHERE ts BETWEEN (current_date - 7) AND current_timestamp

Hopefully that will give the planner enough smarts to know it can skip most of the sample_200x tables.

--
  Richard Huxton
  Archonet Ltd
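Spelled out in full, the bounded UNION ALL view Richard is describing would look roughly like this (the year boundaries are assumptions based on the table names and would need to match how the data was actually split; the current year is left open-ended so the existing insert rule keeps working without adjusting an upper bound):

    create view samples as
        select * from samples_1999
            where ts between '1999-01-01 00:00:00' and '1999-12-31 23:59:59'
        union all
        select * from samples_2000
            where ts between '2000-01-01 00:00:00' and '2000-12-31 23:59:59'
        union all
        select * from samples_2001
            where ts between '2001-01-01 00:00:00' and '2001-12-31 23:59:59'
        union all
        select * from samples_2002
            where ts between '2002-01-01 00:00:00' and '2002-12-31 23:59:59'
        union all
        select * from samples_2003
            where ts between '2003-01-01 00:00:00' and '2003-12-31 23:59:59'
        union all
        select * from samples_2004
            where ts >= '2004-01-01 00:00:00';  -- current year: lower bound only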
Tom,

> Most of the problem here comes from the fact that "current_date - 7"
> isn't reducible to a constant and so the planner is making bad guesses
> about how much of each table will be scanned.

I thought this was fixed in 7.4.  No?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
>> Most of the problem here comes from the fact that "current_date - 7"
>> isn't reducible to a constant and so the planner is making bad guesses
>> about how much of each table will be scanned.

> I thought this was fixed in 7.4.  No?

No. It's not fixed as of CVS tip either, although there was some talk of doing something in time for 8.0.

regards, tom lane
On R, 2004-10-01 at 19:34, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> >> Most of the problem here comes from the fact that "current_date - 7"
> >> isn't reducible to a constant and so the planner is making bad guesses
> >> about how much of each table will be scanned.
>
> > I thought this was fixed in 7.4.  No?
>
> No. It's not fixed as of CVS tip either, although there was some talk
> of doing something in time for 8.0.

That's weird -- my 7.4.2 databases did not consider (now() - '15 min'::interval) to be a constant, whereas 7.4.5 does (i.e. it does use an index scan on the index on the datetime column).

Is this somehow different for date types?

--------------
Hannu
Hannu Krosing <hannu@tm.ee> writes:
>> No. It's not fixed as of CVS tip either, although there was some talk
>> of doing something in time for 8.0.

> That's weird - my 7.4.2 databases did not consider (now() - '15
> min'::interval) to be a constant whereas 7.4.5 does (i.e. it does use
> index scan on index on datetime column)

The question isn't whether it can use it as an indexscan bound; the question is whether it can derive an accurate rowcount estimate. The issue is exactly that STABLE functions work for one but not the other.

regards, tom lane
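To illustrate Tom's distinction with the tables from this thread (a sketch only; the 15-minute window is borrowed from Hannu's example, and the behaviour described is the 7.4-era planner being discussed):

    -- now() is STABLE, so the executor can evaluate it once at query start and
    -- use the result as an index-scan boundary:
    explain analyze
        select * from samples_2004 where ts > now() - interval '15 min';

    -- But the planner does not evaluate STABLE expressions at plan time, so the
    -- estimated row count for this condition falls back to a default selectivity
    -- guess rather than reflecting the actual 15-minute window -- the same
    -- problem "current_date - 7" hits in the original query.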