Thread: strange query plans
PG seems to be choosing a sub-optimal query plan.  It's doing a
sequential scan of a 120000-tuple table, instead of an index scan for
the 16 matching rows.  Running PG 7.0.2:

fastfacts=> vacuum analyze event;
VACUUM
fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
NOTICE:  QUERY PLAN:

Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)

EXPLAIN
fastfacts=> select count(*) from event where type = 'IPOETC_EVENT';
 count
-------
    16
(1 row)

fastfacts=> \d event_type_key
Index "event_type_key"
 Attribute | Type
-----------+------
 type      | text
btree

fastfacts=> select count(*) from event;
 count
--------
 126580
(1 row)

I know that PG is frequently smarter than I am, but this doesn't seem
like a case where it's made a good decision.  What am I missing?

Chris

-- 
-----------------------------------------------------
chris@mt.sri.com                          Chris Jones
                              SRI International, Inc.
On 30 Nov 2000, Chris Jones wrote:

> PG seems to be choosing a sub-optimal query plan.  It's doing a
> sequential scan of a 120000-tuple table, instead of an index scan for
> the 16 matching rows.  Running PG 7.0.2:
>
> fastfacts=> vacuum analyze event;
> VACUUM
> fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)
>
> I know that PG is frequently smarter than I am, but this doesn't seem
> like a case where it's made a good decision.  What am I missing?

Postgres is expecting 6224 rows to match rather than the 16 that are
actually there.  The usual questions are: has this table been vacuum
analyzed recently, and is there a very common value in the column that
is much more common than the rest of the data?  (A skewed distribution
like that throws off the estimates.)
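A quick way to see how skewed the column is (just a sketch, using the
table and column names from your post) is a GROUP BY count:

  select type, count(*) from event group by type order by 2 desc;

If one or two values dominate the table, that is almost certainly what
the planner is tripping over.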
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> On 30 Nov 2000, Chris Jones wrote:
>
> > PG seems to be choosing a sub-optimal query plan.  It's doing a
> > sequential scan of a 120000-tuple table, instead of an index scan for
> > the 16 matching rows.  Running PG 7.0.2:
> >
> > fastfacts=> vacuum analyze event;
> > VACUUM
> > fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> > NOTICE:  QUERY PLAN:
> >
> > Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)
> >
> > I know that PG is frequently smarter than I am, but this doesn't seem
> > like a case where it's made a good decision.  What am I missing?
>
> Postgres is expecting 6224 rows to match rather than the 16 that are
> actually there.  The usual questions are: has this table been vacuum
> analyzed recently, and is there a very common value in the column that
> is much more common than the rest of the data?  (A skewed distribution
> like that throws off the estimates.)

It was vacuum analyzed immediately before, and the key distribution is
anything but uniform: the frequencies range from 16 to 64337.  I'm
assuming that PG's query optimizer isn't well-informed enough to make
this kind of decision correctly.  Is there any way I could force it?
I have two types which account for about 90% of the data in the table,
and they won't ever have this particular query run on them.  The other
10% of the table could really benefit from an index scan.

The best workaround I've found so far is to create a column called
"sillytype", which is the same as type, except that it's null for
those two values.  Now it uses the index, but I'm not entirely
comfortable with this solution.  (A rough sketch of what I did is in
the P.S. below.)

Chris
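P.S.  In case it helps anyone else reading the archives, the workaround
looks roughly like this.  The two high-frequency type values are
placeholders ('COMMON_TYPE_1', 'COMMON_TYPE_2'); substitute the real
ones, and the index name is just what I happened to pick:

  alter table event add column sillytype text;
  update event set sillytype =
    case when type in ('COMMON_TYPE_1', 'COMMON_TYPE_2') then null
         else type end;
  create index event_sillytype_key on event (sillytype);
  vacuum analyze event;
  -- query on sillytype instead of type for the rare values:
  select type from event where sillytype = 'IPOETC_EVENT';

Since the two common types become nulls, they stop dominating the
column's statistics, and the estimated row count for the equality
comparison drops enough for the index to win.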
On 4 Dec 2000, Chris Jones wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>
> > On 30 Nov 2000, Chris Jones wrote:
> >
> > > PG seems to be choosing a sub-optimal query plan.  It's doing a
> > > sequential scan of a 120000-tuple table, instead of an index scan for
> > > the 16 matching rows.  Running PG 7.0.2:
> > >
> > > fastfacts=> vacuum analyze event;
> > > VACUUM
> > > fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> > > NOTICE:  QUERY PLAN:
> > >
> > > Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)
> > >
> > > I know that PG is frequently smarter than I am, but this doesn't seem
> > > like a case where it's made a good decision.  What am I missing?
> >
> > Postgres is expecting 6224 rows to match rather than the 16 that are
> > actually there.  The usual questions are: has this table been vacuum
> > analyzed recently, and is there a very common value in the column that
> > is much more common than the rest of the data?  (A skewed distribution
> > like that throws off the estimates.)
>
> It was vacuum analyzed immediately before, and the key distribution is
> anything but uniform: the frequencies range from 16 to 64337.  I'm
> assuming that PG's query optimizer isn't well-informed enough to make
> this kind of decision correctly.  Is there any way I could force it?
> I have two types which account for about 90% of the data in the table,
> and they won't ever have this particular query run on them.  The other
> 10% of the table could really benefit from an index scan.

Yeah, that'd do it.  I believe the planner takes a fraction of the most
common value's frequency as the expected row count (except perhaps for
that value itself, I don't know), so it thinks there are a lot of rows
to fetch.  A usual temporary hack is to do something like you mentioned
(using nulls), because nulls are accounted for separately in the
statistics. :(  There's been a lot of talk about keeping better
statistics.

There is also a SET command which will steer the optimizer away from a
particular kind of plan.  I don't remember the exact names of the
options, but it's come up in the mailing list archives (a sketch is at
the end of this message).  The problem is that you have to remember to
issue the SET before the query and set it back afterwards, so you don't
bias the optimizer for other queries.

> The best workaround I've found so far is to create a column called
> "sillytype", which is the same as type, except that it's null for
> those two values.  Now it uses the index, but I'm not entirely
> comfortable with this solution.
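To be a bit more concrete about the SET idea: I think the variables are
the ENABLE_* ones (ENABLE_SEQSCAN in particular), but please check the
SET documentation for your version, since I may be misremembering the
names and the exact syntax.  The pattern I mean is something like:

  set enable_seqscan to off;
  explain select type from event where type = 'IPOETC_EVENT';
  -- run the real query here while sequential scans are discouraged
  set enable_seqscan to on;

That only discourages the plan for the current session, which is why
you have to remember to turn it back on afterwards.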