Thread: strange query plans

strange query plans

From: Chris Jones
Date: 30 Nov 2000
PG seems to be choosing a sub-optimal query plan.  It's doing a
sequential scan of a 120000-tuple table, instead of an index scan for
the 16 matching rows.  Running PG 7.0.2:

fastfacts=> vacuum analyze event;
VACUUM
fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
NOTICE:  QUERY PLAN:

Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)

EXPLAIN
fastfacts=> select count(*) from event where type = 'IPOETC_EVENT';
 count
-------
    16
(1 row)

fastfacts=> \d event_type_key
Index "event_type_key"
 Attribute | Type
-----------+------
 type      | text
btree


fastfacts=> select count(*) from event;
 count
--------
 126580
(1 row)


I know that PG is frequently smarter than I am, but this doesn't seem
like a case where it's made a good decision.  What am I missing?

Chris

--
----------------------------------------------------- chris@mt.sri.com
Chris Jones                                    SRI International, Inc.

Re: strange query plans

From: Stephan Szabo
On 30 Nov 2000, Chris Jones wrote:

> PG seems to be choosing a sub-optimal query plan.  It's doing a
> sequential scan of a 120000-tuple table, instead of an index scan for
> the 16 matching rows.  Running PG 7.0.2:
>
> fastfacts=> vacuum analyze event;
> VACUUM
> fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)

> I know that PG is frequently smarter than I am, but this doesn't seem
> like a case where it's made a good decision.  What am I missing?

Postgres is expecting 6224 rows to match rather than the 16 that are
actually there.  The usual questions are: has this table been vacuum
analyzed recently, and is there a very common value that is much more
common than the rest of the data in the column?  (That throws off the
estimates.)
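
You can check how skewed the column is with something like this (a
rough sketch using the table and column names from your example, so
adjust as needed):

    -- count of rows per type value, most common first
    select type, count(*)
    from event
    group by type
    order by 2 desc;

If one or two values dominate the output, that's what is throwing off
the estimate for the rare ones.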



Re: strange query plans

From: Chris Jones
Date: 4 Dec 2000
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> On 30 Nov 2000, Chris Jones wrote:
>
> > PG seems to be choosing a sub-optimal query plan.  It's doing a
> > sequential scan of a 120000-tuple table, instead of an index scan for
> > the 16 matching rows.  Running PG 7.0.2:
> >
> > fastfacts=> vacuum analyze event;
> > VACUUM
> > fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> > NOTICE:  QUERY PLAN:
> >
> > Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)
>
> > I know that PG is frequently smarter than I am, but this doesn't seem
> > like a case where it's made a good decision.  What am I missing?
>
> Postgres is expecting 6224 rows to match rather than the 16 that are
> actually there.  The usual questions are: has this table been vacuum
> analyzed recently, and is there a very common value that is much more
> common than the rest of the data in the column?  (That throws off the
> estimates.)

It was vacuum analyzed immediately before, and the key distribution is
anything but uniform.  The frequencies range from 16 to 64337.  I'm
assuming that PG's query optimizer isn't well-informed enough to
correctly make this kind of decision.  Is there any way I could force
it?  I have two types which account for about 90% of the data in the
table; they won't ever have this particular query run on them.  The
other 10% of the table could really benefit from an Index Scan.

The best workaround I've found so far is to create a column called
"sillytype", which is the same as type, except that it's null for
those two values.  Now it uses the index, but I'm not entirely
comfortable with this solution.
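
For the record, the workaround looks roughly like this (the two
dominant type names below are placeholders for my real values):

    -- duplicate the column, leaving it null for the two dominant values
    alter table event add column sillytype text;
    update event set sillytype = type
      where type not in ('COMMON_TYPE_1', 'COMMON_TYPE_2');
    create index event_sillytype_key on event (sillytype);
    vacuum analyze event;

    -- queries on the rare types then go through the new index:
    select sillytype from event where sillytype = 'IPOETC_EVENT';

    -- (the application also has to keep sillytype in sync on
    --  inserts and updates, which is part of why I don't love it)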

Chris

--
----------------------------------------------------- chris@mt.sri.com
Chris Jones                                    SRI International, Inc.

Re: strange query plans

From: Stephan Szabo
On 4 Dec 2000, Chris Jones wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>
> > On 30 Nov 2000, Chris Jones wrote:
> >
> > > PG seems to be choosing a sub-optimal query plan.  It's doing a
> > > sequential scan of a 120000-tuple table, instead of an index scan for
> > > the 16 matching rows.  Running PG 7.0.2:
> > >
> > > fastfacts=> vacuum analyze event;
> > > VACUUM
> > > fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> > > NOTICE:  QUERY PLAN:
> > >
> > > Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)
> >
> > > I know that PG is frequently smarter than I am, but this doesn't seem
> > > like a case where it's made a good decision.  What am I missing?
> >
> > Postgres is expecting 6224 rows to match rather than the 16 that are
> > actually there.  The usual questions are: has this table been vacuum
> > analyzed recently, and is there a very common value that is much more
> > common than the rest of the data in the column?  (That throws off the
> > estimates.)
>
> It was vacuum analyzed immediately before, and the key distribution is
> anything but uniform.  The frequencies range from 16 to 64337.  I'm
> assuming that PG's query optimizer isn't well-informed enough to
> correctly make this kind of decision.  Is there any way I could force
> it?  I have two types which account for about 90% of the data in the
> table; they won't ever have this particular query run on them.  The
> other 10% of the table could really benefit from an Index Scan.

Yeah, that'd do it.  I believe it uses a fraction of the most common
value's frequency as the expected row count (except perhaps for that
value itself, I don't know) and so it thinks there are a lot of rows to
fetch.  A usual temporary hack is to do something like you mentioned
(using nulls), because nulls are handled separately in the statistics.
:(  There's been a lot of talk about keeping better statistics.

There is a SET command that will discourage the optimizer from choosing
a certain plan type.  I don't remember the exact names of the options,
but it's come up in the mailing list archives.  The problem is that you
have to remember to issue the SET before the query and reset it
afterwards, so you don't bias the optimizer for other cases.
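
If memory serves, it's something along these lines (check the docs for
the exact spelling on 7.0):

    -- discourage sequential scans for just this query
    set enable_seqscan to 'off';
    select type from event where type = 'IPOETC_EVENT';
    -- put it back so other queries aren't affected
    set enable_seqscan to 'on';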

> The best workaround I've found so far is to create a column called
> "sillytype", which is the same as type, except that it's null for
> those two values.  Now it uses the index, but I'm not entirely
> comfortable with this solution.