Re: [SQL] optimizer not using an index... - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] optimizer not using an index...
Date
Msg-id 6737.935765619@sss.pgh.pa.us
Whole thread Raw
In response to optimizer not using an index...  (Howie <caffeine@toodarkpark.org>)
Responses Re: [SQL] optimizer not using an index...
List pgsql-sql
Howie <caffeine@toodarkpark.org> writes:
> explain reveals that postgres ( 6.5.0 ) isnt using some of my indexes,
> opting instead for a complete table scan ( and drastically slowing things
> down ).

Well, mumble.  The optimizer certainly needs work, but what's your
evidence that this query would be faster if done another way?  Hashing
the smaller tables, as it's doing, ought to be a pretty good strategy.

One way to check is to start your client with environment variable
setting PGOPTIONS="-fh" ("forbid hashjoin") to discourage the optimizer
from using hashes, then check the generated plan for the same query and
see what its actual runtime is.  That's likely to be a suboptimal plan
however, since it'll turn off *all* hashing.  The hash on customers is
probably the thing that's bothering you.

How many result rows do you actually get from this query?  If you
eliminate the customers table from the query, and just do the same join
among the remaining tables, how many rows do you get?  I suspect the
optimizer is drastically off in its estimate of ~300k result rows, and
that's contributing to the problem.

> doing a complete table scan on a column thats indexed isnt really that
> nice, especially since there are 12,000 entries in it.

But it's estimating it's going to have to probe that table 300k times,
which makes the hashjoin look mighty attractive...

> interestingly, when querying on "pincodes.code" instead of
> "customers.name", postgres does NOT use a full table scan; it uses the
> proper indexes:

> Hash Join  (cost=23.78 rows=5 width=222)

Note the drastic difference in the estimated result-row count; that's
undoubtedly what's changing the optimizer's choice of what to do.  You
haven't given us enough detail to understand why this query would be
(or at least seem) more selective than the other, however.

Anyway, this looks to me like it is probably a symptom of poor
selectivity estimation leading to bogus estimates of output row counts
leading to a nonoptimal plan choice.  I have been working on improving
the selectivity estimation for 6.6, and am looking for test cases to
check out the logic on.  Is your database small enough/non proprietary
enough that you could send me a dump?  Or could you strip it down to
a test case that still exhibits the same misbehavior?  If you don't
like either of those, perhaps you could grab a current snapshot, install
your data in a test postmaster, and report back on whether it acts any
different...
        regards, tom lane


pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: [SQL] Inheritance and DELETE
Next
From: Howie
Date:
Subject: Re: [SQL] optimizer not using an index...