Thread: optimizer not using an index...

optimizer not using an index...

From: Howie
SELECT  customers.usedby, customers.hostname, customers.email,
        customers.valid, customers.signuptime,
        pincodes.codenum, pincodes.code,
        subaccts.type, subaccts.batch, subaccts.renew,
        subaccts.active, subaccts.price, subaccts.rebill,
        domains.name, domains.client, domains.authtype,
        ibill.login, ibill.passwd,
        customers.transnum, customers.subscr
FROM    pincodes, subaccts, customers, domains, owners, ibill
WHERE   pincodes.codenum=customers.codenum AND
        pincodes.type=subaccts.type AND
        domains.client=subaccts.client AND
        domains.client=owners.client AND
        ibill.client=domains.client AND
        customers.email='caffeine@toodarkpark.org';

explain reveals that postgres (6.5.0) isn't using some of my indexes,
opting instead for a complete table scan (and drastically slowing things
down).

Hash Join  (cost=48272.95 rows=293465 width=222)
  ->  Nested Loop  (cost=20598.72 rows=376206 width=138)
        ->  Hash Join  (cost=58.73 rows=413 width=118)
              ->  Seq Scan on subaccts  (cost=17.51 rows=379 width=58)
              ->  Hash  (cost=24.84 rows=87 width=60)
                    ->  Hash Join  (cost=24.84 rows=87 width=60)
                          ->  Hash Join  (cost=14.56 rows=81 width=56)
                                ->  Seq Scan on ibill  (cost=3.64 rows=80 width=28)
                                ->  Hash  (cost=4.64 rows=80 width=28)
                                      ->  Seq Scan on domains  (cost=4.64 rows=80 width=28)
                          ->  Hash  (cost=3.81 rows=85 width=4)
                                ->  Seq Scan on owners  (cost=3.81 rows=85 width=4)
        ->  Index Scan using codes_type_idx on pincodes  (cost=49.73 rows=345235 width=20)
  ->  Hash  (cost=546.46 rows=8757 width=84)
        ->  Seq Scan on customers  (cost=546.46 rows=8757 width=84)
 

I have an index on customers.name, subaccts.type, ibill.client,
owners.client... every column that's being queried on.  tables are all
vacuum analyze'd (the DEBUG notice shows that postgres does indeed 'see'
the indexes, which are all btrees, btw).

customers table has 12k entries, pincodes has 350k, ibill has 80, domains
has 80, owners has 80, subaccts has 380.

doing a complete table scan on a column that's indexed isn't really that
nice, especially since there are 12,000 entries in it.  why postgres
chooses to use table scans on other tables is also beyond me: "explain
select * from (ibill|domains|owners|subaccts) where client=1" uses the
proper index.  the hash join and nested loop also bug me; that's a lot of
rows to cycle through.  interestingly, when querying on "pincodes.code"
instead of "customers.name", postgres does NOT use a full table scan; it
uses the proper indexes:

Hash Join  (cost=23.78 rows=5 width=222)
  ->  Seq Scan on ibill  (cost=3.64 rows=80 width=28)
  ->  Hash  (cost=16.37 rows=4 width=194)
        ->  Nested Loop  (cost=16.37 rows=4 width=194)
              ->  Nested Loop  (cost=10.22 rows=3 width=190)
                    ->  Nested Loop  (cost=6.12 rows=2 width=162)
                          ->  Nested Loop  (cost=4.07 rows=1 width=104)
                                ->  Index Scan using codes_code_idx on pincodes  (cost=2.02 rows=1 width=20)
                                ->  Index Scan using users_pkey on customers  (cost=2.05 rows=11226 width=84)
                          ->  Index Scan using types_pkey on subaccts  (cost=2.05 rows=379 width=58)
                    ->  Index Scan using doms_pkey on domains  (cost=2.05 rows=80 width=28)
              ->  Index Scan using owner_client_idx on owners  (cost=2.05 rows=85 width=4)
 
so what gives?  the two queries are 90% identical apart from the column
that's being keyed on (customers.name vs pincodes.code).

any help would be MOST appreciated.

---
Howie <caffeine@toodarkpark.org>   URL: http://www.toodarkpark.org    
"The distance between insanity and genius is measured only by success."



Re: [SQL] optimizer not using an index...

From: Tom Lane
Howie <caffeine@toodarkpark.org> writes:
> explain reveals that postgres (6.5.0) isn't using some of my indexes,
> opting instead for a complete table scan (and drastically slowing things
> down).

Well, mumble.  The optimizer certainly needs work, but what's your
evidence that this query would be faster if done another way?  Hashing
the smaller tables, as it's doing, ought to be a pretty good strategy.

One way to check is to start your client with environment variable
setting PGOPTIONS="-fh" ("forbid hashjoin") to discourage the optimizer
from using hashes, then check the generated plan for the same query and
see what its actual runtime is.  That's likely to be a suboptimal plan
however, since it'll turn off *all* hashing.  The hash on customers is
probably the thing that's bothering you.
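
For example, from the shell (the database name here is just a
placeholder):

    $ PGOPTIONS="-fh" psql yourdb
    yourdb=> EXPLAIN <your query>;

Only a backend started with that environment set is affected, so other
sessions keep planning as usual.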

How many result rows do you actually get from this query?  If you
eliminate the customers table from the query, and just do the same join
among the remaining tables, how many rows do you get?  I suspect the
optimizer is drastically off in its estimate of ~300k result rows, and
that's contributing to the problem.
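
That is, something like this (just counting, since the row count is what
matters; I'm assuming the two clauses that mention customers simply drop
out):

    SELECT count(*)
    FROM   pincodes, subaccts, domains, owners, ibill
    WHERE  pincodes.type=subaccts.type AND
           domains.client=subaccts.client AND
           domains.client=owners.client AND
           ibill.client=domains.client;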

> doing a complete table scan on a column that's indexed isn't really that
> nice, especially since there are 12,000 entries in it.

But it's estimating it's going to have to probe that table 300k times,
which makes the hashjoin look mighty attractive...

> interestingly, when querying on "pincodes.code" instead of
> "customers.name", postgres does NOT use a full table scan; it uses the
> proper indexes:

> Hash Join  (cost=23.78 rows=5 width=222)

Note the drastic difference in the estimated result-row count; that's
undoubtedly what's changing the optimizer's choice of what to do.  You
haven't given us enough detail to understand why this query would be
(or at least seem) more selective than the other, however.

Anyway, this looks to me like it is probably a symptom of poor
selectivity estimation leading to bogus estimates of output row counts
leading to a nonoptimal plan choice.  I have been working on improving
the selectivity estimation for 6.6, and am looking for test cases to
check out the logic on.  Is your database small enough/non proprietary
enough that you could send me a dump?  Or could you strip it down to
a test case that still exhibits the same misbehavior?  If you don't
like either of those, perhaps you could grab a current snapshot, install
your data in a test postmaster, and report back on whether it acts any
different...
        regards, tom lane


Re: [SQL] optimizer not using an index...

From: Howie
On Fri, 27 Aug 1999, Tom Lane wrote:

> Howie <caffeine@toodarkpark.org> writes:
> > explain reveals that postgres (6.5.0) isn't using some of my indexes,
> > opting instead for a complete table scan (and drastically slowing things
> > down).
> 
> Well, mumble.  The optimizer certainly needs work, but what's your
> evidence that this query would be faster if done another way?  Hashing
> the smaller tables, as it's doing, ought to be a pretty good strategy.

primary focus was on why pgsql decided to do a complete table scan of
customers (searching for customers.email) when there's an index on it;
surely an index scan would be loads faster than a complete table scan.

the other interesting item i ran into was why one query used the indexes
provided and the other didn't.  granted, pincodes has at least 10x more
entries than customers does (~300k+ vs ~11k).

> One way to check is to start your client with environment variable
> setting PGOPTIONS="-fh" ("forbid hashjoin") to discourage the optimizer
> from using hashes, then check the generated plan for the same query and
> see what its actual runtime is.  That's likely to be a suboptimal plan
> however, since it'll turn off *all* hashing.  The hash on customers is
> probably the thing that's bothering you.

that and pgsql not using the indexes that the query on pincodes.code does.

using -fh causes pgsql to use all the proper indexes, but still beefs up 4
merge joins, seq scans, sorts, and nested loops:

[excerpt]

Merge Join  (cost=355979.91 rows=293465 width=222)
  ->  Seq Scan  (cost=342501.81 rows=376206 width=138)
        ->  Sort  (cost=342501.81 rows=376206 width=138)
              ->  Nested Loop  (cost=20616.32 rows=376206 width=138)
                    ->  Merge Join  (cost=76.34 rows=413 width=118)
                          ->  Merge Join  (cost=33.01 rows=87 width=60)
                                ->  Merge Join  (cost=20.28 rows=81 width=56)
 

(query on customers.email)

> How many result rows do you actually get from this query?  

when querying on pincodes.code, i get 1 row back.  when querying on
customers.email, i get 4 rows back.  it's producing the correct results,
it's just going about getting those results in a somewhat odd manner.

> If you eliminate the customers table from the query, and just do the
> same join among the remaining tables, how many rows do you get? 

1, as expected.

Hash Join  (cost=21.73 rows=5 width=138)
  ->  Seq Scan on ibill  (cost=3.64 rows=80 width=28)
  ->  Hash  (cost=14.32 rows=4 width=110)
        ->  Nested Loop  (cost=14.32 rows=4 width=110)
              ->  Nested Loop  (cost=8.17 rows=3 width=106)
                    ->  Nested Loop  (cost=4.07 rows=2 width=78)
                          ->  Index Scan using codes_code_idx on pincodes  (cost=2.02 rows=1 width=20)
                          ->  Index Scan using types_pkey on subaccts  (cost=2.05 rows=379 width=58)
                    ->  Index Scan using doms_pkey on domains  (cost=2.05 rows=80 width=28)
              ->  Index Scan using owner_client_idx on owners  (cost=2.05 rows=85 width=4)
 

> I suspect the optimizer is drastically off in its estimate of ~300k
> result rows, and that's contributing to the problem. 

yes, so why is it off for one query but right on target for the other?
more importantly, why is it choosing to use indexes for one query yet
choosing to do complete table scans for the other (even though the two
queries are almost identical)?

answer that and i'll personally treat you to a beer.  :)

> > doing a complete table scan on a column that's indexed isn't really that
> > nice, especially since there are 12,000 entries in it.
> 
> But it's estimating it's going to have to probe that table 300k times,
> which makes the hashjoin look mighty attractive...

why would it have to probe pincodes 300k times when there's a unique index
on pincodes (pincodes.codenum) and a unique index on customers
(customers.codenum)?

> > interestingly, when querying on "pincodes.code" instead of
> > "customers.name", postgres does NOT use a full table scan; it uses the
> > proper indexes:
> 
> > Hash Join  (cost=23.78 rows=5 width=222)
> 
> Note the drastic difference in the estimated result-row count; that's
> undoubtedly what's changing the optimizer's choice of what to do.  You
> haven't given us enough detail to understand why this query would be
> (or at least seem) more selective than the other, however.

i'm not sure it would; i placed indexes in (what i thought were) all the
proper places - pincodes.code, pincodes.codenum, customers.name,
customers.codenum... every column that's being queried on or joined on.
ideally, all the indexes would be used and querying on customers.email
would be (query-plan wise) almost identical to querying on pincodes.code;
only the index would change.
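
for reference, the definitions look roughly like this (codes_code_idx
and codes_type_idx are the names that show up in the explain output; the
other names here are from memory and may not be exact):

    CREATE INDEX codes_code_idx ON pincodes (code);
    CREATE INDEX codes_type_idx ON pincodes (type);
    CREATE UNIQUE INDEX codes_codenum_idx ON pincodes (codenum);
    CREATE UNIQUE INDEX custs_codenum_idx ON customers (codenum);
    CREATE INDEX custs_email_idx ON customers (email);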

> Anyway, this looks to me like it is probably a symptom of poor
> selectivity estimation leading to bogus estimates of output row counts
> leading to a nonoptimal plan choice.  I have been working on improving
> the selectivity estimation for 6.6, and am looking for test cases to
> check out the logic on.  Is your database small enough/non proprietary
> enough that you could send me a dump?  

i could send you a dump, but the db is fairly large; 7m uncompressed, 2m
gzip -9'ed.  none of the data is overly sensitive.  i am, however, on an
overly lagged 28k8.

> Or could you strip it down to
> a test case that still exhibits the same misbehavior?  

i haven't tried trimming the db down.  i'm not sure that would 'fix' the
problem if the optimizer is misguessing the number of rows it's going to
have to look at...

> If you don't
> like either of those, perhaps you could grab a current snapshot, install
> your data in a test postmaster, and report back on whether it acts any
> different...

over the weekend i'm planning on upgrading to 6.5.1, but i don't recall
seeing any changes to the optimizer in the changelog...

---
Howie <caffeine@toodarkpark.org>   URL: http://www.toodarkpark.org    
"The distance between insanity and genius is measured only by success."





Re: [SQL] optimizer not using an index...

From: Tom Lane
Howie <caffeine@toodarkpark.org> writes:
> On Fri, 27 Aug 1999, Tom Lane wrote:
>> Well, mumble.  The optimizer certainly needs work, but what's your
>> evidence that this query would be faster if done another way?  Hashing
>> the smaller tables, as it's doing, ought to be a pretty good strategy.

> primary focus was on why pgsql decided to do a complete table scan of
> customers (searching for customers.email) when there's an index on it;
> surely an index scan would be loads faster than a complete table scan.

No, an index scan will be loads slower if most or all of the table has
to be visited.  If it were always faster, the optimizer would have a
much easier task ;-).  The join we are considering here is on
pincodes.codenum = customers.codenum, and since the system (mistakenly)
thinks it is going to be joining a lot of rows from pincodes, doing it
with a hash rather than an indexscan on customers.codenum looks good.

You bring up a good point though: why isn't it deciding to use the
restriction on customers.email for an indexscan of customers, using the
index on email rather than the one on codenum?  (I assume you have both)

It looks like the thing is misestimating the selectivity of that
restriction too: there are not going to be 8757 customer rows that pass
that WHERE clause --- but the system thinks that, and that's why it's
rejecting the indexscan as being slower than the sequential.  I had
missed that point before, being distracted by the even sillier row count
from the other subjoin.  There are clearly two different selectivity
mistakes being made here :-(

> the other interesting item i ran into was why one query used the indexes
> provided and the other didn't.  granted, pincodes has at least 10x more
> entries than customers does (~300k+ vs ~11k).

Right, and the issue seems to be that when you are restricting pincodes
with an extra WHERE clause, the estimated output row count is realistic,
whereas when you aren't restricting pincodes it ain't.  I'm not sure
why, yet.

>> But it's estimating it's going to have to probe that table 300k times,
>> which makes the hashjoin look mighty attractive...

> why would it have to probe pincodes 300k times when there's a unique index
> on pincodes (pincodes.codenum) and a unique index on customers
> (customers.codenum)?

It's guessing that it's going to get 300k rows out of the join of
pincodes to all the little tables (ie, about one row per pincode row;
it doesn't know there's only going to be one match), and then it thinks
it's going to have to look through most of the customers table for
match(es) to each of those rows.  If the selectivity estimates that the
system was making were in the right ballpark, then the hash join
probably *would* be the right way to do it.  The problem here seems
to be that the estimated row counts are so far off-base.  I'd like to
find out why.

> i could send you a dump, but the db is fairly large; 7m uncompressed, 2m
> gzip -9'ed.  none of the data is overly sensitive.  i am, however, on an
> overly lagged 28k8.

2m gzipped is no problem at this end, but I'll bet we'd see the same
behavior with only half or a third of the data, if you can extract a
consistent subset easily...
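
One quick way to carve out a consistent chunk, if codenum is a
convenient thing to cut on (the table names and the cutoff here are only
illustrative):

    -- copy a slice of pincodes, then only the customers that match it
    SELECT * INTO small_pincodes FROM pincodes WHERE codenum < 100000;
    SELECT * INTO small_customers FROM customers
        WHERE codenum IN (SELECT codenum FROM small_pincodes);

and then dump those two plus the four little tables unchanged.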

> i haven't tried trimming the db down.  i'm not sure that would 'fix' the
> problem if the optimizer is misguessing the number of rows it's going to
> have to look at...

No, it wouldn't be a fix for your problem.  I was just thinking of
making a smaller, easier-to-ship test case that I could use to make
sure the problem is fixed for 6.6.

> over the weekend i'm planning on upgrading to 6.5.1, but i don't recall
> seeing any changes to the optimizer in the changelog...

There are not; this is work for 6.6.  The fixes involve storing more and
different info in the system statistics table, so there is no prospect
of back-patching them into 6.5.*.  (BTW, I think 6.5.2 will be out
shortly, so you might want to wait another week before updating.)
        regards, tom lane