Re: [SQL] optimizer not using an index... - Mailing list pgsql-sql
From | Tom Lane
---|---
Subject | Re: [SQL] optimizer not using an index...
Date |
Msg-id | 7075.935772727@sss.pgh.pa.us
In response to | Re: [SQL] optimizer not using an index... (Howie <caffeine@toodarkpark.org>)
List | pgsql-sql
Howie <caffeine@toodarkpark.org> writes:
> On Fri, 27 Aug 1999, Tom Lane wrote:
>> Well, mumble.  The optimizer certainly needs work, but what's your
>> evidence that this query would be faster if done another way?  Hashing
>> the smaller tables, as it's doing, ought to be a pretty good strategy.

> primary focus was on why pgsql decided to do a complete table scan of
> customers ( searching for customer.email ) when there's an index on it;
> surely an index scan would be loads faster than a complete table scan.

No, an index scan will be loads slower if most or all of the table has
to be visited.  If it were always faster, the optimizer would have a
much easier task ;-).  The join we are considering here is on
pincodes.codenum = customers.codenum, and since the system (mistakenly)
thinks it is going to be joining a lot of rows from pincodes, doing it
with a hash rather than an indexscan on customers.codenum looks good.

You bring up a good point though: why isn't it deciding to use the
restriction on customer.email for an indexscan of customers, using the
index on email rather than the one on codenum?  (I assume you have both.)
It looks like the thing is misestimating the selectivity of that
restriction too: there are not going to be 8757 customer rows that pass
that WHERE clause --- but the system thinks that, and that's why it's
rejecting the indexscan as being slower than the sequential scan.  I had
missed that point before, being distracted by the even sillier row count
from the other subjoin.  There are clearly two different selectivity
mistakes being made here :-(

> the other interesting item i ran into was why one query used the indexes
> provided and the other didnt.  granted, pincodes has at least 10x more
> entries than customers does ( ~300k+ vs ~11k ).

Right, and the issue seems to be that when you are restricting pincodes
with an extra WHERE clause, the estimated output row count is realistic,
whereas when you aren't restricting pincodes it ain't.  I'm not sure
why, yet.

>> But it's estimating it's going to have to probe that table 300k times,
>> which makes the hashjoin look mighty attractive...

> why would it have to probe pincodes 300k times when there's a unique index
> on pincodes ( pincodes.codenum ) and a unique index on customers (
> customers.codenum ) ?

It's guessing that it's going to get 300k rows out of the join of
pincodes to all the little tables (ie, about one row per pincode row;
it doesn't know there's only going to be one match), and then it thinks
it's going to have to look through most of the customers table for
match(es) to each of those rows.

If the selectivity estimates that the system was making were in the
right ballpark, then the hash join probably *would* be the right way to
do it.  The problem here seems to be that the estimated row counts are
so far off-base.  I'd like to find out why.

> i could send you a dump, but the db is fairly large; 7m uncompressed, 2m
> gzip -9'ed.  none of the data is overly sensetive.  i am, however, on an
> overly lagged 28k8.

2m gzipped is no problem at this end, but I'll bet we'd see the same
behavior with only half or a third of the data, if you can extract a
consistent subset easily...

> i havent tried trimming the db down.  im not sure that would 'fix' the
> problem if the optimizer is misguessing the number of rows it's going to
> have to look at...

No, it wouldn't be a fix for your problem.  I was just thinking of
making a smaller, easier-to-ship test case that I could use to make
sure the problem is fixed for 6.6.
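Something along these lines ought to pull out a self-consistent slice
(just a sketch, untested; the *_subset table names and the codenum
cutoff are made up, only customers, pincodes, and codenum come from
your schema):

    -- grab a slice of customers, then only the pincodes rows that
    -- reference it; the cutoff value is arbitrary
    SELECT * INTO TABLE customers_subset
      FROM customers
     WHERE codenum < 4000;

    SELECT * INTO TABLE pincodes_subset
      FROM pincodes
     WHERE codenum IN (SELECT codenum FROM customers_subset);

Do the same for any other table that joins on codenum, then pg_dump
just the *_subset tables.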
> over the weekend im planning on upgrading to 6.5.1, but i dont recall
> seeing any changes to the optimizer in the changelog...

There are not; this is work for 6.6.  The fixes involve storing more and
different info in the system statistics table, so there is no prospect
of back-patching them into 6.5.*.  (BTW, I think 6.5.2 will be out
shortly, so you might want to wait another week before updating.)

			regards, tom lane
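P.S.  Before comparing plans on 6.5.x, it's worth making sure the
planner's statistics are current --- the row estimates come from what
VACUUM ANALYZE last gathered.  A quick check might look like this
(a sketch only; the WHERE clause just mirrors the shape of the query
in this thread, and the email value is a placeholder):

    -- refresh the statistics, then look at the estimated row counts
    VACUUM ANALYZE customers;
    VACUUM ANALYZE pincodes;

    EXPLAIN SELECT *
      FROM customers, pincodes
     WHERE pincodes.codenum = customers.codenum
       AND customers.email = 'someone@example.com';

If the customers scan still shows thousands of estimated rows for a
single email address, that's the selectivity misestimate described
above, and no amount of vacuuming on 6.5.* will cure it.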