Re: [SQL] optimizer not using an index... - Mailing list pgsql-sql
From | Tom Lane |
---|---|
Subject | Re: [SQL] optimizer not using an index... |
Date | |
Msg-id | 6737.935765619@sss.pgh.pa.us Whole thread Raw |
In response to | optimizer not using an index... (Howie <caffeine@toodarkpark.org>) |
Responses |
Re: [SQL] optimizer not using an index...
|
List | pgsql-sql |
Howie <caffeine@toodarkpark.org> writes: > explain reveals that postgres ( 6.5.0 ) isnt using some of my indexes, > opting instead for a complete table scan ( and drastically slowing things > down ). Well, mumble. The optimizer certainly needs work, but what's your evidence that this query would be faster if done another way? Hashing the smaller tables, as it's doing, ought to be a pretty good strategy. One way to check is to start your client with environment variable setting PGOPTIONS="-fh" ("forbid hashjoin") to discourage the optimizer from using hashes, then check the generated plan for the same query and see what its actual runtime is. That's likely to be a suboptimal plan however, since it'll turn off *all* hashing. The hash on customers is probably the thing that's bothering you. How many result rows do you actually get from this query? If you eliminate the customers table from the query, and just do the same join among the remaining tables, how many rows do you get? I suspect the optimizer is drastically off in its estimate of ~300k result rows, and that's contributing to the problem. > doing a complete table scan on a column thats indexed isnt really that > nice, especially since there are 12,000 entries in it. But it's estimating it's going to have to probe that table 300k times, which makes the hashjoin look mighty attractive... > interestingly, when querying on "pincodes.code" instead of > "customers.name", postgres does NOT use a full table scan; it uses the > proper indexes: > Hash Join (cost=23.78 rows=5 width=222) Note the drastic difference in the estimated result-row count; that's undoubtedly what's changing the optimizer's choice of what to do. You haven't given us enough detail to understand why this query would be (or at least seem) more selective than the other, however. Anyway, this looks to me like it is probably a symptom of poor selectivity estimation leading to bogus estimates of output row counts leading to a nonoptimal plan choice. I have been working on improving the selectivity estimation for 6.6, and am looking for test cases to check out the logic on. Is your database small enough/non proprietary enough that you could send me a dump? Or could you strip it down to a test case that still exhibits the same misbehavior? If you don't like either of those, perhaps you could grab a current snapshot, install your data in a test postmaster, and report back on whether it acts any different... regards, tom lane