Re: [SQL] optimizer not using an index... - Mailing list pgsql-sql
From | Howie <caffeine@toodarkpark.org>
---|---
Subject | Re: [SQL] optimizer not using an index...
Date | Fri, 27 Aug 1999
Msg-id | Pine.LNX.3.96.990827161640.7463G-100000@rabies.toodarkpark.org
In response to | Re: [SQL] optimizer not using an index... (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: [SQL] optimizer not using an index...
List | pgsql-sql
On Fri, 27 Aug 1999, Tom Lane wrote:

> Howie <caffeine@toodarkpark.org> writes:
> > explain reveals that postgres ( 6.5.0 ) isn't using some of my indexes,
> > opting instead for a complete table scan ( and drastically slowing
> > things down ).
>
> Well, mumble.  The optimizer certainly needs work, but what's your
> evidence that this query would be faster if done another way?  Hashing
> the smaller tables, as it's doing, ought to be a pretty good strategy.

primary focus was on why pgsql decided to do a complete table scan of
customers ( searching for customers.email ) when there's an index on it;
surely an index scan would be loads faster than a complete table scan.
the other interesting item i ran into was why one query used the indexes
provided and the other didn't.  granted, pincodes has at least 10x more
entries than customers does ( ~300k+ vs ~11k ).

> One way to check is to start your client with environment variable
> setting PGOPTIONS="-fh" ("forbid hashjoin") to discourage the optimizer
> from using hashes, then check the generated plan for the same query and
> see what its actual runtime is.  That's likely to be a suboptimal plan
> however, since it'll turn off *all* hashing.  The hash on customers is
> probably the thing that's bothering you.

that and pgsql not using the indexes that the query on pincodes.code does.
using -fh causes pgsql to use all the proper indexes, but still beefs up
4 merge joins, seq scans, sorts, and nested loops:

[excerpt]
Merge Join  (cost=355979.91 rows=293465 width=222)
  ->  Seq Scan  (cost=342501.81 rows=376206 width=138)
        ->  Sort  (cost=342501.81 rows=376206 width=138)
              ->  Nested Loop  (cost=20616.32 rows=376206 width=138)
                    ->  Merge Join  (cost=76.34 rows=413 width=118)
                          ->  Merge Join  (cost=33.01 rows=87 width=60)
                                ->  Merge Join  (cost=20.28 rows=81 width=56)
( query on customers.email )
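( for reference, the -fh plan above came from a client started with the
environment setting Tom described; the invocation was along these lines,
with "mydb" just a stand-in for the real database name: )

    # forbid hash joins for any backend started through this client,
    # then re-run EXPLAIN on the same query from inside psql
    PGOPTIONS="-fh" psql mydb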
> How many result rows do you actually get from this query?

when querying on pincodes.code, i get 1 row back.  when querying on
customers.email, i get 4 rows back.  it's producing the correct results,
it's just going about getting those results in a somewhat odd manner.

> If you eliminate the customers table from the query, and just do the
> same join among the remaining tables, how many rows do you get?

1, as expected.

Hash Join  (cost=21.73 rows=5 width=138)
  ->  Seq Scan on ibill  (cost=3.64 rows=80 width=28)
  ->  Hash  (cost=14.32 rows=4 width=110)
        ->  Nested Loop  (cost=14.32 rows=4 width=110)
              ->  Nested Loop  (cost=8.17 rows=3 width=106)
                    ->  Nested Loop  (cost=4.07 rows=2 width=78)
                          ->  Index Scan using codes_code_idx on pincodes  (cost=2.02 rows=1 width=20)
                          ->  Index Scan using types_pkey on subaccts  (cost=2.05 rows=379 width=58)
                    ->  Index Scan using doms_pkey on domains  (cost=2.05 rows=80 width=28)
              ->  Index Scan using owner_client_idx on owners  (cost=2.05 rows=85 width=4)

> I suspect the optimizer is drastically off in its estimate of ~300k
> result rows, and that's contributing to the problem.

yes, so why is it off for one query but right on target for the other?
more importantly, why is it choosing to use indexes for one query yet
choosing to do complete table scans for the other ( even though the two
queries are almost identical )?  answer that and i'll personally treat
you to a beer.  :)

> > doing a complete table scan on a column that's indexed isn't really
> > that nice, especially since there are 12,000 entries in it.
>
> But it's estimating it's going to have to probe that table 300k times,
> which makes the hashjoin look mighty attractive...

why would it have to probe pincodes 300k times when there's a unique index
on pincodes ( pincodes.codenum ) and a unique index on customers
( customers.codenum )?

> > interestingly, when querying on "pincodes.code" instead of
> > "customers.name", postgres does NOT use a full table scan; it uses the
> > proper indexes:
>
> > Hash Join  (cost=23.78 rows=5 width=222)
>
> Note the drastic difference in the estimated result-row count; that's
> undoubtedly what's changing the optimizer's choice of what to do.  You
> haven't given us enough detail to understand why this query would be
> (or at least seem) more selective than the other, however.

i'm not sure it would; i placed indexes in ( what i thought were ) all the
proper places - pincodes.code, pincodes.codenum, customers.name,
customers.codenum... every column that's being queried on or joined on.
ideally, all the indexes would be used and querying on customers.email
would be ( query-plan wise ) almost identical to querying on
pincodes.code; only an index would change.

> Anyway, this looks to me like it is probably a symptom of poor
> selectivity estimation leading to bogus estimates of output row counts
> leading to a nonoptimal plan choice.  I have been working on improving
> the selectivity estimation for 6.6, and am looking for test cases to
> check out the logic on.  Is your database small enough/non proprietary
> enough that you could send me a dump?

i could send you a dump, but the db is fairly large; 7m uncompressed, 2m
gzip -9'ed.  none of the data is overly sensitive.  i am, however, on an
overly lagged 28k8.

> Or could you strip it down to a test case that still exhibits the same
> misbehavior?

i haven't tried trimming the db down.  i'm not sure that would 'fix' the
problem if the optimizer is misguessing the number of rows it's going to
have to look at...

> If you don't like either of those, perhaps you could grab a current
> snapshot, install your data in a test postmaster, and report back on
> whether it acts any different...

over the weekend i'm planning on upgrading to 6.5.1, but i don't recall
seeing any changes to the optimizer in the changelog...

---
Howie <caffeine@toodarkpark.org>   URL: http://www.toodarkpark.org
"The distance between insanity and genius is measured only by success."
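( for completeness, the gzip -9'ed dump mentioned above would be produced
with something along these lines; "mydb" is again only a placeholder for
the actual database name: )

    # dump the whole database and compress it for mailing
    pg_dump mydb | gzip -9 > mydb.dump.gz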