Re: Erronous sort used in query plan - Mailing list pgsql-hackers
From | Shane Ambler |
---|---|
Subject | Re: Erronous sort used in query plan |
Date | |
Msg-id | 45A1606C.1080306@007Marketing.com Whole thread Raw |
In response to | Re: Erronous sort used in query plan (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Erronous sort used in query plan
|
List | pgsql-hackers |
Tom Lane wrote: > Shane Ambler <pgsql@007Marketing.com> writes: >> I am putting together searches on the catalog info and came up with a >> select that was rather slow and I noticed that in the explain analyze >> there is a sort step on one of the left joins which I don't think >> belongs there. > > Well, it's certainly necessary in context because it's preparing the > data for the merge join immediately above it. The correct question > is why is the thing using a merge join here, when a hash join would be > cheaper? > > I dug through this and found out that the hash join is estimated as > cheaper, right up till the last step of cost_hashjoin: > > /* > * Bias against putting larger relation on inside. We don't want an > * absolute prohibition, though, since larger relation might have better > * bucketsize --- and we can't trust the size estimates unreservedly, > * anyway. Instead, inflate the run cost by the square root of the size > * ratio. (Why square root? No real good reason, but it seems > * reasonable...) > * > * Note: before 7.4 we implemented this by inflating startup cost; but if > * there's a disable_cost component in the input paths' startup cost, that > * unfairly penalizes the hash. Probably it'd be better to keep track of > * disable penalty separately from cost. > */ > if (innerbytes > outerbytes && outerbytes > 0) > run_cost *= sqrt(innerbytes / outerbytes); > > In this example, the data volume from the join of everything else is > estimated as less than what needs to be fetched from pg_proc, and so > this bias kicks in, and the cost estimate roughly doubles. > Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin > in the other direction and so the hash loses out to the mergejoin. > > It seems clear to me that we ought not impose a bias unless the join > type is such that both directions of hashing are feasible. I wonder > also if the bias is too large ... but there's not really evidence for > or against that in this example. The point is that this code implicitly > assumes both directions will be tried, and they won't. > I think that the selected sort (or at least the merge join) is incorrect - the column sorted (or both actually) is linking the current record in pg_operator with the oid in the pg_proc - it will only return one row. If one of the pg_type joins is changed, it then sorts on the oid of pg_operator as the foreign table - again this will only return one row. I would think that the foreign oid would indicate to the planner that it will only find one foreign row to link with. I can see that the error I made created a funny (probably useless actually) link that would throw things out, but I would expect it to create bad planning for the two joins that are in error not a non-related one to one join. If a sort/merge join was created from this and used to satisfy this join I would accept that as part of what I unintentionally requested but instead it generates a sort/merge join on a join that links one current record to one foreign record and has nothing in common with the joins in error. -- Shane Ambler pgSQL@007Marketing.com Get Sheeky @ http://Sheeky.Biz
pgsql-hackers by date: