Re: Erronous sort used in query plan - Mailing list pgsql-hackers

From Shane Ambler
Subject Re: Erronous sort used in query plan
Date
Msg-id 45A1606C.1080306@007Marketing.com
Whole thread Raw
In response to Re: Erronous sort used in query plan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Erronous sort used in query plan
List pgsql-hackers
Tom Lane wrote:
> Shane Ambler <pgsql@007Marketing.com> writes:
>> I am putting together searches on the catalog info and came up with a 
>> select that was rather slow and I noticed that in the explain analyze 
>> there is a sort step on one of the left joins which I don't think 
>> belongs there.
> 
> Well, it's certainly necessary in context because it's preparing the
> data for the merge join immediately above it.  The correct question
> is why is the thing using a merge join here, when a hash join would be
> cheaper?
> 
> I dug through this and found out that the hash join is estimated as
> cheaper, right up till the last step of cost_hashjoin:
> 
>     /*
>      * Bias against putting larger relation on inside.  We don't want an
>      * absolute prohibition, though, since larger relation might have better
>      * bucketsize --- and we can't trust the size estimates unreservedly,
>      * anyway.  Instead, inflate the run cost by the square root of the size
>      * ratio.  (Why square root?  No real good reason, but it seems
>      * reasonable...)
>      *
>      * Note: before 7.4 we implemented this by inflating startup cost; but if
>      * there's a disable_cost component in the input paths' startup cost, that
>      * unfairly penalizes the hash.  Probably it'd be better to keep track of
>      * disable penalty separately from cost.
>      */
>     if (innerbytes > outerbytes && outerbytes > 0)
>         run_cost *= sqrt(innerbytes / outerbytes);
> 
> In this example, the data volume from the join of everything else is
> estimated as less than what needs to be fetched from pg_proc, and so
> this bias kicks in, and the cost estimate roughly doubles.
> Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin
> in the other direction and so the hash loses out to the mergejoin.
> 
> It seems clear to me that we ought not impose a bias unless the join
> type is such that both directions of hashing are feasible.  I wonder
> also if the bias is too large ... but there's not really evidence for
> or against that in this example.  The point is that this code implicitly
> assumes both directions will be tried, and they won't.
> 

I think that the selected sort (or at least the merge join) is incorrect 
- the column sorted (or both actually) is linking the current record in 
pg_operator with the oid in the pg_proc - it will only return one row.

If one of the pg_type joins is changed, it then sorts on the oid of 
pg_operator as the foreign table - again this will only return one row.

I would think that the foreign oid would indicate to the planner that it 
will only find one foreign row to link with.


I can see that the error I made created a funny (probably useless 
actually) link that would throw things out, but I would expect it to
create bad planning for the two joins that are in error not a 
non-related one to one join. If a sort/merge join was created from this 
and used to satisfy this join I would accept that as part of what I 
unintentionally requested but instead it generates a sort/merge join on 
a join that links one current record to one foreign record and has 
nothing in common with the joins in error.



-- 

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz


pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Full page writes
Next
From: Peter Eisentraut
Date:
Subject: Re: --with-libxml build failures on OS X