Re: issue with double ordering in a wrapped distinct - Mailing list pgsql-general

From David G Johnston
Subject Re: issue with double ordering in a wrapped distinct
Date
Msg-id 1416354955104-5827446.post@n5.nabble.com
Whole thread Raw
In response to issue with double ordering in a wrapped distinct  (Jonathan Vanasco <postgres@2xlp.com>)
List pgsql-general
Jonathan Vanasco-7 wrote
> I have a particular query that returns resultset of 45k rows out of a
> large resultset (pg 9.3 and 9.1)
>
> It's a many 2 many query, where I"m trying to search for Bar based on
> attributes in a linked Foo.
>
> I tweaked the indexes, optimized the query, and got it down an acceptable
> speed around 1,100ms
>
> the second I added a limit/offset though -- the query plan completely
> changed and it ballooned up to 297,340 ms.   Yes, I waited that long to
> see what was going on in the query planner.
>
> I did a lot of playing around, and managed to get this form of a query to
> work in 305ms with a limit/offset.
>
> SELECT DISTINCT qinner.bar_id
> FROM
>   (SELECT foo_2_bar.bar_id AS bar_id
>    FROM foo_2_bar
>    JOIN foo ON foo_2_bar.foo_id = foo.id
>    WHERE foo.biz_id = 1
>      AND (foo.is_hidden IS NOT TRUE)
>    ORDER BY foo_2_bar.bar_id ASC
>    ) AS qinner
> ORDER BY qinner.bar_id ASC
> LIMIT 100
> OFFSET 0
> ;
>
> This is what I don't understand -- notice the two order_by calls.
>
>     If i run this with an inner and outer order_by, I get ~305ms.  (I don't
> think I need both, but I wasn't sure if ordering is kept from a subselect
> )
>
>     If i run this with only the inner, I get ~304ms.
>
>     If I run this with only the outer, it's pushing over 10minutes again
>
> i'm wondering if anyone might know why that performance hit would be
> happening

Pretty sure we need to see the explains, ideally with ANALYZE.

The DISTINCT at the top-level will cause any ordering that the subquery
imposes to be lost but because the subquery output is ordered the distinct
likely runs more efficiently.

I presume you have a reason for not simply doing away with the subquery
altogether...

The HashAgg used by DISTINCT is fairly expensive and apparently can be
optimized away if it knows it is receiving sorted data from the subquery.
Lacking that knowledge the entire subquery relation needs to be HashAgg'd
before the LIMIT can be applied to its results.

My uninformed question is whether DISTINCT can be made smart enough to
realize that it would be cheaper to sort-and-scan instead of using HashAgg
followed by sort...

Then again without plans I am merely guessing here...

David J.



--
View this message in context:
http://postgresql.nabble.com/issue-with-double-ordering-in-a-wrapped-distinct-tp5827443p5827446.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: issue with double ordering in a wrapped distinct
Next
From: "Joshua D. Drake"
Date:
Subject: Linuxfest 2015 Call for Papers