Thread: CTE that result in repeated sorting of the data

CTE that result in repeated sorting of the data

From
Jon Nelson
Date:
I was watching a very large recursive CTE get built today and this CTE
involves on the order of a dozen or so "loops" joining the initial
table against existing tables. It struck me that - every time through
the loop the tables were sorted and then joined and that it would be
much more efficient if the tables remained in a sorted state and could
avoid being re-sorted each time through the loop. Am I missing
something here? I am using PG 8.4 if that matters.

-- 
Jon



Re: CTE that result in repeated sorting of the data

From
David G Johnston
Date:
Jon Nelson-14 wrote
> I was watching a very large recursive CTE get built today and this CTE
> involves on the order of a dozen or so "loops" joining the initial
> table against existing tables. It struck me that - every time through
> the loop the tables were sorted and then joined and that it would be
> much more efficient if the tables remained in a sorted state and could
> avoid being re-sorted each time through the loop. Am I missing
> something here? I am using PG 8.4 if that matters.

I'm not sure what you mean by "watching" but maybe this is a simple as
changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"?

If you really think it could be improved upon maybe you can help and provide
a minimal self-contained example query and data that exhibits the behavior
you describe so others can see it and test changes?  It would be nice to
know if other versions than one that is basically no longer supported
exhibits the same behavior.

Or maybe someone more familiar with the implementation of recursive CTE will
chime in - my knowledge in this area is fairly limited.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/CTE-that-result-in-repeated-sorting-of-the-data-tp5804136p5804140.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: CTE that result in repeated sorting of the data

From
Jon Nelson
Date:
On Thu, May 15, 2014 at 4:50 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> Jon Nelson-14 wrote
>> I was watching a very large recursive CTE get built today and this CTE
>> involves on the order of a dozen or so "loops" joining the initial
>> table against existing tables. It struck me that - every time through
>> the loop the tables were sorted and then joined and that it would be
>> much more efficient if the tables remained in a sorted state and could
>> avoid being re-sorted each time through the loop. Am I missing
>> something here? I am using PG 8.4 if that matters.
>
> I'm not sure what you mean by "watching" but maybe this is a simple as
> changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"?

In fact, I'm using UNION ALL.

> If you really think it could be improved upon maybe you can help and provide
> a minimal self-contained example query and data that exhibits the behavior
> you describe so others can see it and test changes?  It would be nice to
> know if other versions than one that is basically no longer supported
> exhibits the same behavior.

Pretty much any CTE that looks like this:

with cte AS ( select stuff from A UNION ALL select more_stuff from B, cte WHERE <join conditions>
) SELECT * FROM cte;

*and* where the planner chooses to join B and cte by sorting and doing
a merge join.

I'll see if I can come up with a self-contained example.


-- 
Jon