Re: CTE that result in repeated sorting of the data - Mailing list pgsql-hackers

From Jon Nelson
Subject Re: CTE that result in repeated sorting of the data
Date
Msg-id CAKuK5J1e1jENEFjJneMY8qEmkPqPthHDDtKOt6GqiLhWZ7uxnA@mail.gmail.com
Whole thread Raw
In response to Re: CTE that result in repeated sorting of the data  (David G Johnston <david.g.johnston@gmail.com>)
List pgsql-hackers
On Thu, May 15, 2014 at 4:50 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> Jon Nelson-14 wrote
>> I was watching a very large recursive CTE get built today and this CTE
>> involves on the order of a dozen or so "loops" joining the initial
>> table against existing tables. It struck me that - every time through
>> the loop the tables were sorted and then joined and that it would be
>> much more efficient if the tables remained in a sorted state and could
>> avoid being re-sorted each time through the loop. Am I missing
>> something here? I am using PG 8.4 if that matters.
>
> I'm not sure what you mean by "watching" but maybe this is a simple as
> changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"?

In fact, I'm using UNION ALL.

> If you really think it could be improved upon maybe you can help and provide
> a minimal self-contained example query and data that exhibits the behavior
> you describe so others can see it and test changes?  It would be nice to
> know if other versions than one that is basically no longer supported
> exhibits the same behavior.

Pretty much any CTE that looks like this:

with cte AS ( select stuff from A UNION ALL select more_stuff from B, cte WHERE <join conditions>
) SELECT * FROM cte;

*and* where the planner chooses to join B and cte by sorting and doing
a merge join.

I'll see if I can come up with a self-contained example.


-- 
Jon



pgsql-hackers by date:

Previous
From: Kohei KaiGai
Date:
Subject: Re: sepgsql: label regression test failed
Next
From: Rohit Goyal
Date:
Subject: Error in running DBT2