Thread: CTE that result in repeated sorting of the data
I was watching a very large recursive CTE get built today and this CTE involves on the order of a dozen or so "loops" joining the initial table against existing tables. It struck me that - every time through the loop the tables were sorted and then joined and that it would be much more efficient if the tables remained in a sorted state and could avoid being re-sorted each time through the loop. Am I missing something here? I am using PG 8.4 if that matters. -- Jon
Jon Nelson-14 wrote > I was watching a very large recursive CTE get built today and this CTE > involves on the order of a dozen or so "loops" joining the initial > table against existing tables. It struck me that - every time through > the loop the tables were sorted and then joined and that it would be > much more efficient if the tables remained in a sorted state and could > avoid being re-sorted each time through the loop. Am I missing > something here? I am using PG 8.4 if that matters. I'm not sure what you mean by "watching" but maybe this is a simple as changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"? If you really think it could be improved upon maybe you can help and provide a minimal self-contained example query and data that exhibits the behavior you describe so others can see it and test changes? It would be nice to know if other versions than one that is basically no longer supported exhibits the same behavior. Or maybe someone more familiar with the implementation of recursive CTE will chime in - my knowledge in this area is fairly limited. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/CTE-that-result-in-repeated-sorting-of-the-data-tp5804136p5804140.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Thu, May 15, 2014 at 4:50 PM, David G Johnston <david.g.johnston@gmail.com> wrote: > Jon Nelson-14 wrote >> I was watching a very large recursive CTE get built today and this CTE >> involves on the order of a dozen or so "loops" joining the initial >> table against existing tables. It struck me that - every time through >> the loop the tables were sorted and then joined and that it would be >> much more efficient if the tables remained in a sorted state and could >> avoid being re-sorted each time through the loop. Am I missing >> something here? I am using PG 8.4 if that matters. > > I'm not sure what you mean by "watching" but maybe this is a simple as > changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"? In fact, I'm using UNION ALL. > If you really think it could be improved upon maybe you can help and provide > a minimal self-contained example query and data that exhibits the behavior > you describe so others can see it and test changes? It would be nice to > know if other versions than one that is basically no longer supported > exhibits the same behavior. Pretty much any CTE that looks like this: with cte AS ( select stuff from A UNION ALL select more_stuff from B, cte WHERE <join conditions> ) SELECT * FROM cte; *and* where the planner chooses to join B and cte by sorting and doing a merge join. I'll see if I can come up with a self-contained example. -- Jon