Re: Status of Hierarchical Queries - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Status of Hierarchical Queries
Date
Msg-id 87wt2bm13c.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Status of Hierarchical Queries  (Gavin Sherry <swm@alcove.com.au>)
Responses Re: Status of Hierarchical Queries  (Gavin Sherry <swm@alcove.com.au>)
List pgsql-hackers
"Gavin Sherry" <swm@alcove.com.au> writes:

> Can you elaborate on the 'two different sets of parameters' bit? I'm still
> without coffee.

The spec allows for arbitrarily complex recursive query structures. Including
mutually recursive queries, and even non-linearly recursive queries. I found
grokking them requires far stronger brews than coffee :)

But in a simple recursive tree search you have a node which wants to do a join
between the output of tree level n against some table to produce tree level
n+1. It can't simply execute the plan to produce tree level n since that's the
same tree it's executing itself. If it calls the Init method on itself it'll
lose all its state.

There's another reason it can't just execute the previous node. You really
don't want to recompute all the results for level n when you go to produce
level n+1. You want to keep them around from the previous iteration. Otherwise
you have an n^2 algorithm.

Think of the fibonacci sequence algorithm: if you implement it recursively the
naive way you have to return all the way back to the beginning to calculate
each number. But if you remember the last two you can calculate it directly
without recalculating all the previous number repeatedly.


>> It is sufficient for the non-recursive case which might make it worthwhile
>> putting it in 8.3. But even there user's expectations are probably that the
>> reason they're writing it as a cte is precisely to avoid duplicate execution.
>
> I wonder if the planner should decide that?

That's one option. For the non-recursive case we could inline the cte subquery
everywhere it's referenced and then add smarts to the planner to find
identical subqueries and have a heuristic to determine when it would be
advantageous to calculate the result once.

The alternative is to retain them as references to a single plan. Then have a
heuristic for when to inline them.

In neither case is a heuristic going to be particularly good. The problem is
that for any reasonably complex plan it'll be cheaper to execute it only once
than multiple times. Unless there's an advantage to be gained by inlining it
such as being able to push conditions down into it. But the only way to find
out if that will be possible would be to try planning it and see.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Gavin Sherry
Date:
Subject: Re: Status of Hierarchical Queries
Next
From: Gavin Sherry
Date:
Subject: Re: Status of Hierarchical Queries