Thread: Hierarchical queries a la Oracle. Patch.

Hierarchical queries a la Oracle. Patch.

From
Evgen Potemkin
Date:
Hi there!

there is the patch.
documentation after patching will be in doc/README.hier

regards,

---
.evgen

Attachment

Re: Hierarchical queries a la Oracle. Patch.

From
Tom Lane
Date:
Evgen Potemkin <evgent@ns.terminal.ru> writes:
> This is a patch which allows PG to make hierarchical queries a la Oracle do.

This is an extremely large patch with an extreme paucity of
documentation.  For example, I searched in vain for any explanation of
what the varfake field does or why it's necessary to add such a field
to every single Var node.  That's a nontrivial amount of overhead to
pay for a feature that relatively few people will use.  (And if it is
needed, you've surely neglected to deal with it in an awful lot of
places that create or manipulate Vars.  Perhaps a new node type would
be a better idea than overloading Var like this.  Misusing Const is a
bad idea too.)

I'm also wondering what you may have broken in altering the handling of
HAVING clauses in planner.c.  What's going on there?

A more global concern is whether this really implements what Oracle
calls CONNECT BY.  I had the idea that that was a deeper feature than
just controlling the ordering of output, which AFAICT is what this does.
Also, the documents I've been able to find on the Web suggest that
Oracle's view of the syntax for the feature is only tenuously related
to what you've implemented here.

            regards, tom lane

Re: Hierarchical queries a la Oracle. Patch.

From
Tom Lane
Date:
Evgen Potemkin <evgent@ns.terminal.ru> writes:
> Trick with Const/Var+varfake is to add to result a new column which
> behave like var, but not fetched from tables by subplan.
> For Var+varfake nodes everything is same as for usual Var nodes, so no need
> to check them specially everywhere , with exclusion of some particular
> place where it is checked. It's not a neglect.

I think it would be cleaner to invent a new node type, or perhaps you
couid use Param.  Const nodes that are not really constant, in
particular, are a seriously BAD idea, as the constant-folder is likely
to fold expressions containing them.

>> I'm also wondering what you may have broken in altering the handling of
>> HAVING clauses in planner.c.  What's going on there?

> I add this for posibility of qualification rows after connecting them.
> Like HAVING with GROUP case. I'm leave WHERE clause for subplan, because it
> can considerably descrease amount of rows which is needed to connect, and
> thus improve performance of query. But also there is need to qual rows after
> connecting, to check _level_ column for example. This is HAVING for.

The stuff I found on the web last night suggested that Oracle considers
the WHERE clause to be executed after connecting.  That sounds pretty
brain-dead to me too, but if you are going to implement this feature
using Oracle syntax then you'd better use Oracle semantics too.

Personally I'd prefer to forget Oracle's syntax --- it looks to me like
it's at least as badly designed as their outer-join syntax, if not worse
--- and use SQL99's recursive-query syntax for this sort of thing.
Have you looked at that?

            regards, tom lane

Re: Hierarchical queries a la Oracle. Patch.

From
Jean-Michel POURE
Date:
Le Samedi 23 Novembre 2002 05:55, Tom Lane a écrit :
> That's a nontrivial amount of overhead to
> pay for a feature that relatively few people will use.

Compiere.org port to PostgreSQL need hierachial queries.
Cheers, Jean-Michel POURE

Re: Hierarchical queries a la Oracle. Patch.

From
Evgen Potemkin
Date:
Hi Tom,

On Fri, 22 Nov 2002, Tom Lane wrote:

> Evgen Potemkin <evgent@ns.terminal.ru> writes:
> > This is a patch which allows PG to make hierarchical queries a la Oracle do.
>
> This is an extremely large patch with an extreme paucity of
> documentation.
You're right, i'll add more.
> For example, I searched in vain for any explanation of
> what the varfake field does or why it's necessary to add such a field
> to every single Var node. That's a nontrivial amount of overhead to
> pay for a feature that relatively few people will use.  (And if it is
> needed, you've surely neglected to deal with it in an awful lot of
> places that create or manipulate Vars.
Trick with Const/Var+varfake is to add to result a new column which
behave like var, but not fetched from tables by subplan.
For Var+varfake nodes everything is same as for usual Var nodes, so no need
to check them specially everywhere , with exclusion of some particular
place where it is checked. It's not a neglect.

> Perhaps a new node type would
> be a better idea than overloading Var like this.  Misusing Const is a
> bad idea too.)
May be you're right. i'll try to add a new node.

>
> I'm also wondering what you may have broken in altering the handling of
> HAVING clauses in planner.c.  What's going on there?
I add this for posibility of qualification rows after connecting them.
Like HAVING with GROUP case. I'm leave WHERE clause for subplan, because it
can considerably descrease amount of rows which is needed to connect, and
thus improve performance of query. But also there is need to qual rows after
connecting, to check _level_ column for example. This is HAVING for.

As far as i've check, nothing is broken.
>
> A more global concern is whether this really implements what Oracle
> calls CONNECT BY.  I had the idea that that was a deeper feature than
> just controlling the ordering of output, which AFAICT is what this does.
AFAIK from Oracle's SQL reference, CONNECT BY feature is exactly for controlling
order of output. The matter is that it is the specifical order, which can't
be achieved by ORDER BY clause.

> Also, the documents I've been able to find on the Web suggest that
> Oracle's view of the syntax for the feature is only tenuously related
> to what you've implemented here.
Difference only in CONNECT BY clause. This allows only to choose 2 columns
and then compare them by EQ, when Oracle allows a more complex child
selection. But this is i'm need right now. So i've made a patch.
I thought that someone else may be needed it too. So i'm post it here.
Everything is very simple.

PS May i ask a question: what amount of overhead bring addition of varfake
field? Even if there is a 1K vars only 4K memory overhead.It have checked in
a few places, so performance overhead is not so big. I just want to understand.


regards,

---
.evgen



Re: Hierarchical queries a la Oracle. Patch.

From
Fernando Nasser
Date:
The syntax below is wrong.  Oracle has:

[START WITH condition] CONNECT BY condition

not the reverse as stated below.  Maybe this was just a documentation mistake (I
haven't looked at the code).

Fernando

Evgen Potemkin wrote:
> +
> + Syntax.
> +
> + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR c_expr START WITH expr
> +   [ HAVING condition [, ...]] [ LIMIT ... ] [ OFFSET ... ]
> +



--
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Hierarchical queries a la Oracle. Patch.

From
Fernando Nasser
Date:
Fernando Nasser wrote:> The syntax below is wrong.  Oracle has:
>
> [START WITH condition] CONNECT BY condition
>
> not the reverse as stated below.  Maybe this was just a documentation
> mistake (I haven't looked at the code).
>
> Fernando
>
> Evgen Potemkin wrote:
>
>> + + Syntax.
>> + + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR
>> c_expr START WITH expr +   [ HAVING condition [, ...]] [ LIMIT ... ] [
>> OFFSET ... ]
>> +
>

Furthermore, PRIOR is not a clause, but a unary operator that must be applied to
one of the terms of the CONNECT BY expression.

So, we would actually have:

[START WITH expr] CONNECT BY c_expr = PRIOR c_expr
or
[START WITH expr] CONNECT BY PRIOR c_expr = c_expr

(we probably don't want to support operators other than '=')


And there is that '+' in there... what is it for?


So I guess we would be implementing something that is not either the standard
SQL nor the Oracle syntax.



--
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Hierarchical queries a la Oracle. Patch.

From
Fernando Nasser
Date:
Tom Lane wrote:
 >
> Personally I'd prefer to forget Oracle's syntax --- it looks to me like
> it's at least as badly designed as their outer-join syntax, if not worse
> --- and use SQL99's recursive-query syntax for this sort of thing.
> Have you looked at that?
>

Evgen's query (put in Oracle's syntax):

SELECT * FROM data START WITH id=0 CONNECT BY id = PRIOR pnt;

would have to be implemented by something like:

WITH flat_tree (id, pnt, data, level) AS
   (SELECT id, pnt, data, 1
      FROM data
      WHERE id = 0
    UNION
    SELECT d.in, d.pnt, d.data, f.level + 1
      FROM data d, flat_tree f
      WHERE d.pnt = f.id)
SELECT * FROM flat_tree
ORDER BY level;

(I am simplifying this, one would have to add a path variable to make it depth
first).

I guess the rewriter could use the START WITH expression to create the first
select and the CONNECT BY clause to create the second one.  Maybe even the
parser could do most of the transformation (maybe).

Anyway, the Oracle syntax is indeed more compact, but is not as generic as the
SQL99 (and IBM DB2) one, so we can always implement it on top of that.

I think even DB2 implements the SQL99 recursion with some restrictions (mostly
for safety) and that probably covers 99.99% of the uses.  Maybe even a basic
implementation of the SQL one can accommodate the execution of rewritten Oracle
CONNECT BY queries.

I agree with Tom that we should implement the SQL99 one first and then, if
possible, add the Oracle compatibility to it.



--
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Hierarchical queries a la Oracle. Patch.

From
Evgen Potemkin
Date:
it supports both versions: start before connect and vice versa.

regards,
---
.evgen

On Tue, 26 Nov 2002, Fernando Nasser wrote:

> The syntax below is wrong.  Oracle has:
>
> [START WITH condition] CONNECT BY condition
>
> not the reverse as stated below.  Maybe this was just a documentation mistake (I
> haven't looked at the code).
>
> Fernando
>
> Evgen Potemkin wrote:
> > +
> > + Syntax.
> > +
> > + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR c_expr START WITH expr
> > +   [ HAVING condition [, ...]] [ LIMIT ... ] [ OFFSET ... ]
> --
> Fernando Nasser
> Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
> 2323 Yonge Street, Suite #300
> Toronto, Ontario   M4P 2C9
>
>


Re: Hierarchical queries a la Oracle. Patch.

From
Evgen Potemkin
Date:
it's fixed in patch for PG 7.3

'+' comes from .diff, where you take it from :)

regards,

---
.evgen

On Tue, 26 Nov 2002, Fernando Nasser wrote:

> Fernando Nasser wrote:> The syntax below is wrong.  Oracle has:
> >
> > [START WITH condition] CONNECT BY condition
> >
> > not the reverse as stated below.  Maybe this was just a documentation
> > mistake (I haven't looked at the code).

> >> + + Syntax.
> >> + + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR
> >> c_expr START WITH expr +   [ HAVING condition [, ...]] [ LIMIT ... ] [
> >> OFFSET ... ]
> >> +
> >
>
> Furthermore, PRIOR is not a clause, but a unary operator that must be applied to
> one of the terms of the CONNECT BY expression.
>
> So, we would actually have:
>
> [START WITH expr] CONNECT BY c_expr = PRIOR c_expr
>
> And there is that '+' in there... what is it for?
>
>
> So I guess we would be implementing something that is not either the standard
> SQL nor the Oracle syntax.
>
>
>
> --
> Fernando Nasser
> Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
> 2323 Yonge Street, Suite #300
> Toronto, Ontario   M4P 2C9
>
>


Re: Hierarchical queries a la Oracle. Patch.

From
Fernando Nasser
Date:
Evgen Potemkin wrote:> it's fixed in patch for PG 7.3
>
> '+' comes from .diff, where you take it from :)
>

 From here:

>>>>+ + Syntax.
>>>>+ + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR
>>>>c_expr START WITH expr +   [ HAVING condition [, ...]] [ LIMIT ... ] [
                           ^^^^
>>>>OFFSET ... ]
>>>>+
>>>




--
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Hierarchical queries a la Oracle. Patch.

From
Fernando Nasser
Date:
Evgen Potemkin wrote:> it supports both versions: start before connect and vice
versa.
>
> regards,
> ---
> .evgen
>

Why do you want to support the START WITH after the CONNECT BY if that syntax
does not exist anywhere (Oracle uses it before)?  Adding another option just
increases the chances of parser conflicts in the future.

Regards,
Fernando


> On Tue, 26 Nov 2002, Fernando Nasser wrote:
>
>
>>The syntax below is wrong.  Oracle has:
>>
>>[START WITH condition] CONNECT BY condition
>>
>>not the reverse as stated below.  Maybe this was just a documentation mistake (I
>>haven't looked at the code).
>>
>>Fernando
>>
>>Evgen Potemkin wrote:
>>
>>>+
>>>+ Syntax.
>>>+
>>>+ SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR c_expr START WITH expr
>>>+   [ HAVING condition [, ...]] [ LIMIT ... ] [ OFFSET ... ]
>>
>>--
>>Fernando Nasser
>>Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
>>2323 Yonge Street, Suite #300
>>Toronto, Ontario   M4P 2C9
>>
>>
>
>


@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Hierarchical queries a la Oracle. Patch.

From
Fernando Nasser
Date:
Evgen Potemkin wrote:> both versions of syntax are supported for convenience,
also AFAIK there will
> be no conflicts.
>

How can you tell?  Can you foretell what the SQL standard will create in the future?

You don't see conflicts now.   There is a good chance that the standards
committee avoids conflicting with some syntax used by a major vendor like
Oracle.  But your (unnecessary) extensions to that syntax will not be taken into
consideration, of course.  This will increase the chances that we find a parser
conflict when trying to implement some future SQL standard construct.

Furthermore, clauses in SQL are ordered.  There is no reason for allowing random
ordering of clauses in a specific statement.

Regards,
Fernando

>
> On Thu, 28 Nov 2002, Fernando Nasser wrote:
>
>
>>Evgen Potemkin wrote:> it supports both versions: start before connect and vice
>>versa.
>>
>>>regards,
>>>---
>>>.evgen
>>>
>>
>>Why do you want to support the START WITH after the CONNECT BY if that syntax
>>does not exist anywhere (Oracle uses it before)?  Adding another option just
>>increases the chances of parser conflicts in the future.
>>
>>Regards,
>>Fernando
>
>
>>>>Fernando Nasser
>>>>Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
>>>>2323 Yonge Street, Suite #300
>>>>Toronto, Ontario   M4P 2C9
>>>
>



--
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9