Thread: Hierarchical queries a la Oracle. Patch.
Hi there! there is the patch. documentation after patching will be in doc/README.hier regards, --- .evgen
Attachment
Evgen Potemkin <evgent@ns.terminal.ru> writes: > This is a patch which allows PG to make hierarchical queries a la Oracle do. This is an extremely large patch with an extreme paucity of documentation. For example, I searched in vain for any explanation of what the varfake field does or why it's necessary to add such a field to every single Var node. That's a nontrivial amount of overhead to pay for a feature that relatively few people will use. (And if it is needed, you've surely neglected to deal with it in an awful lot of places that create or manipulate Vars. Perhaps a new node type would be a better idea than overloading Var like this. Misusing Const is a bad idea too.) I'm also wondering what you may have broken in altering the handling of HAVING clauses in planner.c. What's going on there? A more global concern is whether this really implements what Oracle calls CONNECT BY. I had the idea that that was a deeper feature than just controlling the ordering of output, which AFAICT is what this does. Also, the documents I've been able to find on the Web suggest that Oracle's view of the syntax for the feature is only tenuously related to what you've implemented here. regards, tom lane
Evgen Potemkin <evgent@ns.terminal.ru> writes: > Trick with Const/Var+varfake is to add to result a new column which > behave like var, but not fetched from tables by subplan. > For Var+varfake nodes everything is same as for usual Var nodes, so no need > to check them specially everywhere , with exclusion of some particular > place where it is checked. It's not a neglect. I think it would be cleaner to invent a new node type, or perhaps you couid use Param. Const nodes that are not really constant, in particular, are a seriously BAD idea, as the constant-folder is likely to fold expressions containing them. >> I'm also wondering what you may have broken in altering the handling of >> HAVING clauses in planner.c. What's going on there? > I add this for posibility of qualification rows after connecting them. > Like HAVING with GROUP case. I'm leave WHERE clause for subplan, because it > can considerably descrease amount of rows which is needed to connect, and > thus improve performance of query. But also there is need to qual rows after > connecting, to check _level_ column for example. This is HAVING for. The stuff I found on the web last night suggested that Oracle considers the WHERE clause to be executed after connecting. That sounds pretty brain-dead to me too, but if you are going to implement this feature using Oracle syntax then you'd better use Oracle semantics too. Personally I'd prefer to forget Oracle's syntax --- it looks to me like it's at least as badly designed as their outer-join syntax, if not worse --- and use SQL99's recursive-query syntax for this sort of thing. Have you looked at that? regards, tom lane
Le Samedi 23 Novembre 2002 05:55, Tom Lane a écrit : > That's a nontrivial amount of overhead to > pay for a feature that relatively few people will use. Compiere.org port to PostgreSQL need hierachial queries. Cheers, Jean-Michel POURE
Hi Tom, On Fri, 22 Nov 2002, Tom Lane wrote: > Evgen Potemkin <evgent@ns.terminal.ru> writes: > > This is a patch which allows PG to make hierarchical queries a la Oracle do. > > This is an extremely large patch with an extreme paucity of > documentation. You're right, i'll add more. > For example, I searched in vain for any explanation of > what the varfake field does or why it's necessary to add such a field > to every single Var node. That's a nontrivial amount of overhead to > pay for a feature that relatively few people will use. (And if it is > needed, you've surely neglected to deal with it in an awful lot of > places that create or manipulate Vars. Trick with Const/Var+varfake is to add to result a new column which behave like var, but not fetched from tables by subplan. For Var+varfake nodes everything is same as for usual Var nodes, so no need to check them specially everywhere , with exclusion of some particular place where it is checked. It's not a neglect. > Perhaps a new node type would > be a better idea than overloading Var like this. Misusing Const is a > bad idea too.) May be you're right. i'll try to add a new node. > > I'm also wondering what you may have broken in altering the handling of > HAVING clauses in planner.c. What's going on there? I add this for posibility of qualification rows after connecting them. Like HAVING with GROUP case. I'm leave WHERE clause for subplan, because it can considerably descrease amount of rows which is needed to connect, and thus improve performance of query. But also there is need to qual rows after connecting, to check _level_ column for example. This is HAVING for. As far as i've check, nothing is broken. > > A more global concern is whether this really implements what Oracle > calls CONNECT BY. I had the idea that that was a deeper feature than > just controlling the ordering of output, which AFAICT is what this does. AFAIK from Oracle's SQL reference, CONNECT BY feature is exactly for controlling order of output. The matter is that it is the specifical order, which can't be achieved by ORDER BY clause. > Also, the documents I've been able to find on the Web suggest that > Oracle's view of the syntax for the feature is only tenuously related > to what you've implemented here. Difference only in CONNECT BY clause. This allows only to choose 2 columns and then compare them by EQ, when Oracle allows a more complex child selection. But this is i'm need right now. So i've made a patch. I thought that someone else may be needed it too. So i'm post it here. Everything is very simple. PS May i ask a question: what amount of overhead bring addition of varfake field? Even if there is a 1K vars only 4K memory overhead.It have checked in a few places, so performance overhead is not so big. I just want to understand. regards, --- .evgen
The syntax below is wrong. Oracle has: [START WITH condition] CONNECT BY condition not the reverse as stated below. Maybe this was just a documentation mistake (I haven't looked at the code). Fernando Evgen Potemkin wrote: > + > + Syntax. > + > + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR c_expr START WITH expr > + [ HAVING condition [, ...]] [ LIMIT ... ] [ OFFSET ... ] > + -- Fernando Nasser Red Hat - Toronto E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9
Fernando Nasser wrote:> The syntax below is wrong. Oracle has: > > [START WITH condition] CONNECT BY condition > > not the reverse as stated below. Maybe this was just a documentation > mistake (I haven't looked at the code). > > Fernando > > Evgen Potemkin wrote: > >> + + Syntax. >> + + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR >> c_expr START WITH expr + [ HAVING condition [, ...]] [ LIMIT ... ] [ >> OFFSET ... ] >> + > Furthermore, PRIOR is not a clause, but a unary operator that must be applied to one of the terms of the CONNECT BY expression. So, we would actually have: [START WITH expr] CONNECT BY c_expr = PRIOR c_expr or [START WITH expr] CONNECT BY PRIOR c_expr = c_expr (we probably don't want to support operators other than '=') And there is that '+' in there... what is it for? So I guess we would be implementing something that is not either the standard SQL nor the Oracle syntax. -- Fernando Nasser Red Hat - Toronto E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9
Tom Lane wrote: > > Personally I'd prefer to forget Oracle's syntax --- it looks to me like > it's at least as badly designed as their outer-join syntax, if not worse > --- and use SQL99's recursive-query syntax for this sort of thing. > Have you looked at that? > Evgen's query (put in Oracle's syntax): SELECT * FROM data START WITH id=0 CONNECT BY id = PRIOR pnt; would have to be implemented by something like: WITH flat_tree (id, pnt, data, level) AS (SELECT id, pnt, data, 1 FROM data WHERE id = 0 UNION SELECT d.in, d.pnt, d.data, f.level + 1 FROM data d, flat_tree f WHERE d.pnt = f.id) SELECT * FROM flat_tree ORDER BY level; (I am simplifying this, one would have to add a path variable to make it depth first). I guess the rewriter could use the START WITH expression to create the first select and the CONNECT BY clause to create the second one. Maybe even the parser could do most of the transformation (maybe). Anyway, the Oracle syntax is indeed more compact, but is not as generic as the SQL99 (and IBM DB2) one, so we can always implement it on top of that. I think even DB2 implements the SQL99 recursion with some restrictions (mostly for safety) and that probably covers 99.99% of the uses. Maybe even a basic implementation of the SQL one can accommodate the execution of rewritten Oracle CONNECT BY queries. I agree with Tom that we should implement the SQL99 one first and then, if possible, add the Oracle compatibility to it. -- Fernando Nasser Red Hat - Toronto E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9
it supports both versions: start before connect and vice versa. regards, --- .evgen On Tue, 26 Nov 2002, Fernando Nasser wrote: > The syntax below is wrong. Oracle has: > > [START WITH condition] CONNECT BY condition > > not the reverse as stated below. Maybe this was just a documentation mistake (I > haven't looked at the code). > > Fernando > > Evgen Potemkin wrote: > > + > > + Syntax. > > + > > + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR c_expr START WITH expr > > + [ HAVING condition [, ...]] [ LIMIT ... ] [ OFFSET ... ] > -- > Fernando Nasser > Red Hat - Toronto E-Mail: fnasser@redhat.com > 2323 Yonge Street, Suite #300 > Toronto, Ontario M4P 2C9 > >
it's fixed in patch for PG 7.3 '+' comes from .diff, where you take it from :) regards, --- .evgen On Tue, 26 Nov 2002, Fernando Nasser wrote: > Fernando Nasser wrote:> The syntax below is wrong. Oracle has: > > > > [START WITH condition] CONNECT BY condition > > > > not the reverse as stated below. Maybe this was just a documentation > > mistake (I haven't looked at the code). > >> + + Syntax. > >> + + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR > >> c_expr START WITH expr + [ HAVING condition [, ...]] [ LIMIT ... ] [ > >> OFFSET ... ] > >> + > > > > Furthermore, PRIOR is not a clause, but a unary operator that must be applied to > one of the terms of the CONNECT BY expression. > > So, we would actually have: > > [START WITH expr] CONNECT BY c_expr = PRIOR c_expr > > And there is that '+' in there... what is it for? > > > So I guess we would be implementing something that is not either the standard > SQL nor the Oracle syntax. > > > > -- > Fernando Nasser > Red Hat - Toronto E-Mail: fnasser@redhat.com > 2323 Yonge Street, Suite #300 > Toronto, Ontario M4P 2C9 > >
Evgen Potemkin wrote:> it's fixed in patch for PG 7.3 > > '+' comes from .diff, where you take it from :) > From here: >>>>+ + Syntax. >>>>+ + SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR >>>>c_expr START WITH expr + [ HAVING condition [, ...]] [ LIMIT ... ] [ ^^^^ >>>>OFFSET ... ] >>>>+ >>> -- Fernando Nasser Red Hat - Toronto E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9
Evgen Potemkin wrote:> it supports both versions: start before connect and vice versa. > > regards, > --- > .evgen > Why do you want to support the START WITH after the CONNECT BY if that syntax does not exist anywhere (Oracle uses it before)? Adding another option just increases the chances of parser conflicts in the future. Regards, Fernando > On Tue, 26 Nov 2002, Fernando Nasser wrote: > > >>The syntax below is wrong. Oracle has: >> >>[START WITH condition] CONNECT BY condition >> >>not the reverse as stated below. Maybe this was just a documentation mistake (I >>haven't looked at the code). >> >>Fernando >> >>Evgen Potemkin wrote: >> >>>+ >>>+ Syntax. >>>+ >>>+ SELECT ... FROM ... [ WHERE condition ] CONNECT BY c_expr PRIOR c_expr START WITH expr >>>+ [ HAVING condition [, ...]] [ LIMIT ... ] [ OFFSET ... ] >> >>-- >>Fernando Nasser >>Red Hat - Toronto E-Mail: fnasser@redhat.com >>2323 Yonge Street, Suite #300 >>Toronto, Ontario M4P 2C9 >> >> > > @redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9
Evgen Potemkin wrote:> both versions of syntax are supported for convenience, also AFAIK there will > be no conflicts. > How can you tell? Can you foretell what the SQL standard will create in the future? You don't see conflicts now. There is a good chance that the standards committee avoids conflicting with some syntax used by a major vendor like Oracle. But your (unnecessary) extensions to that syntax will not be taken into consideration, of course. This will increase the chances that we find a parser conflict when trying to implement some future SQL standard construct. Furthermore, clauses in SQL are ordered. There is no reason for allowing random ordering of clauses in a specific statement. Regards, Fernando > > On Thu, 28 Nov 2002, Fernando Nasser wrote: > > >>Evgen Potemkin wrote:> it supports both versions: start before connect and vice >>versa. >> >>>regards, >>>--- >>>.evgen >>> >> >>Why do you want to support the START WITH after the CONNECT BY if that syntax >>does not exist anywhere (Oracle uses it before)? Adding another option just >>increases the chances of parser conflicts in the future. >> >>Regards, >>Fernando > > >>>>Fernando Nasser >>>>Red Hat - Toronto E-Mail: fnasser@redhat.com >>>>2323 Yonge Street, Suite #300 >>>>Toronto, Ontario M4P 2C9 >>> > -- Fernando Nasser Red Hat - Toronto E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9