Re: Google Summer of Code: Full Disjunctions - Mailing list pgsql-general
From | Tzahi Fadida |
---|---|
Subject | Re: Google Summer of Code: Full Disjunctions |
Date | |
Msg-id | 015901c672c5$130b76c0$0b00a8c0@llord Whole thread Raw |
In response to | Re: Google Summer of Code: Full Disjunctions ("Tzahi Fadida" <tzahi_ml@myrealbox.com>) |
List | pgsql-general |
Hi Lee, First, i have no knowledge of anyone that have implemented full disjunctions(ever) aside from the theoretical works of my colleagues. With the exception of a corner case of it, that I believe was a simulation in 96. (A. Rajaman and J.D. Ullman Integrating information by outerjoins and full-disjunctions). I'd love to hear about any implementation out there (aside from my colleagues work, which is mine also: cohen,sagiv, kimelfeld,kanza) It can never be a binary operation since at the heart of the matter is that you need to take each subset of the relations and join them. i.e.: A B C A,B A,C B,C A,B,C and remove subsumptions, for example if you have {t_A,t_B} from the A,B join, you will not include just {t_A} or just {t_B} in the result (subsumption). Usually binary operations allow for a bottom up computation approach, but FD is a TOP down approach (Galindo-Legaria, C. outerjoins as disjunctions). The answer what to do is a bit complex and part of it is in a paper that is currently in review, however: You take the cyclic component, for example FD(A,B,C,D) and then join it with another relation E. It's not a trouble to write (A fd B) fd C but it will still have to be computed as one computation - FD(A,B,C). Again, note that FD is only useful for cases you can't use natural outer joins. So only if you have a cycle you use FD. In my current implementation I do getfdr(A,B,C) to get the RECORD since with FD you cannot know in advance what will be the eventual RECORD (unless you save the RECORD type for A,B,C and A,B,C,D etc...) then I run, for example, FD('A,B,C') as RECORD (a0 int, ....) As you can see, you can now join other relations to the result, it's a standard recordset. So (FD('A,B,C') as RECORD (a0 int, ....)) Here is an example of a RUN (ignore all the additional parameters): select getfdr('rand_1_0,rand_1_1,rand_1_2'); getfdr --------------------------- (a0 int4,a1 int4,a2 int4) (1 row) select * from odmbfd('rand_1_0,rand_1_1,rand_1_2',true,true,100,'random',10,10) AS RECORD (a0 int, a1 int, a2 int) natural full outer join rand_4_2; a2 | a0 | a1 | a3 ------+------+------+------ 1 | | 1813 | 1 | | 3091 | 1 | | 316 | 2 | 2113 | | 2738 3 | | 2924 | 823 3 | | 2924 | 2735 ..... ..... select count(*) from odmbfd('rand_1_0,rand_1_1,rand_1_2',true,true,100,'random',10,10) AS RECORD (a0 int, a1 int, a2 int) natural full outer join rand_4_2; count ------- 3201 (1 row) >Hi Tzahi > >Since you seem to know about full disjunctions, I was wondering if you >could answer this question: > >Can full disjunction be implemented as a binary operator? > >The algorithms I've seen for computing it seem to require that all >input relations be supplied at once. Even in your message above you >specified your example as FD(A,B,C) rather than say (A fd B) fd C. >So I was wondering if the latter is possible? > >The reason I ask is that I'm currently trying to solve a problem that >could in principle be solved using FD but it would fit into my current >framework much better if it were a binary operator like the other >joins. > >Thanks in advance. >Lee
pgsql-general by date: