Re: Google Summer of Code: Full Disjunctions - Mailing list pgsql-general

From Tzahi Fadida
Subject Re: Google Summer of Code: Full Disjunctions
Date
Msg-id 015901c672c5$130b76c0$0b00a8c0@llord
Whole thread Raw
In response to Re: Google Summer of Code: Full Disjunctions  ("Tzahi Fadida" <tzahi_ml@myrealbox.com>)
List pgsql-general
Hi Lee,
First, i have no knowledge of anyone that have implemented full disjunctions(ever) aside
from the theoretical works of my colleagues.
With the exception of a corner case of it, that I believe was a simulation in 96.
(A. Rajaman and J.D. Ullman Integrating information by outerjoins and full-disjunctions).
I'd love to hear about any implementation out there (aside from my colleagues work, which
is mine also: cohen,sagiv, kimelfeld,kanza)

It can never be a binary operation since at the heart of the matter is that you need to take
each subset of the relations and join them. i.e.:
A
B
C
A,B
A,C
B,C
A,B,C

and remove subsumptions, for example if you have {t_A,t_B} from the A,B join,
you will not include just {t_A} or just {t_B} in the result (subsumption).

Usually binary operations allow for a bottom up computation approach, but FD is a TOP down approach
(Galindo-Legaria, C. outerjoins as disjunctions).

The answer what to do is a bit complex and part of it is in a paper that is currently in review,
however:

You take the cyclic component, for example FD(A,B,C,D) and then join it with another relation E.
It's not a trouble to write (A fd B) fd C but it will still have to be computed as one computation -
FD(A,B,C).

Again, note that FD is only useful for cases you can't use natural outer joins.
So only if you have a cycle you use FD.

In my current implementation I do getfdr(A,B,C) to get the RECORD since with FD you cannot
know in advance what will be the eventual RECORD (unless you save the RECORD type for A,B,C
and A,B,C,D etc...)

then I run, for example, FD('A,B,C') as RECORD (a0 int, ....)
As you can see, you can now join other relations to the result, it's a standard recordset.
So (FD('A,B,C') as RECORD (a0 int, ....))

Here is an example of a RUN (ignore all the additional parameters):
select getfdr('rand_1_0,rand_1_1,rand_1_2');
          getfdr
---------------------------
 (a0 int4,a1 int4,a2 int4)
(1 row)
select * from odmbfd('rand_1_0,rand_1_1,rand_1_2',true,true,100,'random',10,10) AS RECORD (a0 int,
a1 int, a2 int) natural full outer join rand_4_2;
  a2  |  a0  |  a1  |  a3
------+------+------+------
    1 |      | 1813 |
    1 |      | 3091 |
    1 |      |  316 |
    2 | 2113 |      | 2738
    3 |      | 2924 |  823
    3 |      | 2924 | 2735
.....
.....

select count(*) from odmbfd('rand_1_0,rand_1_1,rand_1_2',true,true,100,'random',10,10) AS RECORD (a0
int, a1 int, a2 int) natural full outer join rand_4_2;
 count
-------
  3201
(1 row)

>Hi Tzahi
>
>Since you seem to know about full disjunctions, I was wondering if you
>could answer this question:
>
>Can full disjunction be implemented as a binary operator?
>
>The algorithms I've seen for computing it seem to require that all
>input relations be supplied at once. Even in your message above you
>specified your example as FD(A,B,C) rather than say (A fd B) fd C.
>So I was wondering if the latter is possible?
>
>The reason I ask is that I'm currently trying to solve a problem that
>could in principle be solved using FD but it would fit into my current
>framework much better if it were a binary operator like the other
>joins.
>
>Thanks in advance.
>Lee



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: catalog is missing 16 attribute(s) for relid 8202590
Next
From: Bruno Wolff III
Date:
Subject: Re: database size grows (even after vacuum (full and analyze))....