Re: [PATCHES] Adding fulldisjunctions to the contrib - Mailing list pgsql-hackers

From Tzahi Fadida
Subject Re: [PATCHES] Adding fulldisjunctions to the contrib
Date
Msg-id 200608121301.48259.Tzahi.ML@gmail.com
Whole thread Raw
In response to Re: [PATCHES] Adding fulldisjunctions to the contrib  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [PATCHES] Adding fulldisjunctions to the contrib
List pgsql-hackers
On Saturday 12 August 2006 07:22, Bruce Momjian wrote:
> I am still waiting for someone to tell us that they would use this
> capability for a real-world problem.

I suggest looking into web applications.
The example here
http://www.technion.ac.il/~tzahi/soc.html

shows a possible 3 separate web resources.
I.e. heterogeneous sources. Naturally, since the sources
did not know each other in advance, they did not form
relations that would not end up cyclic in the scheme graph.
XMLs are usually like these. Obviously you have to turn them into
relations first of course.
In addition, i have recently added a feature where you give alias to column
names so if you have "country" column and a "state" column that really means
country, you can do "country=public.relation_with_state.state,..." dictionary
style. This is commonly needed in web applications.

Here is another example (improvising :) ):
site1: user_name,email,favorite_book_isbn
site2: user_name,email,favorite_chat_room
site3: user_name,credit_card

So, let's say i wanted to advertise discounts using a certain credit card
for certain books, i would do FD(site1,site2,site3).
Natural join will give - so you get data on people who read some books and
visit certain chat rooms and users credit cards.
FD will give - some people did not buy books but have a credit card and a chat room so you want to advertise anyway.
Somepeople did buy books and usesa certain credit cards but you don't know where they chat, however,you know you want
toadv some best seller that most buy anyway.certain people did buy books and visit chat rooms but you can't offera
specificdiscount, so you will advertise all credit cards.... 

However, caution. FD is a very,very expensive operation even with the new
algorithms so it is best to do FD separately and put the results into a table
and use that table. Unless of course, as common to web applications, the
relations are quite small (few thousands of rows) and they don't connect
strongly. In this cases, on my p1.6 it comes out about 2-3 secs.
However, i can generate the same experiment with strong connectivity
between the relations and it can take hours to compute.
On the other hand i have seen experiments with 100 thousans of records
that finished in a matter of minutes so it all depends on how many join
combination there are in the data.

>
> ---------------------------------------------------------------------------
>
> Tzahi Fadida wrote:
> > On Friday 11 August 2006 07:18, Bruce Momjian wrote:
> > > I have looked over this addition, and I think I finally understand it.
> > > Given three tables, A, B, C, which join as A->B, B->C, C->A, you can
> > > really join them as A->B->C, and A->C->B.  What full disjunction does
> > > is to perform both of those joins, and return a one row for each join.
> > > Here
> >
> > What it does is to return all the possible natural joins, i.e.:
> > A
> > B
> > C
> > A,B
> > A,C
> > ...
> > A,B,C
> >
> > And, it removes any redundant information so that if we have a tuple
> > that already contains another tuple's information that tuple is
> > discarded. Also, note that the full disjunction algorithm i implemented
> > is commonly used in cases where the scheme graph is cyclic
> > and thus, you cannot use natural full outer join
> > to compute the FD.
> >
> > Finally, you can FD(A,B,C,D,...) any number of relations (limited to 32
> > in the implementation) with no regard to the order between them.
> >
> > A case study and comparison can be found here:
> > http://www.technion.ac.il/~tzahi/soc.html
> >
> > > is an example from the README:
> > >
> > >     Example of an input and output of a full disjunctions:
> > >     INPUT:
> > >
> > >         --A---|---B---|---C--
> > >         X---Y-|-Y---Z-|-X---Z
> > >         a-|-b-|-b-|-c-|-a-|-d
> > >
> > >     A,B and C are relations. X,Y and Z are attributes. a,b,c and d are
> > > values.
> > >
> > >     Note that A,B and C are connected in a cycle. That is:
> > >     A is connected to B on attribute Y,
> > >     B is connected to C on attribute Z,
> > >     C is connected to A on attribute X.
> > >
> > >     The output of the full disjunctions FD(A,B,C):
> > >
> > >            FD
> > >         X---Y---Z
> > >         a-|-b-|-c
> > >         a-|-b-|-d
> > >
> > > This code is pretty complex, so I can see why it should be in /contrib.
> > > Are there reasonable use cases for this capability?
> > >
> > > -----------------------------------------------------------------------
> > >----
> > >
> > > Tzahi Fadida wrote:
> > > > Hi,
> > > > I wish to add the fulldisjunctions function to the contrib.
> > > > With the help of Jonah, we (or rather he :) created a patch with
> > > > regression tests. The function is finished programmatically but
> > > > still a little more code documentation touches and improved error
> > > > messages are needed. All the rest was extensively tested.
> > > >
> > > > Attached is the patch.
> > > >
> > > > Works great. Just compiled from a fresh cvs which i patched with the
> > > > attached diff. ran the fulldijsjunction.sql in the
> > > > share/contrib/fulldisjunction and let it run and it works great.
> > > > 10x.
> > > >
> > > > --
> > > > Regards,
> > > > ????????Tzahi.
> > > > --
> > > > Tzahi Fadida
> > > > Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
> > > > WARNING TO SPAMMERS: ?see at
> > > > http://members.lycos.co.uk/my2nis/spamwarning.html
> > >
> > > [ Attachment, skipping... ]
> > >
> > > > ---------------------------(end of
> > > > broadcast)--------------------------- TIP 3: Have you checked our
> > > > extensive FAQ?
> > > >
> > > >                http://www.postgresql.org/docs/faq
> >
> > --
> > Regards,
> > ????????Tzahi.
> > --
> > Tzahi Fadida
> > Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
> > WARNING TO SPAMMERS: ?see at
> > http://members.lycos.co.uk/my2nis/spamwarning.html

--
Regards,
        Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at
http://members.lycos.co.uk/my2nis/spamwarning.html


pgsql-hackers by date:

Previous
From: Tzahi Fadida
Date:
Subject: Re: [PATCHES] Adding fulldisjunctions to the contrib
Next
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql: plperl: Allow conversion from perl to postgresql array in OUT