Thread: self-join and DISTINCT quandry.

self-join and DISTINCT quandry.

From

Stuart Rison

Date:

29 September 1999, 16:06:51

Dear all,

Consider the following table:

test=> select * from hum;
number|letter|family
------+------+------    1|a     |one    2|b     |one    3|c     |one    4|d     |one    5|e     |one    6|f     |one
7|a    |two    8|b     |two    9|c     |two   10|g     |two   11|h     |two   12|i     |two
 
(12 rows)

If I want to know what letters appear in more than one family and what
the number for number for such letters is, I could do:

test=> select t1.letter, t1.number, t2.number from hum t1, hum t2 where
t1.letter=t2.letter and t1.family<>t2.family;
letter|number|number
------+------+------
a     |     1|     7
b     |     2|     8
c     |     3|     9
a     |     7|     1
b     |     8|     2
c     |     9|     3
(6 rows)

test=>

But what is I didn't want the information to appear twice.  I could try
adding DISTINCT but that would not work because row a,1,7 is of course
DISTINCT from a,7,1 although I am looking for such duplication to be
eliminated...

I'd like to do this without using DISTINCT ON if at all possible (because
I disapprove on DISTINCT ON on moral grounds ;) ).

regards,

S.

Stuart C. G. Rison
Department of Biochemistry and Molecular Biology
6th floor, Darwin Building, University College London (UCL)
Gower Street, London, WC1E 6BT, United Kingdom
Tel. 0207 504 2303, Fax. 0207 380 7033
e-mail: rison@biochem.ucl.ac.uk

Re: [SQL] self-join and DISTINCT quandry.

From

Tom Lane

Date:

29 September 1999, 19:24:02

Stuart Rison <rison@biochemistry.ucl.ac.uk> writes:
> But what is I didn't want the information to appear twice.  I could try
> adding DISTINCT but that would not work because row a,1,7 is of course
> DISTINCT from a,7,1 although I am looking for such duplication to be
> eliminated...

Bit of a hack, but you could add the condition... AND t1.number < t2.number;
to get rid of the two-way duplication that otherwise arises from
your join.  (Any other way of selecting just one of the pair
of matches would work as well, of course; you could compare the
two row's OIDs if nothing else was handy.)

Not sure if you still need DISTINCT after that.
        regards, tom lane

Re: [SQL] self-join and DISTINCT quandry.

From

Stuart Rison

Date:

30 September 1999, 06:03:47

Dear Tom,

Thanks a lot, that works great, the simple ideas are always the best!

There is one proviso.  If the two numbers are equal, the 'hack' does not
work.  So the trick, as you point out, is to distinguish on oid's (... AND
t1.oid<t2.oid) since these are by definition unique.

regards,

S.

On Wed, 29 Sep 1999, Tom Lane wrote:

> Stuart Rison <rison@biochemistry.ucl.ac.uk> writes:
> > But what is I didn't want the information to appear twice.  I could try
> > adding DISTINCT but that would not work because row a,1,7 is of course
> > DISTINCT from a,7,1 although I am looking for such duplication to be
> > eliminated...
> 
> Bit of a hack, but you could add the condition
>     ... AND t1.number < t2.number;
> to get rid of the two-way duplication that otherwise arises from
> your join.  (Any other way of selecting just one of the pair
> of matches would work as well, of course; you could compare the
> two row's OIDs if nothing else was handy.)
> 
> Not sure if you still need DISTINCT after that.
> 
>             regards, tom lane
> 
> ************
> 

Stuart C. G. Rison
Department of Biochemistry and Molecular Biology
6th floor, Darwin Building, University College London (UCL)
Gower Street, London, WC1E 6BT, United Kingdom
Tel. 0207 504 2303, Fax. 0207 380 7033
e-mail: rison@biochem.ucl.ac.uk