Thread: self-join and DISTINCT quandry.
Dear all, Consider the following table: test=> select * from hum; number|letter|family ------+------+------ 1|a |one 2|b |one 3|c |one 4|d |one 5|e |one 6|f |one 7|a |two 8|b |two 9|c |two 10|g |two 11|h |two 12|i |two (12 rows) If I want to know what letters appear in more than one family and what the number for number for such letters is, I could do: test=> select t1.letter, t1.number, t2.number from hum t1, hum t2 where t1.letter=t2.letter and t1.family<>t2.family; letter|number|number ------+------+------ a | 1| 7 b | 2| 8 c | 3| 9 a | 7| 1 b | 8| 2 c | 9| 3 (6 rows) test=> But what is I didn't want the information to appear twice. I could try adding DISTINCT but that would not work because row a,1,7 is of course DISTINCT from a,7,1 although I am looking for such duplication to be eliminated... I'd like to do this without using DISTINCT ON if at all possible (because I disapprove on DISTINCT ON on moral grounds ;) ). regards, S. Stuart C. G. Rison Department of Biochemistry and Molecular Biology 6th floor, Darwin Building, University College London (UCL) Gower Street, London, WC1E 6BT, United Kingdom Tel. 0207 504 2303, Fax. 0207 380 7033 e-mail: rison@biochem.ucl.ac.uk
Stuart Rison <rison@biochemistry.ucl.ac.uk> writes: > But what is I didn't want the information to appear twice. I could try > adding DISTINCT but that would not work because row a,1,7 is of course > DISTINCT from a,7,1 although I am looking for such duplication to be > eliminated... Bit of a hack, but you could add the condition... AND t1.number < t2.number; to get rid of the two-way duplication that otherwise arises from your join. (Any other way of selecting just one of the pair of matches would work as well, of course; you could compare the two row's OIDs if nothing else was handy.) Not sure if you still need DISTINCT after that. regards, tom lane
Dear Tom, Thanks a lot, that works great, the simple ideas are always the best! There is one proviso. If the two numbers are equal, the 'hack' does not work. So the trick, as you point out, is to distinguish on oid's (... AND t1.oid<t2.oid) since these are by definition unique. regards, S. On Wed, 29 Sep 1999, Tom Lane wrote: > Stuart Rison <rison@biochemistry.ucl.ac.uk> writes: > > But what is I didn't want the information to appear twice. I could try > > adding DISTINCT but that would not work because row a,1,7 is of course > > DISTINCT from a,7,1 although I am looking for such duplication to be > > eliminated... > > Bit of a hack, but you could add the condition > ... AND t1.number < t2.number; > to get rid of the two-way duplication that otherwise arises from > your join. (Any other way of selecting just one of the pair > of matches would work as well, of course; you could compare the > two row's OIDs if nothing else was handy.) > > Not sure if you still need DISTINCT after that. > > regards, tom lane > > ************ > Stuart C. G. Rison Department of Biochemistry and Molecular Biology 6th floor, Darwin Building, University College London (UCL) Gower Street, London, WC1E 6BT, United Kingdom Tel. 0207 504 2303, Fax. 0207 380 7033 e-mail: rison@biochem.ucl.ac.uk