Re: identifying duplicates in table with redundancies - Mailing list pgsql-sql
From | Tarlika Elisabeth Schmitz |
---|---|
Subject | Re: identifying duplicates in table with redundancies |
Date | |
Msg-id | 20100924171226.6000b310@dick.coachhouse Whole thread Raw |
In response to | Re: identifying duplicates in table with redundancies ("Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>) |
Responses |
Re: identifying duplicates in table with redundancies
|
List | pgsql-sql |
Dear Oliveiros, Thank you for taking the time to help. On Fri, 24 Sep 2010 11:22:21 +0100 "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> wrote: >----- Original Message ----- >From: "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de> >To: <pgsql-sql@postgresql.org> >Sent: Thursday, September 23, 2010 10:39 PM >Subject: [SQL] identifying duplicates in table with redundancies > > >>[...] I want to check for duplicates: >> >> 1) multiples trainer names for same trainer id >> 2) multiple trainer ids for same trainer name >> >> I cobbled together the SQL and it does the job but it seems rather >> convoluted. I would like to know how I can improve it. >> >> CREATE TABLE student ( >> id INTEGER NOT NULL, >> name VARCHAR(256) NOT NULL, >> trainer_id INTEGER, >> trainer_name VARCHAR(256), >> ); >> >> ==== >> EXAMPLE DATA >> >> 22 John 1 Macdonald >> 23 Jane 1 MacDonald >> 24 Paul 1 MacDonald >> 25 Dick 2 Smith >> 26 Bill 3 Smith >> 27 Kate 3 Smith >> ==== >> >> select trainer_id, trainer_name from >> ( >> select distinct on (trainer_name) trainer_id, trainer_name >> from student >> where trainer_id in >> ( >> select distinct on (id) id >> from >> ( >> select distinct on (trainer_id,trainer_name) >> trainer_id as id, >> trainer_name as name from student >> ) as trainer >> group by trainer.id >> having count (trainer.name) > 1 >> ) >> ) as y >> order by trainer_id >Howdy, Tarlika. > >First, did you past correctly your query into your mail? >I am asking this because your query doesn't seem work for me, it >returns an empty list :-| > >Your most nested query, [...] >returns this > >1|"MacDonald" >2|"Smith" >3|"Smith" For me, the innermost query returns: 1|"Macdonald" 1|"MacDonald" 2|"Smith" 3|"Smith" (note the lower/uppercase "d" in MacDonald) The whole query returns: 1|"Macdonald" 1|"MacDonald" >> 1) multiples trainer names for same trainer id >> 2) multiple trainer ids for same trainer name > >To achieve 2) I would use this query > >SELECT DISTINCT trainer_id,trainer_name >FROM ( >SELECT trainer_name -- The field you want to test for duplicates >FROM ( >SELECT DISTINCT "trainer_id","trainer_name" >FROM student >) x >GROUP BY "trainer_name" -- the field you want to test for duplicates >HAVING (COUNT(*) > 1) >) z >NATURAL JOIN student y I see my 2 innermost queries are the same as yours, just a bit more wordy. I messed up at the third query, which threw up an error when I tried to add an ORDER BY. >It will give you a list of the trainer names who have more than one >trainer ID and the respective trainer IDS. >For your particular example data result will be >2|"Smith" >3|"Smith" Splendid! Just what I wanted. >As Smith is the only trainer with records with diferent trainer IDs. >Question : Can this be what you want? The real table has 250000 entries and quite a few dups. >If you want to achieve 1) just substitute the trainer_name by >trainer_id on the commented places. 1) works as well now - just had to transpose id/name. -- Best Regards, Tarlika Elisabeth Schmitz