identifying duplicates in table with redundancies - Mailing list pgsql-sql

From Tarlika Elisabeth Schmitz
Subject identifying duplicates in table with redundancies
Date
Msg-id 20100923223940.5019475e@dick.coachhouse
Whole thread Raw
List pgsql-sql
I loaded data from a spread into a interim table so I can analyze the
quality of the data.

The table contains an entry for every student (250K records) and his
trainer. Eventually, I want to extract a unique list of trainers from
it. But first of all I want to check for duplicates:

1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name

I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it. 

CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);

====
EXAMPLE DATA

22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====

-- outputs trainer ids which appear under different names
select trainer_id, trainer_name from 
(   -- different id/name combinations   select distinct on (trainer_name) trainer_id, trainer_name   from student
wheretrainer_id in   (       -- trainer ids with appearing with different names       select distinct on (id) id
from       (           -- distinct trainer id-name           select distinct on (trainer_id,trainer_name)
trainer_idas id,            trainer_name as name from student       ) as trainer       group by trainer.id       having
count(trainer.name) > 1    ) 
 
) as y
order by trainer_id

-- 

Best Regards,
Tarlika Elisabeth Schmitz


pgsql-sql by date:

Previous
From: Kenneth Marshall
Date:
Subject: Re: Question Regarding Unique Index on Table
Next
From: Rob Sargent
Date:
Subject: Re: pg_config -less