I loaded data from a spread into a interim table so I can analyze the
quality of the data.
The table contains an entry for every student (250K records) and his
trainer. Eventually, I want to extract a unique list of trainers from
it. But first of all I want to check for duplicates:
1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name
I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it.
CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);
====
EXAMPLE DATA
22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====
-- outputs trainer ids which appear under different names
select trainer_id, trainer_name from
( -- different id/name combinations select distinct on (trainer_name) trainer_id, trainer_name from student
wheretrainer_id in ( -- trainer ids with appearing with different names select distinct on (id) id
from ( -- distinct trainer id-name select distinct on (trainer_id,trainer_name)
trainer_idas id, trainer_name as name from student ) as trainer group by trainer.id having
count(trainer.name) > 1 )
) as y
order by trainer_id
--
Best Regards,
Tarlika Elisabeth Schmitz