Home > mailing lists

identifying duplicates in table with redundancies - Mailing list pgsql-sql

From	Tarlika Elisabeth Schmitz
Subject	identifying duplicates in table with redundancies
Date	September 23, 2010 19:35:40
Msg-id	20100923223940.5019475e@dick.coachhouse Whole thread
List	pgsql-sql

Tree view

I loaded data from a spread into a interim table so I can analyze the
quality of the data.

The table contains an entry for every student (250K records) and his
trainer. Eventually, I want to extract a unique list of trainers from
it. But first of all I want to check for duplicates:

1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name

I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it. 

CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);

====
EXAMPLE DATA

22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====

-- outputs trainer ids which appear under different names
select trainer_id, trainer_name from 
(   -- different id/name combinations   select distinct on (trainer_name) trainer_id, trainer_name   from student
wheretrainer_id in   (       -- trainer ids with appearing with different names       select distinct on (id) id
from       (           -- distinct trainer id-name           select distinct on (trainer_id,trainer_name)
trainer_idas id,            trainer_name as name from student       ) as trainer       group by trainer.id       having
count(trainer.name) > 1    ) 
 
) as y
order by trainer_id

-- 

Best Regards,
Tarlika Elisabeth Schmitz

pgsql-sql by date:

From: Kenneth Marshall
Date: 23 September 2010, 15:33:11
Subject: Re: Question Regarding Unique Index on Table

From: Rob Sargent
Date: 23 September 2010, 22:26:08
Subject: Re: pg_config -less

identifying duplicates in table with redundancies - Mailing list pgsql-sql

Previous

Next