Re: identifying duplicates in table with redundancies - Mailing list pgsql-sql

From Oliver d'Azevedo Christina
Subject Re: identifying duplicates in table with redundancies
Date
Msg-id 065CBEA1-1647-4233-BBB6-48E879FD91B0@gmail.com
Whole thread Raw
In response to Re: identifying duplicates in table with redundancies  ("Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de>)
Responses Re: identifying duplicates in table with redundancies  (Tarlika Elisabeth Schmitz <postgresql@numerixtechnology.de>)
Re: identifying duplicates in table with redundancies  (Tarlika Elisabeth Schmitz <postgresql@numerixtechnology.de>)
List pgsql-sql
Hey,Tarlika.
I tried to reproduce your test case through a series of inserts and It
seems that the lower case "d" went unnoticed.
That explains the empty list i got as result.  My fault.  Sorry :-(
Great to hear it helped you

Best,
Oliveiros



Enviado de meu iPhone

Em 24/09/2010, às 05:12 PM, "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.d
e> escreveu:

> Dear Oliveiros,
> Thank you for taking the time to help.
>
> On Fri, 24 Sep 2010 11:22:21 +0100
> "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> wrote:
>
>> ----- Original Message -----
>> From: "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de>
>> To: <pgsql-sql@postgresql.org>
>> Sent: Thursday, September 23, 2010 10:39 PM
>> Subject: [SQL] identifying duplicates in table with redundancies
>>
>>
>>> [...] I want to check for duplicates:
>>>
>>> 1) multiples trainer names for same trainer id
>>> 2) multiple trainer ids for same trainer name
>>>
>>> I cobbled together the SQL and it does the job but it seems rather
>>> convoluted. I would like to know how I can improve it.
>>>
>>> CREATE TABLE student (
>>> id INTEGER NOT NULL,
>>> name VARCHAR(256) NOT NULL,
>>> trainer_id INTEGER,
>>> trainer_name VARCHAR(256),
>>> );
>>>
>>> ====
>>> EXAMPLE DATA
>>>
>>> 22 John 1 Macdonald
>>> 23 Jane 1 MacDonald
>>> 24 Paul 1 MacDonald
>>> 25 Dick 2 Smith
>>> 26 Bill 3 Smith
>>> 27 Kate 3 Smith
>>> ====
>>>
>>> select trainer_id, trainer_name from
>>> (
>>>   select distinct on (trainer_name) trainer_id, trainer_name
>>>   from student
>>>   where trainer_id in
>>>   (
>>>       select distinct on (id) id
>>>       from
>>>       (
>>>           select distinct on (trainer_id,trainer_name)
>>>    trainer_id as id,
>>>           trainer_name as name from student
>>>       ) as trainer
>>>       group by trainer.id
>>>       having count (trainer.name) > 1
>>>   )
>>> ) as y
>>> order by trainer_id
>
>
>> Howdy, Tarlika.
>>
>> First, did you past correctly your query into your mail?
>> I am asking this because your query doesn't seem work for me, it
>> returns an empty list :-|
>>
>> Your most nested query, [...]
>> returns this
>>
>> 1|"MacDonald"
>> 2|"Smith"
>> 3|"Smith"
>
>
> For me, the innermost query returns:
> 1|"Macdonald"
> 1|"MacDonald"
> 2|"Smith"
> 3|"Smith"
>
> (note the lower/uppercase "d" in MacDonald)
>
> The whole query returns:
> 1|"Macdonald"
> 1|"MacDonald"
>
>
>>> 1) multiples trainer names for same trainer id
>>> 2) multiple trainer ids for same trainer name
>>
>> To achieve 2) I would use this query
>>
>> SELECT DISTINCT trainer_id,trainer_name
>> FROM (
>> SELECT trainer_name   -- The field you want to test for duplicates
>> FROM (
>> SELECT DISTINCT "trainer_id","trainer_name"
>> FROM student
>> ) x
>> GROUP BY "trainer_name"    -- the field you want to test for
>> duplicates
>> HAVING (COUNT(*) > 1)
>> ) z
>> NATURAL JOIN student y
>
> I see my 2 innermost queries are the same as yours, just a bit more
> wordy. I messed up at the third query, which threw up an error when I
> tried to add an ORDER BY.
>
>> It will give you a list of the trainer names who have more than one
>> trainer ID and the respective trainer IDS.
>> For your particular example data result will be
>> 2|"Smith"
>> 3|"Smith"
>
> Splendid! Just what I wanted.
>
>> As Smith is the only trainer with records with diferent trainer IDs.
>> Question : Can this be what you want?
>
> The real table has 250000 entries and quite a few dups.
>
>> If you want to achieve 1) just substitute the trainer_name by
>> trainer_id on the commented places.
>
> 1) works as well now - just had to transpose id/name.
>
>
>
>
>
> --
>
> Best Regards,
> Tarlika Elisabeth Schmitz
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql


pgsql-sql by date:

Previous
From: "Tarlika Elisabeth Schmitz"
Date:
Subject: Re: identifying duplicates in table with redundancies
Next
From: Tarlika Elisabeth Schmitz
Date:
Subject: Re: identifying duplicates in table with redundancies