Re: identifying duplicates in table with redundancies - Mailing list pgsql-sql

From Oliveiros d'Azevedo Cristina
Subject Re: identifying duplicates in table with redundancies
Date
Msg-id 5F1C28C4D61848D6A012C5DD84B22226@marktestcr.marktest.pt
Whole thread Raw
In response to identifying duplicates in table with redundancies  (Tarlika Elisabeth Schmitz <postgresql@numerixtechnology.de>)
Responses Re: identifying duplicates in table with redundancies  ("Tarlika Elisabeth Schmitz" <postgresql2@numerixtechnology.de>)
List pgsql-sql
Hello again,
Tarlika.

In what concerns to indices, I 'm affraid I may not be the best person to 
advise you, my knowledge of them hardly goes beyond the most trivial cases.

I'm sure there are plenty of other people on the list who are able to give 
you better advise than me.
But, on this query in particular I would recomend an indice on trainer_name, 
as this field will be used on the join and on the group by.
For the other query, the one you get by substituting trainer_name by 
trainer_id, place an index on trainer_id.
Also, these indexes may help speed up the order by clause, if you use one.

If you have a table with lots of data you can try them around and see how 
performance varies (and don't forget there's also EXPLAIN ANALYZE)

Bear in mind, though, that this is just my 2 cents on a matter that I don't 
really master. Do not take this as an expert answer.

There are many people on this list that can help you better

Best,
Oliveiros

----- Original Message ----- 
From: "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de>
To: <pgsql-sql@postgresql.org>
Sent: Monday, September 27, 2010 5:54 PM
Subject: Re: [SQL] identifying duplicates in table with redundancies


> On Fri, 24 Sep 2010 18:12:18 +0100
> Oliver d'Azevedo Christina <oliveiros.cristina@gmail.com> wrote:
>
>>>> SELECT DISTINCT trainer_id,trainer_name
>>>> FROM (
>>>> SELECT trainer_name   -- The field you want to test for duplicates
>>>> FROM (
>>>> SELECT DISTINCT "trainer_id","trainer_name"
>>>> FROM student
>>>> ) x
>>>> GROUP BY "trainer_name"    -- the field you want to test for
>>>> duplicates
>>>> HAVING (COUNT(*) > 1)
>>>> ) z
>>>> NATURAL JOIN student y
>
>
>
> What indices would you recommend for this operation?
>
> -- 
>
> Best Regards,
> Tarlika Elisabeth Schmitz
>
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql 



pgsql-sql by date:

Previous
From: gangadhar bandaru
Date:
Subject: eB7E2R
Next
From: msi77
Date:
Subject: Re: is there a distinct function for comma lists ?