Thread: Selecting dupes from table

Selecting dupes from table

From

Uros

Date:

24 June 2003, 10:16:59

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
  (select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.


--
Best regards,
 Uros                          mailto:uros@sir-mag.com

Re: Selecting dupes from table

From

Jean-Christian Imbeault

Date:

24 June 2003, 10:33:04

Uros wrote:
>
> I want to list all entries with duplicate urls.
>
> I tried this:
>
> select id,url,title from directory where url IN
>   (select url from directory group by url having count(url) > 1)
> ORDER by url;

Try:

select id,url,title from directory where group by id, url, title having
count(url) > 1 order by url;

Think it should work,

Jan-Christian Imbeault

Re: Selecting dupes from table

From

Martijn van Oosterhout

Date:

24 June 2003, 10:33:20

On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote:
> Hello ,
>
> I have table directory with 3 columns (id,url,title)
>
> I want to list all entries with duplicate urls.
>
> I tried this:
>
> select id,url,title from directory where url IN
>   (select url from directory group by url having count(url) > 1)
> ORDER by url;
>
> but this takes 30 seconds with 25.000 entries. I have index on url.
>
> Can I use any other query to select this faster.

How about:

Duplicate urls would be given by:

select url from directory group by url having count(*) > 1;

To get all the entries with those urls, something like:

select id,url,title from directory,
(select url from directory group by url having count(*) > 1) as list
where list.url = directory.url;

I hope I got the syntax right.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Attachment

msg-32091-62186.dat

Re: Selecting dupes from table

From

Uros

Date:

24 June 2003, 11:12:20

Hello Martijn,

Tuesday, June 24, 2003, 12:32:53 PM, you wrote:

MvO> On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote:

MvO> How about:

MvO> Duplicate urls would be given by:

MvO> select url from directory group by url having count(*) > 1;

MvO> To get all the entries with those urls, something like:

MvO> select id,url,title from directory,
MvO> (select url from directory group by url having count(*) > 1) as list
MvO> where list.url = directory.url;

MvO> I hope I got the syntax right.


I tried that before but got error:

ERROR:  Column reference "url" is ambiguous

--
Best regards,
 Uros                            mailto:uros@sir-mag.com

Re: Selecting dupes from table

From

Martijn van Oosterhout

Date:

24 June 2003, 11:23:25

On Tue, Jun 24, 2003 at 01:12:05PM +0200, Uros wrote:
> Hello Martijn,
>
> MvO> select id,url,title from directory,
> MvO> (select url from directory group by url having count(*) > 1) as list
> MvO> where list.url = directory.url;
>
> MvO> I hope I got the syntax right.
>
> I tried that before but got error:
>
> ERROR:  Column reference "url" is ambiguous

Oh right, try:

select id,directory.url,title from directory,
(select url from directory group by url having count(*) > 1) as list
where list.url = directory.url;
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Attachment

msg-32091-62190.dat

Re: Selecting dupes from table

From

Hervé Piedvache

Date:

24 June 2003, 11:27:59

Hi,

I think this could be good :

select d1.id, d1.url, d1.tittle
   from directory d1
where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>d1.oid)
  order by d1.url;

Hope this will help,

Regards,

Le Mardi 24 Juin 2003 12:16, Uros a écrit :
> Hello ,
>
> I have table directory with 3 columns (id,url,title)
>
> I want to list all entries with duplicate urls.
>
> I tried this:
>
> select id,url,title from directory where url IN
>   (select url from directory group by url having count(url) > 1)
> ORDER by url;
>
> but this takes 30 seconds with 25.000 entries. I have index on url.
>
> Can I use any other query to select this faster.

--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

Re: Selecting dupes from table

From

Uros

Date:

24 June 2003, 11:40:25

Hello Hervé,


Thanks a lot for help. Both Martijn van Oosterhout and yours do job god.
Execution time is now about 1 to 2 seconds. I also figured myself what i
was doing wrong with join and error i got.

If enybody need this here is both three solutions.

select distinct d1.id,d1.url,d1.title from directory d1 inner join directory d2 on d2.url = d1.url where d1.id <> d2.id
ORDERby d1.url; 

select id,directory.url,title from directory, (select url from directory group by url having count(*) > 1) as list
wherelist.url = directory.url; 

select d1.id, d1.url, d1.tittle  from directory d1 where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>d1.oid) order by d1.url; 


--
Best regards,
 Uros                            mailto:uros@sir-mag.com



Tuesday, June 24, 2003, 1:27:44 PM, you wrote:

HP> Hi,

HP> I think this could be good :

HP> select d1.id, d1.url, d1.tittle
HP>    from directory d1
HP> where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>>d1.oid)
HP>   order by d1.url;

HP> Hope this will help,

HP> Regards,

HP> Le Mardi 24 Juin 2003 12:16, Uros a écrit :
>> Hello ,
>>
>> I have table directory with 3 columns (id,url,title)
>>
>> I want to list all entries with duplicate urls.
>>
>> I tried this:
>>
>> select id,url,title from directory where url IN
>>   (select url from directory group by url having count(url) > 1)
>> ORDER by url;
>>
>> but this takes 30 seconds with 25.000 entries. I have index on url.
>>
>> Can I use any other query to select this faster.

Re: Selecting dupes from table

From

Bruno Wolff III

Date:

24 June 2003, 16:35:40

On Tue, Jun 24, 2003 at 19:31:46 +0900,
  Jean-Christian Imbeault <jc@mega-bucks.co.jp> wrote:
>
> Try:
>
> select id,url,title from directory where group by id, url, title having
> count(url) > 1 order by url;

That won't work because the count is over the triples, rather than just
the url.