Thread: Selecting dupes from table
Hello , I have table directory with 3 columns (id,url,title) I want to list all entries with duplicate urls. I tried this: select id,url,title from directory where url IN (select url from directory group by url having count(url) > 1) ORDER by url; but this takes 30 seconds with 25.000 entries. I have index on url. Can I use any other query to select this faster. -- Best regards, Uros mailto:uros@sir-mag.com
Uros wrote: > > I want to list all entries with duplicate urls. > > I tried this: > > select id,url,title from directory where url IN > (select url from directory group by url having count(url) > 1) > ORDER by url; Try: select id,url,title from directory where group by id, url, title having count(url) > 1 order by url; Think it should work, Jan-Christian Imbeault
On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote: > Hello , > > I have table directory with 3 columns (id,url,title) > > I want to list all entries with duplicate urls. > > I tried this: > > select id,url,title from directory where url IN > (select url from directory group by url having count(url) > 1) > ORDER by url; > > but this takes 30 seconds with 25.000 entries. I have index on url. > > Can I use any other query to select this faster. How about: Duplicate urls would be given by: select url from directory group by url having count(*) > 1; To get all the entries with those urls, something like: select id,url,title from directory, (select url from directory group by url having count(*) > 1) as list where list.url = directory.url; I hope I got the syntax right. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organized violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
Attachment
Hello Martijn, Tuesday, June 24, 2003, 12:32:53 PM, you wrote: MvO> On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote: MvO> How about: MvO> Duplicate urls would be given by: MvO> select url from directory group by url having count(*) > 1; MvO> To get all the entries with those urls, something like: MvO> select id,url,title from directory, MvO> (select url from directory group by url having count(*) > 1) as list MvO> where list.url = directory.url; MvO> I hope I got the syntax right. I tried that before but got error: ERROR: Column reference "url" is ambiguous -- Best regards, Uros mailto:uros@sir-mag.com
On Tue, Jun 24, 2003 at 01:12:05PM +0200, Uros wrote: > Hello Martijn, > > MvO> select id,url,title from directory, > MvO> (select url from directory group by url having count(*) > 1) as list > MvO> where list.url = directory.url; > > MvO> I hope I got the syntax right. > > I tried that before but got error: > > ERROR: Column reference "url" is ambiguous Oh right, try: select id,directory.url,title from directory, (select url from directory group by url having count(*) > 1) as list where list.url = directory.url; -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organized violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
Attachment
Hi, I think this could be good : select d1.id, d1.url, d1.tittle from directory d1 where exists (select url from directory d2 where d1.url=d2.url and d2.oid>d1.oid) order by d1.url; Hope this will help, Regards, Le Mardi 24 Juin 2003 12:16, Uros a écrit : > Hello , > > I have table directory with 3 columns (id,url,title) > > I want to list all entries with duplicate urls. > > I tried this: > > select id,url,title from directory where url IN > (select url from directory group by url having count(url) > 1) > ORDER by url; > > but this takes 30 seconds with 25.000 entries. I have index on url. > > Can I use any other query to select this faster. -- Hervé Piedvache Elma Ingénierie Informatique 6 rue du Faubourg Saint-Honoré F-75008 - Paris - France Pho. 33-144949901 Fax. 33-144949902
Hello Hervé, Thanks a lot for help. Both Martijn van Oosterhout and yours do job god. Execution time is now about 1 to 2 seconds. I also figured myself what i was doing wrong with join and error i got. If enybody need this here is both three solutions. select distinct d1.id,d1.url,d1.title from directory d1 inner join directory d2 on d2.url = d1.url where d1.id <> d2.id ORDERby d1.url; select id,directory.url,title from directory, (select url from directory group by url having count(*) > 1) as list wherelist.url = directory.url; select d1.id, d1.url, d1.tittle from directory d1 where exists (select url from directory d2 where d1.url=d2.url and d2.oid>d1.oid) order by d1.url; -- Best regards, Uros mailto:uros@sir-mag.com Tuesday, June 24, 2003, 1:27:44 PM, you wrote: HP> Hi, HP> I think this could be good : HP> select d1.id, d1.url, d1.tittle HP> from directory d1 HP> where exists (select url from directory d2 where d1.url=d2.url and d2.oid>>d1.oid) HP> order by d1.url; HP> Hope this will help, HP> Regards, HP> Le Mardi 24 Juin 2003 12:16, Uros a écrit : >> Hello , >> >> I have table directory with 3 columns (id,url,title) >> >> I want to list all entries with duplicate urls. >> >> I tried this: >> >> select id,url,title from directory where url IN >> (select url from directory group by url having count(url) > 1) >> ORDER by url; >> >> but this takes 30 seconds with 25.000 entries. I have index on url. >> >> Can I use any other query to select this faster.
On Tue, Jun 24, 2003 at 19:31:46 +0900, Jean-Christian Imbeault <jc@mega-bucks.co.jp> wrote: > > Try: > > select id,url,title from directory where group by id, url, title having > count(url) > 1 order by url; That won't work because the count is over the triples, rather than just the url.