Thread: duplicates
I have a system bug which means that I have occasional dupicate entries in my 'subscribers' table. I'm trying to find a query which performs something like a 'where user_name is not unique' query. At the moment we use this cludge: select count(*), user_name from subscribersgroup by user_name order by count; (I'm surprised I can't add 'where count > 1' but it seems conditionals on aggregate fields are not allowed). This gives me a very long list with the 'not unique' entries at the bottom, eg count | user_name ------+------------------ 1 | bill.hicks [ ..cut 9 zillion results.. ] 1 | margaret.thatcher 4 | linus.torvalds 9 | bill.gates I then have to do select oid from subscribers where user_name = 'linus.torvalds'; and delete all but one of the records. Is there a better way to do this? And yes, we're working on a system fix to avoid the problem in the 1st place! TIA, al. -- # Allan Kelly http://www.plotsearch.co.uk# (+44) (0)131 524 8500# allan.kelly@buildstore.co.uk... ..# /Software Engineer/i . . . . .# ------------------------------ * . . . . .# "If you are a Visual Basic programmer, * . . .# these details are none of your business." * . . .# Mr Bunny's Guide to Active X, by Carlton Egremont III * . .# ------------------------------ vi: set noet tw=80 sts=4 ts=8 : .
Hi Allan Firstly I'd suggest a unique index on the column that should be unique to force uniqueness on that column. But since you already have double tuples, you can do the following: delete from subscribers where exists (select 1 from subscribers s where s.user_name = subscribers.user_name and s.oid >subscribers.oid) I'm not 100% certain it'll do the work right, so PLEASE try it out with a testtable first =) Regards, Patrik Kudo Allan Kelly wrote: > > I have a system bug which means that I have occasional dupicate entries in > my 'subscribers' table. I'm trying to find a query which performs something > like a 'where user_name is not unique' query. > > At the moment we use this cludge: > > select count(*), user_name from subscribers > group by user_name order by count; > > (I'm surprised I can't add 'where count > 1' but it seems conditionals on > aggregate fields are not allowed). > > This gives me a very long list with the 'not unique' entries at the bottom, eg > > count | user_name > ------+------------------ > 1 | bill.hicks > [ ..cut 9 zillion results.. ] > 1 | margaret.thatcher > 4 | linus.torvalds > 9 | bill.gates > > I then have to do > > select oid from subscribers where user_name = 'linus.torvalds'; > > and delete all but one of the records. Is there a better way to do this? > And yes, we're working on a system fix to avoid the problem in the 1st place! > > TIA, al.
Hi again. Your query: > select count(*), user_name from subscribers > group by user_name order by count; > would work if you instead wrote: select count(*), user_name from subscribersgroup by user_name having count(*) > 1 Regards, Patrik Kudo
----- Original Message ----- From: "Allan Kelly" <allan.kelly@buildstore.co.uk> To: <pgsql-sql@postgresql.org> Sent: Friday, April 07, 2000 11:34 AM Subject: [SQL] duplicates > I have a system bug which means that I have occasional dupicate entries in > my 'subscribers' table. I'm trying to find a query which performs something > like a 'where user_name is not unique' query. > > At the moment we use this cludge: > > select count(*), user_name from subscribers > group by user_name order by count; > > (I'm surprised I can't add 'where count > 1' but it seems conditionals on > aggregate fields are not allowed). > > This gives me a very long list with the 'not unique' entries at the bottom, eg > > count | user_name > ------+------------------ > 1 | bill.hicks > [ ..cut 9 zillion results.. ] > 1 | margaret.thatcher > 4 | linus.torvalds > 9 | bill.gates > > I then have to do > > select oid from subscribers where user_name = 'linus.torvalds'; > > and delete all but one of the records. Is there a better way to do this? DELETE FROM subscribers WHERE EXISTS(SELECT x.username FROM subscribers x WHERE subscribers.oid <x.oid AND subscribers.username LIKE x.username); ought to do it, I *think*. It will delete all but the most recently inserted copy, i.e. that with the highest oid. Self-joins (which this is, effectively), are often quicker and more efficient than counts. Yours, Moray
> I have a system bug which means that I have occasional dupicate entries in > my 'subscribers' table. I'm trying to find a query which performs something > like a 'where user_name is not unique' query. > > At the moment we use this cludge: > > select count(*), user_name from subscribers > group by user_name order by count; > > (I'm surprised I can't add 'where count > 1' but it seems conditionals on > aggregate fields are not allowed). > > This gives me a very long list with the 'not unique' entries at the bottom, eg > > count | user_name > ------+------------------ > 1 | bill.hicks > [ ..cut 9 zillion results.. ] > 1 | margaret.thatcher > 4 | linus.torvalds > 9 | bill.gates > > I then have to do > > select oid from subscribers where user_name = 'linus.torvalds'; > > and delete all but one of the records. Is there a better way to do this? > And yes, we're working on a system fix to avoid the problem in the 1st place! I would create a unique index on the user_name field (it sounds like a field you want indexed anyway). This way the insert will fail when it is a duplicate. Of course you may have to alter the code that does the insert to ckeck for failure and handle it. Also you will have to delete your duplicates before creating the unique index.