Thread: Re: selecting duplicate records

Re: selecting duplicate records

From
Christoph Haller
Date:
> 1. How to select duplicate records only from a single table using a
select
> query.
>
e.g.
select sid,count(sid) from location group by sid having count(sid)>1;

Do you get the idea?
Your request is pretty unspecific, so if this is not what you're asking
for,
try again.

Regards, Christoph




Re: selecting duplicate records

From
Christopher Browne
Date:
The world rejoiced as ch@rodos.fzk.de (Christoph Haller) wrote:
>> 1. How to select duplicate records only from a single table using a
> select
>> query.
>>
> e.g.
> select sid,count(sid) from location group by sid having count(sid)>1;
>
> Do you get the idea?
> Your request is pretty unspecific, so if this is not what you're asking
> for,
> try again.

The aggregate is likely to perform horrifically badly.  Here might
be an option:

Step 1.  Find all of the duplicates...

select a.* into temp table sid from some_table a, some_table b where a.oid < b.oid and   a.field1 = b.field1 and
a.field2= b.field2 and   a.field3 = b.field3 and    ...   a.fieldn = b.fieldn;
 

Step 2.  Look for the matching entries in the source table...

select a.* from some_table a, sid b where   a.field1 = b.field1 and   a.field2 = b.field2 and   a.field3 = b.field3 and
  ...   a.fieldn = b.fieldn;
 

[There's a weakness here; if there are multiple dupes, they may get
picked multiple times in the second query :-(.]
-- 
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/rdbms.html
As Will Rogers would have said, "There is no such thing as a free
variable."  -- Alan Perlis


Re: selecting duplicate records

From
Guy Fraser
Date:
This is a test I did on one of my tables where I put duplicated entries :

select *
from acct_other_2003sepexcept select sep.*from ( select min(oid) as min_oid,  "Acct-Status-Type",  "User-Name",
"Realm", "Acct-Session-Time",  "Acct-Input-Octets",  "Acct-Output-Octets",  "Called-Station-Id",  "Calling-Station-Id",
"Acct-Terminate-Cause",  "Framed-IP-Address",  "Service-Type",  "Framed-Protocol",  "Client-IP-Address",
"NAS-IP-Address", "NAS-Port-Type",  "NAS-Port-Id",  "Acct-Session-Id",  "Acct-Link-Count",  "Acct-Multi-Session-Id"
fromacct_other_2003sep group by "Acct-Status-Type",  "User-Name",  "Realm",  "Acct-Session-Time",  "Acct-Input-Octets",
"Acct-Output-Octets",  "Called-Station-Id",  "Calling-Station-Id",  "Acct-Terminate-Cause",  "Framed-IP-Address",
"Service-Type", "Framed-Protocol",  "Client-IP-Address",  "NAS-IP-Address",  "NAS-Port-Type",  "NAS-Port-Id",
"Acct-Session-Id", "Acct-Link-Count",  "Acct-Multi-Session-Id" ) as min_sep,acct_other_2003sep as sep
 
where sep.oid = min_sep.min_oid
;
From the above example you can see how to use a subselect to get a 
unique list then using except, you can get the records that were not unique.

This may not be exactly what you want but it does implement some of the 
methods required to get around using using temporary tables.

For some tasks using temporary tables may be more suitable if your query 
becomes too complex and or you run out of memory/time.

Hope this helps.

Guy

Christopher Browne wrote:

>The world rejoiced as ch@rodos.fzk.de (Christoph Haller) wrote:
>  
>
>>>1. How to select duplicate records only from a single table using a
>>>      
>>>
>>select
>>    
>>
>>>query.
>>>
>>>      
>>>
>>e.g.
>>select sid,count(sid) from location group by sid having count(sid)>1;
>>
>>Do you get the idea?
>>Your request is pretty unspecific, so if this is not what you're asking
>>for,
>>try again.
>>    
>>
>
>The aggregate is likely to perform horrifically badly.  Here might
>be an option:
>
>Step 1.  Find all of the duplicates...
>
>select a.* into temp table sid from some_table a, some_table b
>  where a.oid < b.oid and
>    a.field1 = b.field1 and
>    a.field2 = b.field2 and
>    a.field3 = b.field3 and
>     ...
>    a.fieldn = b.fieldn;
>
>Step 2.  Look for the matching entries in the source table...
>
>select a.* from some_table a, sid b
>  where
>    a.field1 = b.field1 and
>    a.field2 = b.field2 and
>    a.field3 = b.field3 and
>     ...
>    a.fieldn = b.fieldn;
>
>[There's a weakness here; if there are multiple dupes, they may get
>picked multiple times in the second query :-(.]
>  
>

-- 
Guy Fraser
Network Administrator
The Internet Centre
780-450-6787 , 1-888-450-6787

There is a fine line between genius and lunacy, fear not, walk the
line with pride. Not all things will end up as you wanted, but you
will certainly discover things the meek and timid will miss out on.