Home > mailing lists

Re: selecting duplicate records - Mailing list pgsql-sql

From	Guy Fraser
Subject	Re: selecting duplicate records
Date	September 23, 2003 17:34:45
Msg-id	3F70AE58.7030308@incentre.net Whole thread Raw
In response to	Re: selecting duplicate records (Christopher Browne <cbbrowne@acm.org>)
List	pgsql-sql

Tree view

This is a test I did on one of my tables where I put duplicated entries :

select *
from acct_other_2003sepexcept select sep.*from ( select min(oid) as min_oid,  "Acct-Status-Type",  "User-Name",
"Realm", "Acct-Session-Time",  "Acct-Input-Octets",  "Acct-Output-Octets",  "Called-Station-Id",  "Calling-Station-Id",
"Acct-Terminate-Cause",  "Framed-IP-Address",  "Service-Type",  "Framed-Protocol",  "Client-IP-Address",
"NAS-IP-Address", "NAS-Port-Type",  "NAS-Port-Id",  "Acct-Session-Id",  "Acct-Link-Count",  "Acct-Multi-Session-Id"
fromacct_other_2003sep group by "Acct-Status-Type",  "User-Name",  "Realm",  "Acct-Session-Time",  "Acct-Input-Octets",
"Acct-Output-Octets",  "Called-Station-Id",  "Calling-Station-Id",  "Acct-Terminate-Cause",  "Framed-IP-Address",
"Service-Type", "Framed-Protocol",  "Client-IP-Address",  "NAS-IP-Address",  "NAS-Port-Type",  "NAS-Port-Id",
"Acct-Session-Id", "Acct-Link-Count",  "Acct-Multi-Session-Id" ) as min_sep,acct_other_2003sep as sep
 
where sep.oid = min_sep.min_oid
;
From the above example you can see how to use a subselect to get a 
unique list then using except, you can get the records that were not unique.

This may not be exactly what you want but it does implement some of the 
methods required to get around using using temporary tables.

For some tasks using temporary tables may be more suitable if your query 
becomes too complex and or you run out of memory/time.

Hope this helps.

Guy

Christopher Browne wrote:

>The world rejoiced as ch@rodos.fzk.de (Christoph Haller) wrote:
>  
>
>>>1. How to select duplicate records only from a single table using a
>>>      
>>>
>>select
>>    
>>
>>>query.
>>>
>>>      
>>>
>>e.g.
>>select sid,count(sid) from location group by sid having count(sid)>1;
>>
>>Do you get the idea?
>>Your request is pretty unspecific, so if this is not what you're asking
>>for,
>>try again.
>>    
>>
>
>The aggregate is likely to perform horrifically badly.  Here might
>be an option:
>
>Step 1.  Find all of the duplicates...
>
>select a.* into temp table sid from some_table a, some_table b
>  where a.oid < b.oid and
>    a.field1 = b.field1 and
>    a.field2 = b.field2 and
>    a.field3 = b.field3 and
>     ...
>    a.fieldn = b.fieldn;
>
>Step 2.  Look for the matching entries in the source table...
>
>select a.* from some_table a, sid b
>  where
>    a.field1 = b.field1 and
>    a.field2 = b.field2 and
>    a.field3 = b.field3 and
>     ...
>    a.fieldn = b.fieldn;
>
>[There's a weakness here; if there are multiple dupes, they may get
>picked multiple times in the second query :-(.]
>  
>

-- 
Guy Fraser
Network Administrator
The Internet Centre
780-450-6787 , 1-888-450-6787

There is a fine line between genius and lunacy, fear not, walk the
line with pride. Not all things will end up as you wanted, but you
will certainly discover things the meek and timid will miss out on.

pgsql-sql by date:

From: Tom Lane
Date: 23 September 2003, 14:18:44
Subject: Re: pl/pgsql, cursors and C function

From: "Kumar"
Date: 24 September 2003, 05:45:53
Subject: Compatible datatype of image in SQL server

Re: selecting duplicate records - Mailing list pgsql-sql

Previous

Next