Re: removing specific duplicates - Mailing list pgsql-novice

From Ron Johnson
Subject Re: removing specific duplicates
Date
Msg-id 1026754555.23763.4.camel@rebel
Whole thread Raw
In response to Re: removing specific duplicates  (Josh Jore <josh@greentechnologist.org>)
List pgsql-novice
Instead of One Big Delete Statement, the other way to do it
is in a loop in pg/sql.  This is the way I like to do it
because then I can put progress indicators to show how fast
or slow the statements are running.

On Mon, 2002-07-15 at 09:46, Josh Jore wrote:
> Al,
> First thing you should do is add a serial not null unique column to that
> table. If you are going to need to do this sort of operation then you need
> something by which to get a handle to a specific row.
>
> So this command will delete exactly one duplicate from your table. You
> need to keep runnin it until it deletes zero rows. Is this idea? No. It's
> one step above worst case. Unless you *like* this sort of pain, fix your
> app to prevent this sort of  duplication. You could have a trigger or rule
> stop duplicate inserts. You could use a unique index. Whatever works for
> you. Just do *something*.
>
> DELETE FROM temp WHERE oid IN
>     (
>     SELECT    temp.oid
>     FROM    temp
>     JOIN
>     ( SELECT count(*), sample, spec, temp, measurement
>       FROM temp
>       GROUP BY sample, spec, temp, measurement ) dup
>     USING (sample, spec, temp, measurement)
>     WHERE dup.count > 1
>     LIMIT 1;
>
>
> So your next question. How to replace the existing unique data with a
> single averaage. Requesting the average is easy:
>
> > sample  spec  temp  voltage  measurement
> > 1           Tdsu  25      4.5       1.12e-9
> > 1           Tdsu  25      4.5       1.3e-9
>
> SELECT sample, spec, temp, voltage, AVG(measurement)
> FROM temp GROUP BY sample, spec, temp, voltage;
>
> So if you wanted to keep the average and throw away the detail data then
> just delete the old row and insert the new row (which you saved prior to
> deleting the data)
>
>
> Joshua b. Jore ; http://www.greentechnologist.org
>
> On Fri, 12 Jul 2002, Al Arduengo wrote:
>
> > I have a table of characterization data from a test of a certain chip where
> > I work. THe table consists of 5 columns:
> >
> > sample int
> > spec text
> > temp int
> > voltage float
> > measurement float
> >
> > Each sample chip (1-120) is tested at three temps and two voltages. The
> > column of interest is measurement. The problem is that some of the samples
> > were tested multiple times for one temp/voltage combination so I have
> > entries such as:
> >
> >
> > My requirement is to delete any duplicate rows (such as #2 in this case)
> > and then take the average of #1 and #3, replace #1 with that average
> > measurement and delete #3. I simply cannot figure out how to first get rid
> > of *just* #2 and then somehow find the case of #1 and #3 existing and then
> > do the average followed by removing #3. Using DISTINCT with SELECT will
> > obviously spit out #1 and #3 but that doesn't get me anywhere. I am
> > somewhat of a SQL novice so please have pity on me in your possible
> > explanations. My ultimate goal is to do these modifications without having
> > to go in and manually find these situations and then manually make the
> > changes. I think it is possible with SQL but I can't figure out how.
> >
> > Thanks very much in advance.
> > -Al Arduengo

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA      http://ronandheather.dhs.org:81         |
|                                                                 |
| "Experience should teach us to be most on our guard to protect  |
|  liberty when the government's purposes are beneficent. Men     |
|  born to freedom are naturally alert to repel invasion of their |
|  liberty by evil minded rulers. The greatest dangers to liberty |
|  lurk in insidious encroachment by men of zeal, well-meaning    |
|  but without understanding."                                    |
|   Justice Louis Brandeis, dissenting, Olmstead v US (1928)      |
+-----------------------------------------------------------------+


pgsql-novice by date:

Previous
From: "Hillensbeck, Preston"
Date:
Subject: Re: POSTGRESQL for WINDOWS NT4
Next
From: "Chad Thompson"
Date:
Subject: Seeding