Re: removing specific duplicates - Mailing list pgsql-novice
From | Ron Johnson |
---|---|
Subject | Re: removing specific duplicates |
Date | |
Msg-id | 1026754555.23763.4.camel@rebel Whole thread Raw |
In response to | Re: removing specific duplicates (Josh Jore <josh@greentechnologist.org>) |
List | pgsql-novice |
Instead of One Big Delete Statement, the other way to do it is in a loop in pg/sql. This is the way I like to do it because then I can put progress indicators to show how fast or slow the statements are running. On Mon, 2002-07-15 at 09:46, Josh Jore wrote: > Al, > First thing you should do is add a serial not null unique column to that > table. If you are going to need to do this sort of operation then you need > something by which to get a handle to a specific row. > > So this command will delete exactly one duplicate from your table. You > need to keep runnin it until it deletes zero rows. Is this idea? No. It's > one step above worst case. Unless you *like* this sort of pain, fix your > app to prevent this sort of duplication. You could have a trigger or rule > stop duplicate inserts. You could use a unique index. Whatever works for > you. Just do *something*. > > DELETE FROM temp WHERE oid IN > ( > SELECT temp.oid > FROM temp > JOIN > ( SELECT count(*), sample, spec, temp, measurement > FROM temp > GROUP BY sample, spec, temp, measurement ) dup > USING (sample, spec, temp, measurement) > WHERE dup.count > 1 > LIMIT 1; > > > So your next question. How to replace the existing unique data with a > single averaage. Requesting the average is easy: > > > sample spec temp voltage measurement > > 1 Tdsu 25 4.5 1.12e-9 > > 1 Tdsu 25 4.5 1.3e-9 > > SELECT sample, spec, temp, voltage, AVG(measurement) > FROM temp GROUP BY sample, spec, temp, voltage; > > So if you wanted to keep the average and throw away the detail data then > just delete the old row and insert the new row (which you saved prior to > deleting the data) > > > Joshua b. Jore ; http://www.greentechnologist.org > > On Fri, 12 Jul 2002, Al Arduengo wrote: > > > I have a table of characterization data from a test of a certain chip where > > I work. THe table consists of 5 columns: > > > > sample int > > spec text > > temp int > > voltage float > > measurement float > > > > Each sample chip (1-120) is tested at three temps and two voltages. The > > column of interest is measurement. The problem is that some of the samples > > were tested multiple times for one temp/voltage combination so I have > > entries such as: > > > > > > My requirement is to delete any duplicate rows (such as #2 in this case) > > and then take the average of #1 and #3, replace #1 with that average > > measurement and delete #3. I simply cannot figure out how to first get rid > > of *just* #2 and then somehow find the case of #1 and #3 existing and then > > do the average followed by removing #3. Using DISTINCT with SELECT will > > obviously spit out #1 and #3 but that doesn't get me anywhere. I am > > somewhat of a SQL novice so please have pity on me in your possible > > explanations. My ultimate goal is to do these modifications without having > > to go in and manually find these situations and then manually make the > > changes. I think it is possible with SQL but I can't figure out how. > > > > Thanks very much in advance. > > -Al Arduengo -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://ronandheather.dhs.org:81 | | | | "Experience should teach us to be most on our guard to protect | | liberty when the government's purposes are beneficent. Men | | born to freedom are naturally alert to repel invasion of their | | liberty by evil minded rulers. The greatest dangers to liberty | | lurk in insidious encroachment by men of zeal, well-meaning | | but without understanding." | | Justice Louis Brandeis, dissenting, Olmstead v US (1928) | +-----------------------------------------------------------------+
pgsql-novice by date: