Thread: Deleting Multiple Rows Based on Multiple Columns
Thanks to David J. I have a working script to locate multiple rows having the same values in three candidate columns. I used an enhanced version of this script to copy those duplicate (and triplicate) records to a clone of the original table. Now I would like to delete those duplicates from the original table in either of two ways, but my Google searches have not produced hits where the selection criteria for the DELETE has multiple columns. If it is possible to leave one row with specific values in the columns (loc_name, sample_date, param) and delete the additional ones, I'd like to learn how to do so. I know that I'll have use for these techniques with future data. Else, I'd like to delete all those rows with multiple copies. Then I'll manually remove the extra rows in emacs, and insert the remainder in the original table. The script I tried to to the latter is: DELETE FROM chemistry WHERE (SELECT lab_nbr, loc_name, sample_date, param, quant, units, qa_qc, easting, northing, remark FROM chemistry Natural Inner join ( SELECT loc_name, sample_date, param, Count(*) as duplicate_count FROM chemistry GROUP BY loc_name, sample_date, param) grouped WHERE duplicate_count > 1); But postgres wants a single column in the first SELECT. TIA, Rich
> If it is possible to leave one row with specific values in the columns > (loc_name, sample_date, param) and delete the additional ones, I'd like to > learn how to do so. I know that I'll have use for these techniques with > future data. > > Else, I'd like to delete all those rows with multiple copies. Then I'll > manually remove the extra rows in emacs, and insert the remainder in the > original table. It will be easiest to delete all and add back the single desired record later. The general form will be: DELETE FROM chemistry c WHERE EXISTS (SELECT 1 FROM chem_too ct WHERE c.field =ct.field AND c.field2=ct.field2 etc...) Not tested so minor syntax tweaks by be needed. Pretty sure docs cover this use case. You can also do: DELETE FROM chemistry USING chem_too WHERE chemistry.fields =AND chem_too.fields; The other way to group multiple columns in a single "row" column. E.g., WHERE (field1, field2, field3) =/IN (SELECT field1, field2, field3 FROM ...) Note the parenthesis grouping the three columns into a single unit for comparison. It is useful shorthand for the ( field1=field1AND field2=field2 etc... ) construct. You can readily use this form in the FROM/USING form's WHERE clause. In your chem_too table you can use the ROW_NUMBER window function over the desired key columns to number the duplicate rows(in a sub-query) and then, in the outer query, remove any rows that have a ROW_NUMBER > 1. Try to write the query yourselfand post your best effort if you get stumped. Using this query on the chem_too table you can select a single recordper key to insert back into the main table. You can also use ORDER BY in the WINDOW definition to sort on secondary(non-partition by) fields if desired. David J.