Thanks to David J. I have a working script to locate multiple rows having
the same values in three candidate columns. I used an enhanced version of
this script to copy those duplicate (and triplicate) records to a clone of
the original table.
Now I would like to delete those duplicates from the original table in
either of two ways, but my Google searches have not produced hits where the
selection criteria for the DELETE has multiple columns.
If it is possible to leave one row with specific values in the columns
(loc_name, sample_date, param) and delete the additional ones, I'd like to
learn how to do so. I know that I'll have use for these techniques with
future data.
Else, I'd like to delete all those rows with multiple copies. Then I'll
manually remove the extra rows in emacs, and insert the remainder in the
original table.
The script I tried to to the latter is:
DELETE FROM chemistry
WHERE
(SELECT lab_nbr, loc_name, sample_date, param, quant, units, qa_qc,
easting, northing, remark
FROM chemistry
Natural Inner join (
SELECT loc_name, sample_date, param, Count(*) as duplicate_count
FROM chemistry
GROUP BY loc_name, sample_date, param) grouped
WHERE duplicate_count > 1);
But postgres wants a single column in the first SELECT.
TIA,
Rich