Thread: Deleting Multiple Rows Based on Multiple Columns

Deleting Multiple Rows Based on Multiple Columns

From
Rich Shepard
Date:
   Thanks to David J. I have a working script to locate multiple rows having
the same values in three candidate columns. I used an enhanced version of
this script to copy those duplicate (and triplicate) records to a clone of
the original table.

   Now I would like to delete those duplicates from the original table in
either of two ways, but my Google searches have not produced hits where the
selection criteria for the DELETE has multiple columns.

   If it is possible to leave one row with specific values in the columns
(loc_name, sample_date, param) and delete the additional ones, I'd like to
learn how to do so. I know that I'll have use for these techniques with
future data.

   Else, I'd like to delete all those rows with multiple copies. Then I'll
manually remove the extra rows in emacs, and insert the remainder in the
original table.

   The script I tried to to the latter is:

DELETE FROM chemistry
   WHERE
   (SELECT lab_nbr, loc_name, sample_date, param, quant, units, qa_qc,
        easting, northing, remark
        FROM chemistry
        Natural Inner join (
        SELECT loc_name, sample_date, param, Count(*) as duplicate_count
          FROM chemistry
          GROUP BY loc_name, sample_date, param) grouped
        WHERE duplicate_count > 1);

   But postgres wants a single column in the first SELECT.

TIA,

Rich

Re: Deleting Multiple Rows Based on Multiple Columns

From
David Johnston
Date:
>  If it is possible to leave one row with specific values in the columns
> (loc_name, sample_date, param) and delete the additional ones, I'd like to
> learn how to do so. I know that I'll have use for these techniques with
> future data.
>
>  Else, I'd like to delete all those rows with multiple copies. Then I'll
> manually remove the extra rows in emacs, and insert the remainder in the
> original table.

It will be easiest to delete all and add back the single desired record later.

The general form will be:

DELETE FROM chemistry c WHERE EXISTS (SELECT 1 FROM chem_too ct WHERE c.field =ct.field AND c.field2=ct.field2 etc...)

Not tested so minor syntax tweaks by be needed.  Pretty sure docs cover this use case.  You can also do:

DELETE FROM chemistry
USING chem_too
WHERE chemistry.fields =AND chem_too.fields;

The other way to group multiple columns in a single "row" column.

E.g.,  WHERE (field1, field2, field3) =/IN (SELECT field1, field2, field3 FROM ...)

Note the parenthesis grouping the three columns into a single unit for comparison.  It is useful shorthand for the (
field1=field1AND field2=field2 etc... ) construct.  You can readily use this form in the FROM/USING form's WHERE
clause.

In your chem_too table you can use the ROW_NUMBER window function over the desired key columns to number the duplicate
rows(in a sub-query) and then, in the outer query, remove any rows that have a ROW_NUMBER > 1.  Try to write the query
yourselfand post your best effort if you get stumped.  Using this query on the chem_too table you can select a single
recordper key to insert back into the main table.  You can also use ORDER BY in the WINDOW definition to sort on
secondary(non-partition by) fields if desired. 

David J.