
Re: Finding Duplicate Rows during INSERTs - Mailing list pgsql-general

From	Darren Duncan
Subject	Re: Finding Duplicate Rows during INSERTs
Date	July 9, 2012 23:54:44
Msg-id	4FFB995F.5080007@darrenduncan.net
In response to	Finding Duplicate Rows during INSERTs (Rich Shepard <rshepard@appl-ecosys.com>)
List	pgsql-general

Rich Shepard wrote:
>   Source data has duplicates. I have a file that creates the table then
> INSERTS INTO the table all the rows. When I see errors flash by during the
> 'psql -d <database> -f <file.sql>' I try to scroll back in the terminal to
> see where the duplicate rows are located. Too often they are too far back to
> let me scroll to see them.
>
>   There must be a better way of doing this. Can I run psql with the tee
> command to capture errors in a file I can examine? What is the proper/most
> efficient way to identify the duplicates so they can be removed?
>
> TIA,
>
> Rich

What I recommend is instead inserting your data into staging tables that lack
key constraints. You can then use SQL either to locate the duplicates or to
copy just the unique rows into the normal tables. After all, SQL is usually a
better tool than most for cleaning data, or for reporting on it.
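
As a rough sketch of the idea, and not something taken from Rich's actual
schema: the table and column names below (site_staging, site, site_id,
sample_date, value) are made up, and SQLite via Python's sqlite3 module stands
in for PostgreSQL only so that the example is self-contained. The
GROUP BY ... HAVING query and the INSERT ... SELECT DISTINCT statement are
plain SQL and should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: the staging table has no key constraints,
# the normal table has a primary key.
cur.execute(
    "CREATE TABLE site_staging (site_id TEXT, sample_date TEXT, value REAL)"
)
cur.execute(
    "CREATE TABLE site (site_id TEXT, sample_date TEXT, value REAL, "
    "PRIMARY KEY (site_id, sample_date))"
)

# Made-up source data containing exact duplicate rows.
rows = [
    ("A1", "2012-07-01", 1.5),
    ("A1", "2012-07-01", 1.5),   # duplicate
    ("B2", "2012-07-02", 2.0),
    ("C3", "2012-07-03", 3.25),
    ("C3", "2012-07-03", 3.25),  # duplicate
]
# The load succeeds because the staging table accepts duplicates.
cur.executemany("INSERT INTO site_staging VALUES (?, ?, ?)", rows)

# Option 1: locate the duplicated keys and how often each occurs.
duplicates = cur.execute(
    """
    SELECT site_id, sample_date, COUNT(*) AS copies
    FROM site_staging
    GROUP BY site_id, sample_date
    HAVING COUNT(*) > 1
    ORDER BY site_id, sample_date
    """
).fetchall()

# Option 2: copy only the unique rows into the normal table.
cur.execute(
    """
    INSERT INTO site (site_id, sample_date, value)
    SELECT DISTINCT site_id, sample_date, value
    FROM site_staging
    """
)
conn.commit()

loaded = cur.execute("SELECT COUNT(*) FROM site").fetchone()[0]
print(duplicates)
print(loaded)
```

Note that SELECT DISTINCT only collapses rows that are identical in every
column. If two rows share a key but differ in some other column, the copy
would still hit the key constraint, and the GROUP BY query is the way to find
those rows first.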
-- Darren Duncan
