Re: Recommended Protocol: Adding Rows to Table - Mailing list pgsql-general

From David Johnston
Subject Re: Recommended Protocol: Adding Rows to Table
Date
Msg-id 9FD7A75C-1ABE-4143-9E07-3B872693CDD8@yahoo.com
Whole thread Raw
In response to Recommended Protocol: Adding Rows to Table  (Rich Shepard <rshepard@appl-ecosys.com>)
Responses Re: Recommended Protocol: Adding Rows to Table  (Rich Shepard <rshepard@appl-ecosys.com>)
List pgsql-general
On Nov 25, 2011, at 11:05, Rich Shepard <rshepard@appl-ecosys.com> wrote:

>  The data originated in a spreadsheet and, based on my experience, contains
> duplicate records. After reformatting there are 143,260 rows to insert in
> the table. The approach I tried seems to have problems (explained below) and
> I would like to learn the proper way to insert rows in either an empty table
> or one with existing rows since I'll need to do this procedure for my
> projects.
>
>  The table was created with the primary key and I used INSERT INTO ... to
> load the data. Many duplicate records, so I split the file into smaller ones
> and re-ran the command to load them. I'd then remove the reported duplicate
> rows from the source (text) file. Between runs, I'd issue the DROP TABLE
> command within psql and check it was empty using 'select count(*) from
> waterchem;'.
>
>  It appeared that I removed duplicates from the first couple of smaller
> files so I combined them into one file named ok.sql. But, when I tested the
> combined file it, too, reported many duplicate records. Something wrong
> here.
>
>  Perhaps a better approach is to put the CREATE TABLE command above the
> INSERT INTO section of the file (without specifying a PK), load that using
> the command 'psql -d <database> -f waterchem.sql', then add the PK and
> remove duplicates as postgres presents them. Or, perhaps there is a much
> more efficient way to accomplish this task.
>
>  Rather than my flailing around and spending a lot of time failing to load
> all non-duplicate rows into the table I'd like to learn the proper way to
> accomplish this task. Suggestions, recommendations, and your experiences are
> requested.
>
> TIA,
>
> Rich
>
>

Simplistically you load all the data into a staging table that has no natural primary key and then write a query that
willresult in only a single record for whatever you define as a primary key.  Insert the results of that query into the
finaltable. 

If you only use a subset of columns to uniquely determine the PK than you need to decide how to resolve situations
wherethe PK fields are duplicated but the extra fields are unique.  That step is why the first description is
simplistic.

David J.

pgsql-general by date:

Previous
From: Rich Shepard
Date:
Subject: Recommended Protocol: Adding Rows to Table
Next
From: Tom Lane
Date:
Subject: Re: error when defining a search configuration named "default"