Thread: inserting only new rows from csv file

inserting only new rows from csv file

From
Seb
Date:
Hi,

I've seen some approaches to this problem, but I suspect there are
better ways.

A csv file is produced nightly, with the same table structure as a table
in a PostgreSQL database.  The file grows over time as new records are
added every day.  However, the rows in the file may not be sorted the
same way every day.

I need to isolate the new rows and insert them into the database table.
Before I start writing a script to do this, I need to decide whether to
do the comparison between the database table and the csv file: a) within
the database, or b) COPY'ing the table to a file and then do the
comparison.  I'm not very experienced with SQL, so am more comfortable
with the latter option using plain shell and awk.  Which route is best?
If a), I'd appreciate some SQL code to do it.  Thanks.


Cheers,

--
Seb

Re: inserting only new rows from csv file

From
"Scott Marlowe"
Date:
On Wed, Oct 1, 2008 at 3:16 PM, Seb <spluque@gmail.com> wrote:
> Hi,
>
> I've seen some approaches to this problem, but I suspect there are
> better ways.
>
> A csv file is produced nightly, with the same table structure as a table
> in a PostgreSQL database.  The file grows over time as new records are
> added every day.  However, the rows in the file may not be sorted the
> same way every day.
>
> I need to isolate the new rows and insert them into the database table.
> Before I start writing a script to do this, I need to decide whether to
> do the comparison between the database table and the csv file: a) within
> the database, or b) COPY'ing the table to a file and then do the
> comparison.  I'm not very experienced with SQL, so am more comfortable
> with the latter option using plain shell and awk.  Which route is best?
> If a), I'd appreciate some SQL code to do it.  Thanks.

I'd load them into a staging table, and use a query (either a
subselect or a left join where null type query) to load them into the
master table.

Re: inserting only new rows from csv file

From
"Harvey, Allan AC"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Scott Marlowe
> Sent: Thursday, 2 October 2008 7:39 AM
> To: Seb
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] inserting only new rows from csv file
>
>
> On Wed, Oct 1, 2008 at 3:16 PM, Seb <spluque@gmail.com> wrote:
> > Hi,
> >
> > I've seen some approaches to this problem, but I suspect there are
> > better ways.
> >
> > A csv file is produced nightly, with the same table
> structure as a table
> > in a PostgreSQL database.  The file grows over time as new
> records are
> > added every day.  However, the rows in the file may not be
> sorted the
> > same way every day.
> >
> > I need to isolate the new rows and insert them into the
> database table.
> > Before I start writing a script to do this, I need to
> decide whether to
> > do the comparison between the database table and the csv
> file: a) within
> > the database, or b) COPY'ing the table to a file and then do the
> > comparison.  I'm not very experienced with SQL, so am more
> comfortable
> > with the latter option using plain shell and awk.  Which
> route is best?
> > If a), I'd appreciate some SQL code to do it.  Thanks.
>
> I'd load them into a staging table, and use a query (either a
> subselect or a left join where null type query) to load them into the
> master table.
>
I solved the same problem this way.
csv file processed by gawk script to produce SQL script with inserts etc. Then pass
the SQL file to psql.

This is what I have in my BEGIN line of my gawk script leaving out appropriate print statements for clarity.

CREATE TEMPORARY TABLE tempjobdetails( LIKE jobdetails INCLUDING DEFAULTS );

--Do load from csv into tempjobdetails

This is in my END line of the gawk scrip ditto the print statements.

insert into jobdetails
   ( select * from tempjobdetails
     where
         ( tempjobdetails.jobnum, tempjobdetails.opnum )
         not in
         ( select
               jobdetails.jobnum, jobdetails.opnum from jobdetails
          )
    );

Hope this helps.

Allan


The material contained in this email may be confidential, privileged or copyrighted. If you are not the intended
recipient,use, disclosure or copying of this information is prohibited. If you have received this document in error,
pleaseadvise the sender and delete the document. Neither OneSteel nor the sender accept responsibility for any viruses
containedin this email or any attachments.