Thread: batch insert/update

batch insert/update

From
"blackwater dev"
Date:
I have some PHP code that will be pulling in a file via FTP.  This file will contain 20,000+ records that I then need to pump into the Postgres db.  These records will represent a subset of the records in a certain table.  I basically need an efficient way to pump these rows into the table, replacing matching rows (based on id) already there and inserting ones that aren't.  I could just loop through the result and insert or update based on the presence of the row, but what is the best way to handle this?  This is something that will run nightly.
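In other words, per record, something like this (schematic only; the table name, columns, and placeholders are made up):

  -- for each record in the file:
  UPDATE mytable SET col1 = :col1, col2 = :col2 WHERE id = :id;
  -- and, if that touched no rows:
  INSERT INTO mytable (id, col1, col2) VALUES (:id, :col1, :col2);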


Thanks!

Re: batch insert/update

From
Andreas Kretschmer
Date: Wed, 26 Dec 2007 20:48:27 +0100
blackwater dev <blackwaterdev@gmail.com> schrieb:

> I have some PHP code that will be pulling in a file via FTP.  This file will
> contain 20,000+ records that I then need to pump into the Postgres db.  These
> records will represent a subset of the records in a certain table.  I basically
> need an efficient way to pump these rows into the table, replacing matching
> rows (based on id) already there and inserting ones that aren't.  I could just
> loop through the result and insert or update based on the presence of the row,
> but what is the best way to handle this?  This is something that will run
> nightly.

Insert your data into an extra table and use regular SQL to
insert/update the destination table. You can use COPY to load the data
into the extra table; it is very fast, but you need a suitable
file format for it.
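For example, something along these lines (the "items" table and its
columns are invented; adjust to the real schema):

  -- staging table with the same shape as the destination
  CREATE TEMP TABLE staging (LIKE items);

  -- bulk-load the nightly file; note COPY from a file path reads
  -- on the server and needs superuser rights, so from PHP you
  -- would normally feed the rows through COPY ... FROM STDIN
  COPY staging FROM '/path/to/nightly_file.csv' WITH CSV;

  BEGIN;
  -- overwrite the rows that already exist
  UPDATE items
     SET col1 = s.col1, col2 = s.col2
    FROM staging s
   WHERE items.id = s.id;
  -- insert the rows that don't
  INSERT INTO items (id, col1, col2)
  SELECT s.id, s.col1, s.col2
    FROM staging s
   WHERE NOT EXISTS (SELECT 1 FROM items WHERE items.id = s.id);
  COMMIT;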


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: batch insert/update

From
Ivan Sergio Borgonovo
Date: Dec 26, 2007 3:13 PM
On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

> blackwater dev <blackwaterdev@gmail.com> schrieb:
>
> > I have some PHP code that will be pulling in a file via FTP.
> > This file will contain 20,000+ records that I then need to pump
> > into the Postgres db.  These records will represent a subset of
> > the records in a certain table.  I basically need an efficient
> > way to pump these rows into the table, replacing matching rows
> > (based on id) already there and inserting ones that aren't.  I
> > could just loop through the result and insert or update based on
> > the presence of the row, but what is the best way to handle this?
> > This is something that will run nightly.

> Insert your data into an extra table and use regular SQL to
> insert/update the destination table. You can use COPY to load the
> data into the extra table; it is very fast, but you need a
> suitable file format for it.

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to load them all into a temp table
with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.
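For concreteness, assuming a staging table like the one Andreas
suggests (all names invented), the two alternatives I mean would
look roughly like:

  -- variant A: update in place
  UPDATE items
     SET col1 = s.col1, col2 = s.col2
    FROM staging s
   WHERE items.id = s.id;

  -- variant B: delete the old versions, then insert the new ones
  DELETE FROM items USING staging s WHERE items.id = s.id;
  INSERT INTO items SELECT * FROM staging;

(Since Postgres writes a new row version on UPDATE anyway, the two
may be closer in cost than they look.)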

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: batch insert/update

From
Thomas Hart
Date:
Ivan Sergio Borgonovo wrote:
>
> What if you know in advance which rows should be inserted and
> you have a batch of rows that should be updated?
>
> Is it still the fastest approach to load them all into a temp
> table with COPY?
>
> What about the rows that have to be updated, if you have all the
> columns, not just the changed ones?
> Is it faster to delete & insert, or to update?
>
> The updates come with the same pk as the destination table.
>
> thx
>
We heard you the first time

--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)


Re: batch insert/update

From
"blackwater dev"
Date:
I was also thinking about adding an 'is_new' column to the table, which I would set to 0 on all the existing rows, then do a straight COPY of the new rows in with is_new set to 1.  I'd then run a delete statement to remove every row that is duplicated and has a flag of 0, since the copy should leave me with two rows for each match: one with is_new of 1 and one with 0.  I just don't know if this would be the best approach; a rough sketch of what I mean is below.
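Roughly (table and column names made up, and note this only works
if id isn't constrained unique, since the COPY has to be allowed to
create the temporary duplicates):

  -- one-time setup: flag column defaulting to 1, so COPY'd rows
  -- are marked new automatically
  ALTER TABLE items ADD COLUMN is_new int DEFAULT 1;

  -- nightly run:
  UPDATE items SET is_new = 0;             -- flag everything already there
  COPY items (id, col1, col2) FROM STDIN;  -- incoming rows get is_new = 1
  DELETE FROM items                        -- drop the replaced old versions
   WHERE is_new = 0
     AND EXISTS (SELECT 1 FROM items n
                  WHERE n.id = items.id AND n.is_new = 1);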

On Dec 26, 2007 3:13 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

> blackwater dev <blackwaterdev@gmail.com> schrieb:
>
> > I have some PHP code that will be pulling in a file via FTP.
> > This file will contain 20,000+ records that I then need to pump
> > into the Postgres db.  These records will represent a subset of
> > the records in a certain table.  I basically need an efficient
> > way to pump these rows into the table, replacing matching rows
> > (based on id) already there and inserting ones that aren't.  I
> > could just loop through the result and insert or update based on
> > the presence of the row, but what is the best way to handle this?
> > This is something that will run nightly.

> Insert your data into an extra table and use regular SQL to
> insert/update the destination table. You can use COPY to load the
> data into the extra table; it is very fast, but you need a
> suitable file format for it.

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to load them all into a temp table
with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

