Thread: batch insert/update
I have some PHP code that will be pulling in a file via FTP. This file will contain 20,000+ records that I then need to pump into the Postgres db. These records will represent a subset of the records in a certain table. I basically need an efficient way to pump these rows into the table, replacing matching rows (based on id) already there and inserting ones that aren't. I could loop through the result and insert or update each row based on whether it is already present, but what is the best way to handle this? This is something that will run nightly.
Thanks!
blackwater dev <blackwaterdev@gmail.com> schrieb:

> I have some PHP code that will be pulling in a file via FTP. This file
> will contain 20,000+ records that I then need to pump into the Postgres
> db. These records will represent a subset of the records in a certain
> table. I basically need an efficient way to pump these rows into the
> table, replacing matching rows (based on id) already there and inserting
> ones that aren't. I could loop through the result and insert or update
> each row based on whether it is already present, but what is the best
> way to handle this? This is something that will run nightly.

Insert your data into an extra table and use regular SQL to insert/update the destination table. You can use COPY to insert the data into your extra table; this works very fast, but you need a suitable file format for it.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
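For illustration, here is a minimal sketch of the staging-table approach Andreas describes. The table items(id, name, price), the column names, and the file path are assumptions made up for the example, and the merge assumes nothing else writes to the table while the nightly load runs:

-- Assumed destination table, for the sake of the example:
--   CREATE TABLE items (id integer PRIMARY KEY, name text, price numeric);

BEGIN;

-- Staging table with the same columns as the destination.
CREATE TEMP TABLE items_staging (LIKE items) ON COMMIT DROP;

-- Bulk-load the nightly file.  COPY FROM a file path runs on the server
-- and needs superuser; from psql, \copy reads a client-side file instead.
COPY items_staging FROM '/path/to/nightly_file.csv' WITH CSV;

-- Update the rows that already exist in the destination ...
UPDATE items
SET    name  = s.name,
       price = s.price
FROM   items_staging s
WHERE  items.id = s.id;

-- ... and insert the ones that do not.
INSERT INTO items (id, name, price)
SELECT s.id, s.name, s.price
FROM   items_staging s
WHERE  NOT EXISTS (SELECT 1 FROM items WHERE items.id = s.id);

COMMIT;

(On PostgreSQL 9.5 and later the UPDATE/INSERT pair can be collapsed into a single INSERT ... ON CONFLICT (id) DO UPDATE; the COPY into a staging table stays the same.)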
On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

> blackwater dev <blackwaterdev@gmail.com> schrieb:
>
> > I have some PHP code that will be pulling in a file via FTP.
> > This file will contain 20,000+ records that I then need to pump
> > into the Postgres db. These records will represent a subset of
> > the records in a certain table. I basically need an efficient
> > way to pump these rows into the table, replacing matching rows
> > (based on id) already there and inserting ones that aren't.
> > I could loop through the result and insert or update each row
> > based on whether it is already present, but what is the best
> > way to handle this? This is something that will run nightly.
>
> Insert your data into an extra table and use regular SQL to
> insert/update the destination table. You can use COPY to insert the
> data into your extra table; this works very fast, but you need a
> suitable file format for it.

What if you know in advance which rows should be inserted and you have a batch of rows that should be updated?

Is it still the fastest approach to insert them all into a temp table with COPY?

What about the ones that have to be updated, if you have all the columns and not just the changed ones? Is it faster to delete & insert, or to update? The updates come with the same pk as the destination table.

thx
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
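For comparison, the delete-and-reinsert variant Ivan asks about would look roughly like this, reusing the illustrative items / items_staging names from the sketch above:

BEGIN;

-- Drop every destination row that has a replacement in the staging table ...
DELETE FROM items
USING  items_staging s
WHERE  items.id = s.id;

-- ... then insert the whole staging set.
INSERT INTO items (id, name, price)
SELECT id, name, price
FROM   items_staging;

COMMIT;

Under PostgreSQL's MVCC an UPDATE writes a new row version anyway, so the two variants tend to do comparable work; the UPDATE form has the advantage of not firing ON DELETE triggers or re-checking foreign keys that reference the table.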
Ivan Sergio Borgonovo wrote:
>
> What if you know in advance which rows should be inserted and you have
> a batch of rows that should be updated?
>
> Is it still the fastest approach to insert them all into a temp table
> with COPY?
>
> What about the ones that have to be updated, if you have all the
> columns and not just the changed ones?
> Is it faster to delete & insert, or to update?
>
> The updates come with the same pk as the destination table.
>
> thx

We heard you the first time

--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)
I was also thinking about adding an 'is_new' column to the table, which I would set to 0 for the existing rows, then do a basic COPY of all the new rows in with is_new set to 1. I'd then run a delete statement to remove the rows that are duplicated and flagged 0, since the copy should leave me with two rows for each match: one with is_new of 1 and one with 0. I just don't know if this would be the best approach.
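A rough sketch of that flag-based idea, using the same illustrative items table and file path as above. Note it can only work if id does not carry a primary-key or unique constraint, because the COPY temporarily creates duplicate ids:

-- Assumes the flag column has already been added, e.g.:
--   ALTER TABLE items ADD COLUMN is_new integer DEFAULT 0;

BEGIN;

-- Mark everything currently in the table as old.
UPDATE items SET is_new = 0;

-- Load the nightly rows flagged as new.  COPY fills columns that are not
-- in its column list from their defaults, so switch the default to 1 for
-- the duration of the load.
ALTER TABLE items ALTER COLUMN is_new SET DEFAULT 1;
COPY items (id, name, price) FROM '/path/to/nightly_file.csv' WITH CSV;
ALTER TABLE items ALTER COLUMN is_new SET DEFAULT 0;

-- Remove the old copy of every id that was reloaded.
DELETE FROM items
WHERE  is_new = 0
AND    id IN (SELECT id FROM items WHERE is_new = 1);

COMMIT;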