Thread: really quick multiple inserts can use COPY?
Hello! In our JAVA application we do multiple inserts to a table by data from a Hash Map. Due to poor database access implemention - done by another company (we got the job to enhance the software) - we cannot use prepared statements. (We are not allowed to change code at database access!) First, we tried to fire one INSERT statement per record to insert. This costs 3 ms per row which is to slow because normally we insert 10.000 records which results in 30.000 ms just for inserts. for(){ sql = "INSERT INTO tblfoo(foo,bar) VALUES("+it.next()+","+CONST.BAR+");"; } I was searching for an quicker way - MSSQL offers Array Inserts - at PostgreSQL. The only solution seem to be "INSERT INTO foo SELECT" and this is really dirty. I improved the inserts using the subselect with union. sql = "INSERT INTO tblfoo(foo,bar) "; for(){ sql += "SELECT "+it.next()+","+CONST.BAR+" UNION " ... } This results in a really long INSERT INTO SELECT UNION statement and works correct and quick but looks dirty. When I heard about COPY I thought this will be the right way. But it does not work using JDBC. Testing via psql does it perfect but sending the same SQL statements via JDBC throws an error. -> BEGIN sql = "COPY tblfoo(foo,bar) FROM STDIN;\n1 'foobar'\n2 'foobar'\n\\."; -> COMMIT ERROR: syntax error at or near "1" at character 34 So, my questions: Is it possible to use COPY FROM STDIN with JDBC? Will it bring performance improvement compared to SELECT UNION solution? many thanks in advance, Jens Schipkowski -- ** APUS Software GmbH
Jens Schipkowski <jens.schipkowski@apus.co.at> schrieb:

> Hello!
>
> In our Java application we do multiple inserts to a table with data from a
> HashMap. Due to a poor database access implementation - done by another
> company (we got the job to enhance the software) - we cannot use prepared
> statements. (We are not allowed to change the database access code!)
> First, we tried to fire one INSERT statement per record. This costs 3 ms
> per row, which is too slow because normally we insert 10.000 records,
> which results in 30.000 ms just for the inserts.

Can you change this from INSERT statements to COPY? COPY is *much*
faster than INSERT.

If not, are you running all the INSERTs in one transaction? I believe a
'BEGIN' and a 'COMMIT' around all the INSERTs may increase the speed.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
> So, my questions:
> Is it possible to use COPY FROM STDIN with JDBC?

Should be. It's at least possible using DBI and DBD::Pg (Perl):

my $copy_sth = $dbh->prepare(
    "COPY general.datamining_mailing_lists (query_id,email_key) FROM STDIN;");
$copy_sth->execute();
while (my ($email_key) = $fetch_sth->fetchrow_array()) {
    $dbh->func("$query_id\t$email_key\n", 'putline');
}
$fetch_sth->finish();
$dbh->func("\\.\n", 'putline');
$dbh->func('endcopy');
$copy_sth->finish();

Some JDBC expert could tell better how it's done with JDBC.

> Will it bring a performance improvement compared to the SELECT UNION solution?

COPY is quite fast.

Regds
mallah.

> many thanks in advance,
> Jens Schipkowski
>
> --
> **
> APUS Software GmbH
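Whatever API ends up pushing the data, COPY ... FROM STDIN expects the same text format the Perl snippet streams with putline: tab-separated columns, one newline-terminated line per row. A minimal sketch of a Java-side formatter for the HashMap from the original post (names tblfoo/foo/bar assumed from that post; actually sending the text to the server still depends on the driver having COPY support):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: format map entries into the text expected by COPY ... FROM STDIN
// (tab-separated columns, one row per line). Column layout is an assumption
// based on the tblfoo(foo,bar) example from the original post.
public class CopyDataFormatter {

    static String toCopyText(Map<Integer, String> rows) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : rows.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "foobar");
        rows.put(2, "foobar");
        System.out.print(toCopyText(rows));
    }
}
```

(For real data, tab, newline, and backslash characters inside values would additionally need escaping per the COPY text format.)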
"Jens Schipkowski" <jens.schipkowski@apus.co.at> writes:

> Is it possible to use COPY FROM STDIN with JDBC?

You should be asking the pgsql-jdbc list, not here. (I know I've seen
mention of a JDBC patch to support COPY, but I dunno if it's made it
into any official version.)

> Will it bring performance improvement compared to SELECT UNION solution?

Please, at least be smart enough to use UNION ALL, not UNION. If you're
using 8.2 you could also consider using INSERT with multiple VALUES lists.

regards, tom lane
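The multi-row VALUES syntax mentioned here (available from PostgreSQL 8.2 on) can be generated much like the UNION string. A sketch, again reusing the hypothetical tblfoo/foo/bar names from the original post:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of PostgreSQL 8.2's multi-row VALUES syntax:
//   INSERT INTO tblfoo(foo,bar) VALUES (1,'foobar'),(2,'foobar');
// Names are taken from the original post; BAR's value is a placeholder.
public class MultiValuesBuilder {
    static final String BAR = "'foobar'";

    static String buildInsert(List<Integer> values) {
        StringBuilder sql = new StringBuilder("INSERT INTO tblfoo(foo,bar) VALUES ");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sql.append(",");
            sql.append("(").append(values.get(i)).append(",").append(BAR).append(")");
        }
        return sql.append(";").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildInsert(Arrays.asList(1, 2, 3)));
    }
}
```

This sends one statement per batch, like the UNION ALL trick, but reads as an ordinary INSERT. It is not an option for the thread's 8.1.4 server, though.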
"Jens Schipkowski" <jens.schipkowski 'at' apus.co.at> writes:

> Hello!
>
> In our Java application we do multiple inserts to a table with data
> from a HashMap. Due to a poor database access implementation - done by
> another company (we got the job to enhance the software) - we cannot
> use prepared statements. (We are not allowed to change the database
> access code!)
> First, we tried to fire one INSERT statement per record. This costs
> 3 ms per row, which is too slow because normally we insert 10.000
> records, which results in 30.000 ms just for the inserts.
>
> for(){
>     sql = "INSERT INTO tblfoo(foo,bar) VALUES("+it.next()+","+CONST.BAR+");";
> }

You should try to wrap that into a single transaction. PostgreSQL waits
for I/O write completion after each INSERT, as each is implicitly in its
own transaction. Maybe the added performance would be satisfactory for
you.

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/
On 12/11/06, Andreas Kretschmer <akretschmer@spamfence.net> wrote:
> Jens Schipkowski <jens.schipkowski@apus.co.at> schrieb:
>
> > Hello!
> >
> > In our Java application we do multiple inserts to a table with data
> > from a HashMap. Due to a poor database access implementation - done by
> > another company (we got the job to enhance the software) - we cannot
> > use prepared statements. (We are not allowed to change the database
> > access code!)
> > First, we tried to fire one INSERT statement per record. This costs
> > 3 ms per row, which is too slow because normally we insert 10.000
> > records, which results in 30.000 ms just for the inserts.
>
> Can you change this from INSERT statements to COPY? COPY is *much*
> faster than INSERT.
>
> If not, are you running all the INSERTs in one transaction? I believe a
> 'BEGIN' and a 'COMMIT' around all the INSERTs may increase the speed.

A performance increase can also be gained by deferring constraints within
the transaction. Deferred constraints are checked at the end of the
transaction according to the SQL standard, so no worries about data
consistency.

Hmmm... PG currently supports deferring foreign key constraints only.
But that can still be significant.

--Imad
www.EnterpriseDB.com
Thanks a lot to all for your tips.

Of course, I am doing all the INSERTs in a transaction. So the cost per
INSERT dropped from 30 ms to 3 ms. The improvement factor matches the
hint by Brian Hurt. Sorry, I forgot to mention we are using PostgreSQL
8.1.4.

Thanks for the code snippet posted by mallah. It looks like you are
using prepared statements, which are not available to us. But I will
check our database access to see if a workaround is possible, because
this looks clean and quick to me.

regards
Jens Schipkowski

On Mon, 11 Dec 2006 17:53:52 +0100, Guillaume Cottenceau <gc@mnc.ch> wrote:

> "Jens Schipkowski" <jens.schipkowski 'at' apus.co.at> writes:
>
>> Hello!
>>
>> In our Java application we do multiple inserts to a table with data
>> from a HashMap. Due to a poor database access implementation - done by
>> another company (we got the job to enhance the software) - we cannot
>> use prepared statements. (We are not allowed to change the database
>> access code!)
>> First, we tried to fire one INSERT statement per record. This costs
>> 3 ms per row, which is too slow because normally we insert 10.000
>> records, which results in 30.000 ms just for the inserts.
>>
>> for(){
>>     sql = "INSERT INTO tblfoo(foo,bar) VALUES("+it.next()+","+CONST.BAR+");";
>> }
>
> You should try to wrap that into a single transaction. PostgreSQL waits
> for I/O write completion after each INSERT, as each is implicitly in
> its own transaction. Maybe the added performance would be satisfactory
> for you.

--
**
APUS Software GmbH
Jens Schipkowski wrote:
> Thanks a lot to all for your tips.
>
> Of course, I am doing all the INSERTs in a transaction. So the cost per
> INSERT dropped from 30 ms to 3 ms. The improvement factor matches the
> hint by Brian Hurt. Sorry, I forgot to mention we are using PostgreSQL
> 8.1.4.
> Thanks for the code snippet posted by mallah. It looks like you are
> using prepared statements, which are not available to us. But I will
> check our database access to see if a workaround is possible, because
> this looks clean and quick to me.
>
> regards
> Jens Schipkowski

This link might be what you are looking for; it has some information
about implementing COPY in the JDBC driver. Check the reply message as
well.
http://archives.postgresql.org/pgsql-jdbc/2005-04/msg00134.php

Another solution might be to have Java dump the contents of the HashMap
to a CSV file and load it through psql with COPY commands.

Good luck,
Nick
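The dump-to-file route could look roughly like this: write the HashMap as a tab-separated file that psql can then load with \copy. The file name and the tblfoo(foo,bar) column layout are assumptions carried over from the original post, e.g. psql -c "\copy tblfoo(foo,bar) from 'tblfoo.tsv'".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: dump a map to a tab-separated file suitable for psql's \copy.
// Column layout is an assumption based on the thread's tblfoo(foo,bar) example.
public class CopyFileDumper {

    static Path dump(Map<Integer, String> rows, Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : rows.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return Files.writeString(file, sb.toString());
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "foobar");
        rows.put(2, "foobar");
        Path out = dump(rows, Files.createTempFile("tblfoo", ".tsv"));
        System.out.println(Files.readString(out));
    }
}
```

The cost of this route is an extra round trip through the filesystem and a separate psql invocation, but it needs no COPY support in the JDBC driver at all.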