Branching off from http://archives.postgresql.org/pgsql-jdbc/2008-09/msg00074.php and joining this thread.
Dear Kris,

Thanks for the link regarding COPY versus JDBC batch. From the numbers in that email, I am tempted to assume that, if your tables have indices, you would expect something like:

1x speed of COPY = 6x speed of JDBC batch = 36x speed of single-statement JDBC
Comparison table (records inserted per millisecond):

                     COPY   single-statement JDBC   JDBC batch
  WITHOUT INDEXES:    198            1.5                14
  WITH 2 INDEXES:      45            1.5                10
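For anyone curious what the COPY side looks like from JDBC, here is a minimal sketch, assuming the CopyManager API from the patch discussed in the linked message; the connection details, table and file names are made up:

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopySketch {
    public static void main(String[] args) throws Exception {
        // Connection details, table and file names are placeholders.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        Reader in = new FileReader("items.tsv");
        // One COPY operation streams the whole file to the server
        // and returns the number of rows inserted.
        long rows = copy.copyIn("COPY items FROM STDIN", in);
        in.close();
        System.out.println(rows + " rows copied");
        conn.close();
    }
}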
In the last few weeks, I have been trying to improve a bulk update process that involves arbitrary SQL statements, so a COPY command has not been an option. I cannot really compare what I have been doing to the performance of a COPY command, because the setups differ, but I can compare pgBee's performance to psql processing the same text files of SQL statements.
For the same input file, psql took 112 minutes, whereas pgBee finished in 21 minutes. Both were run against the same database server (localhost), pgBee was batching groups of 100 statements at a time, and a real data file was used, with 1,131,753 SQL statements in total (511,335 DELETEs and 567,577 INSERTs). That works out to pgBee doing about 898 operations/second (1,131,753 statements in 1,260 seconds), of which roughly 450/second are INSERTs, while psql manages about 168 operations/second, half of those INSERTs.
I suppose I am sort of doing what a JDBC COPY command would do (but with arbitrary SQL statements): pgBee handles the parsing of the files, takes care of encoding issues, escapes some characters, and tries to find and execute all statements in the most efficient way. Somehow, reading lines from a text file into a String buffer, splitting the statements up, and executing them as a batch gives you a real performance boost. Does psql implicitly handle each statement as a separate transaction? That could account for pgBee's better performance.
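The batching (plus the per-batch transaction handling that psql may be missing) looks roughly like this; a simplified sketch, not pgBee's actual code, and the connection details, table and statements are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {
    private static final int BATCH_SIZE = 100;

    // Executes already-parsed SQL statements in groups of 100:
    // one server round trip and one commit per group, instead of
    // one round trip and (under autocommit) one transaction each.
    static void executeAll(Connection conn, List<String> statements) throws Exception {
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement();
        int pending = 0;
        for (String sql : statements) {
            stmt.addBatch(sql);
            if (++pending == BATCH_SIZE) {
                stmt.executeBatch();
                conn.commit();
                pending = 0;
            }
        }
        if (pending > 0) {
            stmt.executeBatch();
            conn.commit();
        }
        stmt.close();
    }

    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        executeAll(conn, Arrays.asList(
                "DELETE FROM items WHERE id = 1",
                "INSERT INTO items (id, name) VALUES (1, 'example')"));
        conn.close();
    }
}

With autocommit off, each group of 100 statements costs one round trip and one commit, which is where I suspect most of the gain over statement-at-a-time processing comes from.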
Michael