Thread: Performance comparison to psql.
We compared insert & select operations between JDBC (postgres 8.2, postgresql-8.2-506.jdbc3.jar) and psql. Here are some numbers:

PSQL COPY from STDIN, recordsCount = 10 000 000
WITHOUT INDEXES: time = 50544 ms (0.005 ms per record)
WITH 2 INDEXES: time = 221491 ms (0.022 ms per record)

JDBC (prepared statement without batch), recordsCount = 100 000
WITHOUT INDEXES: time = 64874 ms (0.649 ms per record)
WITH 2 INDEXES: time = 63057 ms (0.630 ms per record)

JDBC (prepared statement with batch), recordsCount = 1 000 000
WITHOUT INDEXES: time = 73205 ms (0.073 ms per record)
WITH 2 INDEXES: time = 100270 ms (0.100 ms per record)

Comparison table (records inserted per millisecond)
                 COPY   JDBC   JDBC batch
WITHOUT INDEXES: 198    1.5    14
WITH 2 INDEXES:  45     1.5    10

As for selects/queries, psql was about 3 times faster. Both the psql and JDBC operations were run remotely from the same machine.

I understand that JDBC has some overhead (object translation, ...) but didn't think the difference would be that big. Do these numbers look correct (any optimization suggestions)? Any performance improvement in postgresql-8.2-507.jdbc4.jar? Is the copy operation much more optimized than inserts (and if so, when will the driver support it)?

Thanks,
Arie.
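(For reference, the "prepared statement with batch" case above corresponds to a pattern roughly like the following sketch. This is not the actual benchmark code; the connection URL, credentials, and table t(id, val) are made up for illustration.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        conn.setAutoCommit(false);  // one commit at the end instead of one per insert
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO t (id, val) VALUES (?, ?)");
        int batchSize = 1000;
        for (int i = 0; i < 1000000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();                 // queue the insert locally
            if ((i + 1) % batchSize == 0) {
                ps.executeBatch();         // send the queued inserts to the server together
            }
        }
        ps.executeBatch();                 // flush any remainder
        conn.commit();
        ps.close();
        conn.close();
    }
}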
On Tue, 5 Feb 2008, Arie Ozarov wrote:

> I understand that JDBC has some overhead (object translation, ...) but didn't
> think the difference would be that big. Do these numbers look correct (any
> optimization suggestions)?

The real cost is the protocol level overhead of INSERT vs COPY. JDBC batch execution groups things together to reduce the number of network round trips, but it still has to send each insert as an individual request to the server.

> Any performance improvement in postgresql-8.2-507.jdbc4.jar?

No.

> Is the copy operation much more optimized than inserts (and if so, when will
> the driver support it)?

Yes, copy is significantly faster than insert. If you'd like, construct a psql test case that does 100,000 individual inserts and you'll see it's not just a JDBC driver/libpq difference.

Copy support is available using this patched driver, but it has not been integrated into the official version yet.

http://kato.iki.fi/sw/db/postgresql/jdbc/copy/

Kris Jurka
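(For reference, in pgjdbc versions that eventually shipped with COPY support, driving COPY from Java looks roughly like the sketch below. The patched driver linked above may expose a different interface, and the table, columns, and file name here are made up.)

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopySketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        // unwrap the driver-specific connection to reach the COPY API
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        Reader in = new FileReader("data.csv");
        // streams the whole file over the COPY protocol; returns the number of rows loaded
        long rows = copy.copyIn("COPY t (id, val) FROM STDIN WITH CSV", in);
        System.out.println("loaded " + rows + " rows");
        in.close();
        conn.close();
    }
}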
Kris Jurka <books@ejurka.com> writes:
> On Tue, 5 Feb 2008, Arie Ozarov wrote:
>> I understand that JDBC has some overhead (object translation, ...) but didn't
>> think the difference would be that big. Do these numbers look correct (any
>> optimization suggestions)?

> The real cost is the protocol level overhead of INSERT vs COPY.

Also, if you were inserting only one row per INSERT command, there's a significant statement startup/shutdown overhead in the server, even for a prepared statement. I don't see any reason to think that these numbers are JDBC's fault --- it's just a fact of life that COPY is a lot more efficient than a series of INSERTs. (If it were not, we'd hardly even bother having it.)

regards, tom lane
Would it be better then to group inserts using a standard statement (not prepared - or does it really matter) this way:

insert into T values (set1), (set2), ..., (setN);

?

Any reason for a select statement to be 3 times slower? When is it planned to include the copy support in the official version?

Thanks!
Arie.

On 2/5/08 2:31 PM, "Kris Jurka" <books@ejurka.com> wrote:

> On Tue, 5 Feb 2008, Arie Ozarov wrote:
>
>> I understand that JDBC has some overhead (object translation, ...) but didn't
>> think the difference would be that big. Do these numbers look correct (any
>> optimization suggestions)?
>
> The real cost is the protocol level overhead of INSERT vs COPY. JDBC
> batch execution groups things together to reduce the number of network
> round trips, but it still has to send each insert as an individual request
> to the server.
>
>> Any performance improvement in postgresql-8.2-507.jdbc4.jar?
>
> No.
>
>> Is the copy operation much more optimized than inserts (and if so, when will
>> the driver support it)?
>
> Yes, copy is significantly faster than insert. If you'd like, construct a
> psql test case that does 100,000 individual inserts and you'll see it's
> not just a JDBC driver/libpq difference.
>
> Copy support is available using this patched driver, but it has not been
> integrated into the official version yet.
>
> http://kato.iki.fi/sw/db/postgresql/jdbc/copy/
>
> Kris Jurka
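(A sketch of the multi-row VALUES form Arie asks about, done from JDBC. The table t(id, val) is hypothetical and 100 rows per statement is an arbitrary choice; the point is that a single INSERT carries many value sets.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MultiRowInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        int rowsPerStatement = 100;  // N value sets per INSERT

        // Build "INSERT INTO t (id, val) VALUES (?, ?), (?, ?), ..., (?, ?)"
        StringBuilder sql = new StringBuilder("INSERT INTO t (id, val) VALUES ");
        for (int i = 0; i < rowsPerStatement; i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }

        PreparedStatement ps = conn.prepareStatement(sql.toString());
        for (int row = 0; row < rowsPerStatement; row++) {
            ps.setInt(2 * row + 1, row);
            ps.setString(2 * row + 2, "row " + row);
        }
        ps.executeUpdate();          // one statement inserts all N rows
        ps.close();
        conn.close();
    }
}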
branching off from http://archives.postgresql.org/pgsql-jdbc/2008-09/msg00074.php and joining this thread
Dear Kris,
thanks for the link regarding COPY versus JDBC batch. From the numbers in that email, I am tempted to assume that, if your tables have indices, you would expect something like: 1x speed of COPY = 6x speed of JDBC batch = 36x speed of single-statement JDBC.
Comparison table (records inserted per millisecond)
                 COPY   JDBC   JDBC batch
WITHOUT INDEXES: 198    1.5    14
WITH 2 INDEXES:  45     1.5    10

In the last few weeks, I have been trying to improve a bulk update process which involves arbitrary SQL statements, so a COPY command has not been an option. I cannot really compare what I've been doing to the performance of a COPY command, because the specs differ, but I can compare pgBee's performance to psql's when processing text files of SQL statements.
For the same input file, psql took 112 minutes, whereas pgBee finished in 21 minutes. Both were run on the same database server (localhost), pgBee was batching groups of 100 statements at a time, and a real data file was used, with 1131753 SQL statements in total (511335 DELETEs and 567577 INSERTs). pgBee seems to be doing 898 operations/millisecond, or ~450 INSERTs/millisecond! Is that possible? psql seems to manage 168 operations/millisecond - half of those are INSERTs.
I suppose I am sort of doing what a JDBC COPY command would do (but with arbitrary SQL statements): pgBee handles the parsing of files, takes care of encoding issues, escapes some characters, and tries to execute all statements in the most efficient way. Somehow, reading lines from a text file into a String buffer, splitting the statements up and executing them as a batch gives you a real performance boost. Does psql implicitly handle each statement as a separate transaction? This could account for pgBee's better performance.
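(A minimal sketch of the read-split-batch approach described above -- not pgBee's actual code. It assumes one SQL statement per line in the input file and ignores the quoting and encoding issues that pgBee handles; the file name and batch size of 100 mirror the test described.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FileBatchSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        conn.setAutoCommit(false);        // avoid one implicit transaction per statement
        Statement st = conn.createStatement();

        int batchSize = 100;              // statements per executeBatch()
        int queued = 0;
        BufferedReader in = new BufferedReader(new FileReader("statements.sql"));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            st.addBatch(line);            // arbitrary SQL, so a plain Statement, not a PreparedStatement
            if (++queued == batchSize) {
                st.executeBatch();        // ship 100 statements to the server in one go
                queued = 0;
            }
        }
        if (queued > 0) {
            st.executeBatch();            // flush the remainder
        }
        conn.commit();
        in.close();
        st.close();
        conn.close();
    }
}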
Michael
Michael,

Try passing the "-1" (one) flag to psql--this processes the script file as a single transaction. Otherwise, it indeed treats each statement as a single implicit transaction.

--
Maciek Sakrejda
Truviso, Inc.
http://www.truviso.com

-----Original Message-----
From: Michael Nacos <m.nacos@gmail.com>
To: Kris Jurka <books@ejurka.com>
Cc: pgsql-jdbc@postgresql.org
Subject: Re: [JDBC] Performance comparison to psql.
Date: Thu, 25 Sep 2008 17:01:44 +0100

[quoted message trimmed]
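(In JDBC terms, the difference Maciek describes -- one implicit transaction per statement versus a single wrapping transaction, which is what psql -1 gives you -- looks roughly like this sketch; the table t(id, val) is made up.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TransactionModesSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        Statement st = conn.createStatement();

        // Default JDBC behaviour, like psql without -1: autocommit is on,
        // so every statement is its own transaction and pays its own commit cost.
        st.executeUpdate("INSERT INTO t (id, val) VALUES (1, 'a')");
        st.executeUpdate("INSERT INTO t (id, val) VALUES (2, 'b')");

        // Equivalent of psql -1: one explicit transaction around the whole run,
        // with a single commit at the end.
        conn.setAutoCommit(false);
        st.executeUpdate("INSERT INTO t (id, val) VALUES (3, 'c')");
        st.executeUpdate("INSERT INTO t (id, val) VALUES (4, 'd')");
        conn.commit();

        st.close();
        conn.close();
    }
}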
right! the -1 flag...
I repeated the psql test, this time with the -1 flag, and the process completed in 107 minutes,
so the transaction overhead in the previous psql tests is 5 minutes.

psql is probably reading each line from the input file and immediately submitting it;
pgBee groups together many lines and batches them off to the server in one step.

btw, it's operations/sec, not milliseconds, in my previous email -- sorry! I am not using
prepared statements as I have to cope with arbitrary SQL, so it looks like I'm approaching
the performance of unbatched but prepared JDBC statements mentioned in this table:

Comparison table (records inserted per millisecond)
                 COPY   JDBC   JDBC batch
WITHOUT INDEXES: 198    1.5    14
WITH 2 INDEXES:  45     1.5    10

898 operations/second vs. 1500 records/second in the table above. Besides, these numbers
must be hardware-specific (I'm using a laptop with a 5400rpm disk).
Michael
I seem unable to perform basic calculus today... sorry again, with psql -1 the process took 97 minutes
just to set the record straight -- the overhead from the implicit transactions was 15 minutes
M.