Thread: Performance comparison to psql.
We compared insert & select operations between JDBC (postgres 8.2, postgresql-8.2-506.jdbc3.jar) and psql. Here are some numbers:

PSQL COPY from STDIN, recordsCount = 10 000 000
WITHOUT INDEXES: time = 50544 ms (0.005 ms per record)
WITH 2 INDEXES: time = 221491 ms (0.022 ms per record)

JDBC (prepared statement without batch), recordsCount = 100 000
WITHOUT INDEXES: time = 64874 ms (0.649 ms per record)
WITH 2 INDEXES: time = 63057 ms (0.630 ms per record)

JDBC (prepared statement with batch), recordsCount = 1 000 000
WITHOUT INDEXES: time = 73205 ms (0.073 ms per record)
WITH 2 INDEXES: time = 100270 ms (0.100 ms per record)

Comparison table (records inserted per millisecond)
                 COPY   JDBC   JDBC batch
WITHOUT INDEXES: 198    1.5    14
WITH 2 INDEXES:  45     1.5    10

As for selects/queries, psql was about 3 times faster. Both the psql and JDBC operations were run remotely from the same machine.

I understand that JDBC has some overhead (object translation, ...) but didn't think the difference would be that big. Do these numbers look correct (any optimization suggestions)? Any performance improvement in postgresql-8.2-507.jdbc4.jar? Is the copy operation much more optimized than inserts (and if so, when will the driver support it)?

Thanks,
Arie.
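(For reference, the "prepared statement with batch" case above corresponds to a pattern roughly like the following sketch. This is not the actual benchmark code; the connection URL, credentials, and table t(id, val) are made up for illustration.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        conn.setAutoCommit(false);  // one commit at the end instead of one per insert
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO t (id, val) VALUES (?, ?)");
        int batchSize = 1000;
        for (int i = 0; i < 1000000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();                 // queue the insert locally
            if ((i + 1) % batchSize == 0) {
                ps.executeBatch();         // send the queued inserts to the server together
            }
        }
        ps.executeBatch();                 // flush any remainder
        conn.commit();
        ps.close();
        conn.close();
    }
}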
On Tue, 5 Feb 2008, Arie Ozarov wrote:

> I understand that JDBC has some overhead (object translation, ...) but didn't
> think the difference would be that big. Do these numbers look correct (any
> optimization suggestions)?

The real cost is the protocol level overhead of INSERT vs COPY. JDBC batch execution groups things together to reduce the number of network round trips, but it still has to send each insert as an individual request to the server.

> Any performance improvement in postgresql-8.2-507.jdbc4.jar?

No.

> Is the copy operation much more optimized than inserts (and if so, when will
> the driver support it)?

Yes, copy is significantly faster than insert. If you'd like, construct a psql test case that does 100,000 individual inserts and you'll see it's not just a JDBC driver/libpq difference.

Copy support is available using this patched driver, but it has not been integrated into the official version yet.

http://kato.iki.fi/sw/db/postgresql/jdbc/copy/

Kris Jurka
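(For reference, in pgjdbc versions that eventually shipped with COPY support, driving COPY from Java looks roughly like the sketch below. The patched driver linked above may expose a different interface, and the table, columns, and file name here are made up.)

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopySketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        // unwrap the driver-specific connection to reach the COPY API
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        Reader in = new FileReader("data.csv");
        // streams the whole file over the COPY protocol; returns the number of rows loaded
        long rows = copy.copyIn("COPY t (id, val) FROM STDIN WITH CSV", in);
        System.out.println("loaded " + rows + " rows");
        in.close();
        conn.close();
    }
}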
Kris Jurka <books@ejurka.com> writes:
> On Tue, 5 Feb 2008, Arie Ozarov wrote:
>> I understand that JDBC has some overhead (object translation, ...) but didn't
>> think the difference would be that big. Do these numbers look correct (any
>> optimization suggestions)?

> The real cost is the protocol level overhead of INSERT vs COPY.

Also, if you were inserting only one row per INSERT command, there's a significant statement startup/shutdown overhead in the server, even for a prepared statement. I don't see any reason to think that these numbers are JDBC's fault --- it's just a fact of life that COPY is a lot more efficient than a series of INSERTs. (If it were not, we'd hardly even bother having it.)

regards, tom lane
Would it be better then to group inserts using a standard statement (not prepared - or does it really matter) this way:

insert into T values (set1), (set2), ..., (setN);

?

Any reason for a select statement to be 3 times slower? When is it planned to include the copy support in the official version?

Thanks!
Arie.

On 2/5/08 2:31 PM, "Kris Jurka" <books@ejurka.com> wrote:

> On Tue, 5 Feb 2008, Arie Ozarov wrote:
>
>> I understand that JDBC has some overhead (object translation, ...) but didn't
>> think the difference would be that big. Do these numbers look correct (any
>> optimization suggestions)?
>
> The real cost is the protocol level overhead of INSERT vs COPY. JDBC
> batch execution groups things together to reduce the number of network
> round trips, but it still has to send each insert as an individual request
> to the server.
>
>> Any performance improvement in postgresql-8.2-507.jdbc4.jar?
>
> No.
>
>> Is the copy operation much more optimized than inserts (and if so, when will
>> the driver support it)?
>
> Yes, copy is significantly faster than insert. If you'd like, construct a
> psql test case that does 100,000 individual inserts and you'll see it's
> not just a JDBC driver/libpq difference.
>
> Copy support is available using this patched driver, but it has not been
> integrated into the official version yet.
>
> http://kato.iki.fi/sw/db/postgresql/jdbc/copy/
>
> Kris Jurka
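(A sketch of the multi-row VALUES form Arie asks about, done from JDBC. The table t(id, val) is hypothetical and 100 rows per statement is an arbitrary choice; the point is that a single INSERT carries many value sets.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MultiRowInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        int rowsPerStatement = 100;  // N value sets per INSERT

        // Build "INSERT INTO t (id, val) VALUES (?, ?), (?, ?), ..., (?, ?)"
        StringBuilder sql = new StringBuilder("INSERT INTO t (id, val) VALUES ");
        for (int i = 0; i < rowsPerStatement; i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }

        PreparedStatement ps = conn.prepareStatement(sql.toString());
        for (int row = 0; row < rowsPerStatement; row++) {
            ps.setInt(2 * row + 1, row);
            ps.setString(2 * row + 2, "row " + row);
        }
        ps.executeUpdate();          // one statement inserts all N rows
        ps.close();
        conn.close();
    }
}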
branching off from http://archives.postgresql.org/pgsql-jdbc/2008-09/msg00074.php and joining this thread
Dear Kris,
thanks for the link regarding COPY versus JDBC batch. From the numbers in that email, I am tempted to assume that, if your tables have indices, you would expect something like: 1x speed of COPY = 6x speed of JDBC batch = 36x speed of single-statement JDBC.
Comparison table (records inserted per millisecond)
                 COPY   JDBC   JDBC batch
WITHOUT INDEXES: 198    1.5    14
WITH 2 INDEXES:  45     1.5    10

In the last few weeks, I have been trying to improve a bulk update process which involves arbitrary SQL statements, so a COPY command has not been an option. I cannot really compare what I've been doing to the performance of a COPY command, because the specs differ, but I can compare pgBee's performance to psql's when processing text files of SQL statements.
For the same input file, psql took 112 minutes, whereas pgBee finished in 21 minutes. Both were run on the same database server (localhost), pgBee was batching groups of 100 statements at a time, and a real data file was used, with 1131753 SQL statements in total (511335 DELETEs and 567577 INSERTs). pgBee seems to be doing 898 operations/millisecond, or ~450 INSERTs/millisecond! Is that possible? psql seems to manage 168 operations/millisecond - half of those are INSERTs.
I suppose I am sort of doing what a JDBC COPY command would do (but with arbitrary SQL statements): pgBee handles the parsing of files, takes care of encoding issues, escapes some characters, and tries to execute all statements in the most efficient way. Somehow, reading lines from a text file into a String buffer, splitting the statements up and executing them as a batch gives you a real performance boost. Does psql implicitly handle each statement as a separate transaction? This could account for pgBee's better performance.
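(A minimal sketch of the read-split-batch approach described above -- not pgBee's actual code. It assumes one SQL statement per line in the input file and ignores the quoting and encoding issues that pgBee handles; the file name and batch size of 100 mirror the test described.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FileBatchSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        conn.setAutoCommit(false);        // avoid one implicit transaction per statement
        Statement st = conn.createStatement();

        int batchSize = 100;              // statements per executeBatch()
        int queued = 0;
        BufferedReader in = new BufferedReader(new FileReader("statements.sql"));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            st.addBatch(line);            // arbitrary SQL, so a plain Statement, not a PreparedStatement
            if (++queued == batchSize) {
                st.executeBatch();        // ship 100 statements to the server in one go
                queued = 0;
            }
        }
        if (queued > 0) {
            st.executeBatch();            // flush the remainder
        }
        conn.commit();
        in.close();
        st.close();
        conn.close();
    }
}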
Michael
Michael,

Try passing the "-1" (one) flag to psql--this processes the script file as a single transaction. Otherwise, it indeed treats each statement as a single implicit transaction.

--
Maciek Sakrejda
Truviso, Inc.
http://www.truviso.com

-----Original Message-----
From: Michael Nacos <m.nacos@gmail.com>
To: Kris Jurka <books@ejurka.com>
Cc: pgsql-jdbc@postgresql.org
Subject: Re: [JDBC] Performance comparison to psql.
Date: Thu, 25 Sep 2008 17:01:44 +0100

[quoted message trimmed]
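(In JDBC terms, the difference Maciek describes -- one implicit transaction per statement versus a single wrapping transaction, which is what psql -1 gives you -- looks roughly like this sketch; the table t(id, val) is made up.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TransactionModesSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        Statement st = conn.createStatement();

        // Default JDBC behaviour, like psql without -1: autocommit is on,
        // so every statement is its own transaction and pays its own commit cost.
        st.executeUpdate("INSERT INTO t (id, val) VALUES (1, 'a')");
        st.executeUpdate("INSERT INTO t (id, val) VALUES (2, 'b')");

        // Equivalent of psql -1: one explicit transaction around the whole run,
        // with a single commit at the end.
        conn.setAutoCommit(false);
        st.executeUpdate("INSERT INTO t (id, val) VALUES (3, 'c')");
        st.executeUpdate("INSERT INTO t (id, val) VALUES (4, 'd')");
        conn.commit();

        st.close();
        conn.close();
    }
}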
right! the -1 flag...
I repeated the psql test, this time with the -1 flag, and the process completed in 107 minutes,
so the transaction overhead in the previous psql tests is 5 minutes.

psql is probably reading each line from the input file and immediately submitting it;
pgBee groups together many lines and batches them off to the server in one step.

btw, it's operations/sec, not milliseconds, in my previous email -- sorry! I am not using
prepared statements as I have to cope with arbitrary SQL, so it looks like I'm approaching
the performance of unbatched but prepared JDBC statements mentioned in this table:

Comparison table (records inserted per millisecond)
                 COPY   JDBC   JDBC batch
WITHOUT INDEXES: 198    1.5    14
WITH 2 INDEXES:  45     1.5    10

898 operations/second vs. 1500 records/second in the table above. Besides, these numbers
must be hardware-specific (I'm using a laptop with a 5400rpm disk).
Michael
I seem unable to perform basic calculus today... sorry again, with psql -1 the process took 97 minutes
just to set the record straight -- the overhead from the implicit transactions was 15 minutes
M.