Thread: Why is JDBC so slow?

Why is JDBC so slow?

From
Joseph Shraibman
Date:
For this test I created a table named tt with ten entries in it.  These
selects thus select 10^6 rows.  As you can see time for jdbc is much
slower than psql.  Why is this?

The code for JDBCclient is at: http://www.tupari.net/JDBCclient.java

Tests were run on pg 7.4b2 on a redhat 9 machine with java 1.4.2_01

]$ time psql -c "select 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt
tte, tt ttf;" playpen > /dev/null

real    0m4.321s
user    0m1.390s
sys     0m0.150s
]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select
'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen >
/dev/null

real    0m27.271s
user    0m21.040s
sys     0m0.710s
]$ time psql -c "select 1 from tt tta, tt ttb, tt ttc, tt ttd, tt tte,
tt ttf;" playpen > /dev/null

real    0m5.125s
user    0m1.660s
sys     0m0.030s
]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 1
from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null

real    0m26.462s
user    0m20.180s
sys     0m0.970s


Re: Why is JDBC so slow?

From
Fernando Nasser
Date:
Joseph Shraibman wrote:> For this test I created a table named tt with ten
entries in it.  These
> selects thus select 10^6 rows.  As you can see time for jdbc is much
> slower than psql.  Why is this?
>

Because this is not a fair comparison.  The 'psql' program knows in advance what
to do with the data, while the JDBC driver will allow you do do anything you
want with it and can guess what it is.

So the psql program reads data and spits it out (to /dev/null, which is fast)
every row it reads.

The JDBC driver has to store all these rows in memory and wait for you to tell
it what you want to do with it.

To prevent data to be stored, use setFetchSize() (in 7.4 drivers) or just use a
cursor.

Regards,
Fernando



> The code for JDBCclient is at: http://www.tupari.net/JDBCclient.java
>
> Tests were run on pg 7.4b2 on a redhat 9 machine with java 1.4.2_01
>
> ]$ time psql -c "select 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt
> tte, tt ttf;" playpen > /dev/null
>
> real    0m4.321s
> user    0m1.390s
> sys     0m0.150s
> ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select
> 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen >
> /dev/null
>
> real    0m27.271s
> user    0m21.040s
> sys     0m0.710s
> ]$ time psql -c "select 1 from tt tta, tt ttb, tt ttc, tt ttd, tt tte,
> tt ttf;" playpen > /dev/null
>
> real    0m5.125s
> user    0m1.660s
> sys     0m0.030s
> ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 1
> from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null
>
> real    0m26.462s
> user    0m20.180s
> sys     0m0.970s
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faqs/FAQ.html
>

--
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Why is JDBC so slow?

From
Tom Lane
Date:
Fernando Nasser <fnasser@redhat.com> writes:
> So the psql program reads data and spits it out (to /dev/null, which is fast)
> every row it reads.

> The JDBC driver has to store all these rows in memory and wait for you
> to tell it what you want to do with it.

But libpq also stores the entire query result in memory.  That doesn't
seem like a reason for JDBC to be slower.

One thing I'd wonder about is the overall startup overhead for Java.
What's the relative time to do a trivial "select 2+2" each way?

            regards, tom lane

Re: Why is JDBC so slow?

From
Joseph Shraibman
Date:
Tom Lane wrote:

> One thing I'd wonder about is the overall startup overhead for Java.
> What's the relative time to do a trivial "select 2+2" each way?
>
startup time for java was less than 1 sec

]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 'blah' ;" playpen >
/dev/null

real    0m0.588s
user    0m0.340s
sys     0m0.040s


Re: Why is JDBC so slow? [Viruschecked]

From
"Patric Bechtel"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Joseph,

the question rather is: why are you coding THAT inefficient?
What you are doing, is summing up all dead-time in your program together,
because you are:
- - querying the database without a cursor. Please try 7.4 driver with setFetchSize()
- - you call getColumnCount(), which needs to load ALL objects from the server and count them.
- - you are creating a HUGE array of objects, which is already done by the driver, btw.
- - after you've swallowed all these things one after another, you spit it out in a quite inefficient manner (at least
forsuch a huge amount of data). 

So:
- - do NOT use getColumnCount. psql doesn't use that, too.
- - use a buffered outputstream. That's a java issue, ok, but hey, this is a virtual machine. So try to omit ping-pong
withthe OS. 
- - spit out the values as you get them, directly. No intermediate array. An not that big ones (this one goes to
survivoror even old gen heap, so SLOW!). 
- - as hotspot compiles on the fly, try running the test several times, and see what the performance does. Normally,
theprogram get's faster after 5-10 runs, in  
case of -server after 10-20, but then... whow.

hope that helps...

Patric


-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies.

iQA/AwUBP1T59nxoBrvMu8qQEQIAjACeNU98QPbYsLHY2Pjyo6haXK5laF0AoNax
oGPWXeUzSwjTh8EwQnzzICeR
=NA69
-----END PGP SIGNATURE-----



Re: Why is JDBC so slow?

From
Oliver Jowett
Date:
On Tue, Sep 02, 2003 at 02:05:15PM -0400, Joseph Shraibman wrote:
> For this test I created a table named tt with ten entries in it.  These
> selects thus select 10^6 rows.  As you can see time for jdbc is much
> slower than psql.  Why is this?

> ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select
> 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen >
> /dev/null
>
> real    0m27.271s
> user    0m21.040s
> sys     0m0.710s

With those heap settings you spend a lot of time doing GC. As pointed out by
others you can reduce the memory footprint of the driver via setFetchSize().
A quick estimate from the GC output shows that about 50% of allocations
immediately become garbage, which can probably be improved on if it's coming
from the driver. That's not an unusual allocation pattern for java programs,
though.

With -Xmx256m and --nooutput --time:

 query time: 25.696 seconds

 real    0m26.455s
 user    0m21.050s
 sys     0m0.450s

(GC totals ~14 seconds)

With -Xmx256m -Xms256m and --nooutput --time:

 query time: 16.739 seconds

 real    0m17.494s
 user    0m12.400s
 sys     0m0.670s

(GC totals ~5 seconds)

This can obviously still be improved on, but I think some of the blame for
slow execution can be pinned on your heap settings not being tuned for the
object allocation the client does.

Incidentally, your use of getFetchSize() to size the return array in
getResults() seems buggy .. I believe the "correct" way is to do
ResultSet.last() then ResultSet.getRow(). But that'll probably force the
whole resultset into memory, too ..

-O

Re: Why is JDBC so slow?

From
Paul Thomas
Date:
On 02/09/2003 20:22 Tom Lane wrote:
> Fernando Nasser <fnasser@redhat.com> writes:
> > So the psql program reads data and spits it out (to /dev/null, which is
> fast)
> > every row it reads.
>
> > The JDBC driver has to store all these rows in memory and wait for you
> > to tell it what you want to do with it.
>
> But libpq also stores the entire query result in memory.  That doesn't
> seem like a reason for JDBC to be slower.


Speaking as Java programmer (as well a long-time C programmer), you'd be
amazed/shocked just how much memory a Java program can burn (take the
hidden malloc usage of (s/f)printf and magnify it a few orders of
magnitude). The problem with OO programmers is that they seem have even
less understanding of how computers work that even COBOL programmers do.
Basically, I think Joseph's test program does more to stress his
computer's memory management than it tests PostgreSQL. And I think most
people on this list realize that JDBC will always be slower than a C
and/or assembler interface. That said and speaking as a Java/web
developer, I find the PostgreSQL JDBC to be fast enough for real life
applications.

regards

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: Why is JDBC so slow?

From
Oliver Jowett
Date:
On Wed, Sep 03, 2003 at 09:59:51AM +1200, Oliver Jowett wrote:
> On Tue, Sep 02, 2003 at 02:05:15PM -0400, Joseph Shraibman wrote:
> > For this test I created a table named tt with ten entries in it.  These
> > selects thus select 10^6 rows.  As you can see time for jdbc is much
> > slower than psql.  Why is this?
>
> > ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select
> > 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen >
> > /dev/null
> >
> > real    0m27.271s
> > user    0m21.040s
> > sys     0m0.710s
>
> With those heap settings you spend a lot of time doing GC. As pointed out by
> others you can reduce the memory footprint of the driver via setFetchSize().
> A quick estimate from the GC output shows that about 50% of allocations
> immediately become garbage, which can probably be improved on if it's coming
> from the driver. That's not an unusual allocation pattern for java programs,
> though.

Some quick heap profiling gives me (10^5 dataset):

          percent         live       alloc'ed  stack class
 rank   self  accum    bytes objs   bytes objs trace name
    1 31.39% 31.39%  9321928  272 9321928  272     1 [I
        (this is a profiling artifact)

    2  8.08% 39.47%  2400048 100002 2400048 100002   763 java.lang.String
        org.postgresql.core.Encoding.decodeUTF8(Encoding.java:283)
        org.postgresql.core.Encoding.decode(Encoding.java:181)
        org.postgresql.core.Encoding.decode(Encoding.java:197)
        org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:179)

    3  8.08% 47.55%  2400048 100002 2400048 100002   764 [C
        java.lang.String.<init>(String.java:200)
        org.postgresql.core.Encoding.decodeUTF8(Encoding.java:283)
        org.postgresql.core.Encoding.decode(Encoding.java:181)
        org.postgresql.core.Encoding.decode(Encoding.java:197)

    4  5.39% 52.94%  1600160 100004 1600160 100004   709 [B
        org.postgresql.core.PGStream.Receive(PGStream.java:369)
        org.postgresql.core.PGStream.ReceiveTupleV2(PGStream.java:354)
        org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:399)
        org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247)

    5  5.39% 58.33%  1600064 100004 1600064 100004   708 [B
        org.postgresql.core.PGStream.ReceiveTupleV2(PGStream.java:331)
        org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:399)
        org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247)
        org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:104)

    6  5.39% 63.71%  1600056 100003 1600056 100003   708 <Unknown_class>
        (same trace as above; this appears to be a byte[][] object)

    7  5.39% 69.10%  1600048 100003 1600048 100003   707 [B
        org.postgresql.core.PGStream.Receive(PGStream.java:369)
        org.postgresql.core.PGStream.ReceiveTupleV2(PGStream.java:330)
        org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:399)
        org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247)

    8  5.39% 74.49%  1600000 100000 1600000 100000   795 java.lang.Object
        JDBCclient.getResults(JDBCclient.java:26)
        JDBCclient.doQuery(JDBCclient.java:15)
        JDBCclient.main(JDBCclient.java:95)

    9  5.39% 79.88%  1600000 100000 1600000 100000   815 java.lang.Integer
        org.postgresql.jdbc1.AbstractJdbc1Connection.getPGType(AbstractJdbc1Connection.java:1670)
        org.postgresql.core.Field.getPGType(Field.java:100)
        org.postgresql.jdbc2.AbstractJdbc2ResultSet.getObject(AbstractJdbc2ResultSet.java:150)
        JDBCclient.getResults(JDBCclient.java:30)

   10  5.39% 85.26%  1600000 100000 1600000 100000   797 java.lang.Integer
        org.postgresql.jdbc1.AbstractJdbc1Connection.getSQLType(AbstractJdbc1Connection.java:1593)
        org.postgresql.core.Field.getSQLType(Field.java:111)
        org.postgresql.jdbc2.AbstractJdbc2ResultSet.getObject(AbstractJdbc2ResultSet.java:105)
        JDBCclient.getResults(JDBCclient.java:30)

   11  5.39% 90.65%  1600000 100000 1600000 100000   796 <Unknown_class>
        org.postgresql.jdbc1.AbstractJdbc1ResultSet.next(AbstractJdbc1ResultSet.java:152)
        JDBCclient.getResults(JDBCclient.java:28)
        JDBCclient.doQuery(JDBCclient.java:15)
        JDBCclient.main(JDBCclient.java:95)

   12  4.41% 95.06%  1310864   14 1310864   14   785 java.lang.Object
        java.util.Vector.ensureCapacityHelper(Vector.java:222)
        java.util.Vector.addElement(Vector.java:584)
        org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:403)
        org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247)

   13  1.35% 96.41%   400016    1  400016    1   795 <Unknown_class>
        JDBCclient.getResults(JDBCclient.java:26)
        JDBCclient.doQuery(JDBCclient.java:15)
        JDBCclient.main(JDBCclient.java:95)

Of these:

 #1 is a profiling artifact.
 #2 and #3 are unavoidable without changing the client.
 #4, #5, #7 are unavoidable without using setFetchSize() or a major resultset overhaul (resultsets are internally
storedas a byte[][]) 
 #8 is directly client-generated.
 #9, #10 could be fixed (temporary Integer objects used only to key the oid/type caches)
 #11 could be fixed (row_buffer is reallocated on every row even if it has not changed size.. and is always copied into
evenif no updates are ever done) 
 #12 is unavoidable unless the backend protocol provides a row count ahead of time (I don't think it does?)
 #13 is directly client-generated.

So some lowhanging fruit there, but not much.

-O


Re: Why is JDBC so slow?

From
Joseph Shraibman
Date:
> This can obviously still be improved on, but I think some of the blame for
> slow execution can be pinned on your heap settings not being tuned for the
> object allocation the client does.
>
Interesting.  I wouldn't have though that expanding the heap was such a slow operation.

I wrote this test so I could compare JDBC and psql side by side.  My real application uses
a cursor and the memory doesn't get so big.

> Incidentally, your use of getFetchSize() to size the return array in
> getResults() seems buggy .. I believe the "correct" way is to do
> ResultSet.last() then ResultSet.getRow().

That's what I used to do, but it makes the code cleaner to use getFetchSize().  Seems to
work all the time for me.