Thread: Why is JDBC so slow?
For this test I created a table named tt with ten entries in it. These selects thus select 10^6 rows. As you can see time for jdbc is much slower than psql. Why is this? The code for JDBCclient is at: http://www.tupari.net/JDBCclient.java Tests were run on pg 7.4b2 on a redhat 9 machine with java 1.4.2_01 ]$ time psql -c "select 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null real 0m4.321s user 0m1.390s sys 0m0.150s ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null real 0m27.271s user 0m21.040s sys 0m0.710s ]$ time psql -c "select 1 from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null real 0m5.125s user 0m1.660s sys 0m0.030s ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 1 from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null real 0m26.462s user 0m20.180s sys 0m0.970s
Joseph Shraibman wrote:> For this test I created a table named tt with ten entries in it. These > selects thus select 10^6 rows. As you can see time for jdbc is much > slower than psql. Why is this? > Because this is not a fair comparison. The 'psql' program knows in advance what to do with the data, while the JDBC driver will allow you do do anything you want with it and can guess what it is. So the psql program reads data and spits it out (to /dev/null, which is fast) every row it reads. The JDBC driver has to store all these rows in memory and wait for you to tell it what you want to do with it. To prevent data to be stored, use setFetchSize() (in 7.4 drivers) or just use a cursor. Regards, Fernando > The code for JDBCclient is at: http://www.tupari.net/JDBCclient.java > > Tests were run on pg 7.4b2 on a redhat 9 machine with java 1.4.2_01 > > ]$ time psql -c "select 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt > tte, tt ttf;" playpen > /dev/null > > real 0m4.321s > user 0m1.390s > sys 0m0.150s > ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select > 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > > /dev/null > > real 0m27.271s > user 0m21.040s > sys 0m0.710s > ]$ time psql -c "select 1 from tt tta, tt ttb, tt ttc, tt ttd, tt tte, > tt ttf;" playpen > /dev/null > > real 0m5.125s > user 0m1.660s > sys 0m0.030s > ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 1 > from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > /dev/null > > real 0m26.462s > user 0m20.180s > sys 0m0.970s > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > -- Fernando Nasser Red Hat - Toronto E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9
Fernando Nasser <fnasser@redhat.com> writes: > So the psql program reads data and spits it out (to /dev/null, which is fast) > every row it reads. > The JDBC driver has to store all these rows in memory and wait for you > to tell it what you want to do with it. But libpq also stores the entire query result in memory. That doesn't seem like a reason for JDBC to be slower. One thing I'd wonder about is the overall startup overhead for Java. What's the relative time to do a trivial "select 2+2" each way? regards, tom lane
Tom Lane wrote: > One thing I'd wonder about is the overall startup overhead for Java. > What's the relative time to do a trivial "select 2+2" each way? > startup time for java was less than 1 sec ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select 'blah' ;" playpen > /dev/null real 0m0.588s user 0m0.340s sys 0m0.040s
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello Joseph, the question rather is: why are you coding THAT inefficient? What you are doing, is summing up all dead-time in your program together, because you are: - - querying the database without a cursor. Please try 7.4 driver with setFetchSize() - - you call getColumnCount(), which needs to load ALL objects from the server and count them. - - you are creating a HUGE array of objects, which is already done by the driver, btw. - - after you've swallowed all these things one after another, you spit it out in a quite inefficient manner (at least forsuch a huge amount of data). So: - - do NOT use getColumnCount. psql doesn't use that, too. - - use a buffered outputstream. That's a java issue, ok, but hey, this is a virtual machine. So try to omit ping-pong withthe OS. - - spit out the values as you get them, directly. No intermediate array. An not that big ones (this one goes to survivoror even old gen heap, so SLOW!). - - as hotspot compiles on the fly, try running the test several times, and see what the performance does. Normally, theprogram get's faster after 5-10 runs, in case of -server after 10-20, but then... whow. hope that helps... Patric -----BEGIN PGP SIGNATURE----- Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies. iQA/AwUBP1T59nxoBrvMu8qQEQIAjACeNU98QPbYsLHY2Pjyo6haXK5laF0AoNax oGPWXeUzSwjTh8EwQnzzICeR =NA69 -----END PGP SIGNATURE-----
On Tue, Sep 02, 2003 at 02:05:15PM -0400, Joseph Shraibman wrote: > For this test I created a table named tt with ten entries in it. These > selects thus select 10^6 rows. As you can see time for jdbc is much > slower than psql. Why is this? > ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select > 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > > /dev/null > > real 0m27.271s > user 0m21.040s > sys 0m0.710s With those heap settings you spend a lot of time doing GC. As pointed out by others you can reduce the memory footprint of the driver via setFetchSize(). A quick estimate from the GC output shows that about 50% of allocations immediately become garbage, which can probably be improved on if it's coming from the driver. That's not an unusual allocation pattern for java programs, though. With -Xmx256m and --nooutput --time: query time: 25.696 seconds real 0m26.455s user 0m21.050s sys 0m0.450s (GC totals ~14 seconds) With -Xmx256m -Xms256m and --nooutput --time: query time: 16.739 seconds real 0m17.494s user 0m12.400s sys 0m0.670s (GC totals ~5 seconds) This can obviously still be improved on, but I think some of the blame for slow execution can be pinned on your heap settings not being tuned for the object allocation the client does. Incidentally, your use of getFetchSize() to size the return array in getResults() seems buggy .. I believe the "correct" way is to do ResultSet.last() then ResultSet.getRow(). But that'll probably force the whole resultset into memory, too .. -O
On 02/09/2003 20:22 Tom Lane wrote: > Fernando Nasser <fnasser@redhat.com> writes: > > So the psql program reads data and spits it out (to /dev/null, which is > fast) > > every row it reads. > > > The JDBC driver has to store all these rows in memory and wait for you > > to tell it what you want to do with it. > > But libpq also stores the entire query result in memory. That doesn't > seem like a reason for JDBC to be slower. Speaking as Java programmer (as well a long-time C programmer), you'd be amazed/shocked just how much memory a Java program can burn (take the hidden malloc usage of (s/f)printf and magnify it a few orders of magnitude). The problem with OO programmers is that they seem have even less understanding of how computers work that even COBOL programmers do. Basically, I think Joseph's test program does more to stress his computer's memory management than it tests PostgreSQL. And I think most people on this list realize that JDBC will always be slower than a C and/or assembler interface. That said and speaking as a Java/web developer, I find the PostgreSQL JDBC to be fast enough for real life applications. regards -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for the Smaller Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+
On Wed, Sep 03, 2003 at 09:59:51AM +1200, Oliver Jowett wrote: > On Tue, Sep 02, 2003 at 02:05:15PM -0400, Joseph Shraibman wrote: > > For this test I created a table named tt with ten entries in it. These > > selects thus select 10^6 rows. As you can see time for jdbc is much > > slower than psql. Why is this? > > > ]$ time java -Xmx256m JDBCclient -U postgres -h localhost -c "select > > 'blah' from tt tta, tt ttb, tt ttc, tt ttd, tt tte, tt ttf;" playpen > > > /dev/null > > > > real 0m27.271s > > user 0m21.040s > > sys 0m0.710s > > With those heap settings you spend a lot of time doing GC. As pointed out by > others you can reduce the memory footprint of the driver via setFetchSize(). > A quick estimate from the GC output shows that about 50% of allocations > immediately become garbage, which can probably be improved on if it's coming > from the driver. That's not an unusual allocation pattern for java programs, > though. Some quick heap profiling gives me (10^5 dataset): percent live alloc'ed stack class rank self accum bytes objs bytes objs trace name 1 31.39% 31.39% 9321928 272 9321928 272 1 [I (this is a profiling artifact) 2 8.08% 39.47% 2400048 100002 2400048 100002 763 java.lang.String org.postgresql.core.Encoding.decodeUTF8(Encoding.java:283) org.postgresql.core.Encoding.decode(Encoding.java:181) org.postgresql.core.Encoding.decode(Encoding.java:197) org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:179) 3 8.08% 47.55% 2400048 100002 2400048 100002 764 [C java.lang.String.<init>(String.java:200) org.postgresql.core.Encoding.decodeUTF8(Encoding.java:283) org.postgresql.core.Encoding.decode(Encoding.java:181) org.postgresql.core.Encoding.decode(Encoding.java:197) 4 5.39% 52.94% 1600160 100004 1600160 100004 709 [B org.postgresql.core.PGStream.Receive(PGStream.java:369) org.postgresql.core.PGStream.ReceiveTupleV2(PGStream.java:354) org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:399) org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247) 5 5.39% 58.33% 1600064 100004 1600064 100004 708 [B org.postgresql.core.PGStream.ReceiveTupleV2(PGStream.java:331) org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:399) org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247) org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:104) 6 5.39% 63.71% 1600056 100003 1600056 100003 708 <Unknown_class> (same trace as above; this appears to be a byte[][] object) 7 5.39% 69.10% 1600048 100003 1600048 100003 707 [B org.postgresql.core.PGStream.Receive(PGStream.java:369) org.postgresql.core.PGStream.ReceiveTupleV2(PGStream.java:330) org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:399) org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247) 8 5.39% 74.49% 1600000 100000 1600000 100000 795 java.lang.Object JDBCclient.getResults(JDBCclient.java:26) JDBCclient.doQuery(JDBCclient.java:15) JDBCclient.main(JDBCclient.java:95) 9 5.39% 79.88% 1600000 100000 1600000 100000 815 java.lang.Integer org.postgresql.jdbc1.AbstractJdbc1Connection.getPGType(AbstractJdbc1Connection.java:1670) org.postgresql.core.Field.getPGType(Field.java:100) org.postgresql.jdbc2.AbstractJdbc2ResultSet.getObject(AbstractJdbc2ResultSet.java:150) JDBCclient.getResults(JDBCclient.java:30) 10 5.39% 85.26% 1600000 100000 1600000 100000 797 java.lang.Integer org.postgresql.jdbc1.AbstractJdbc1Connection.getSQLType(AbstractJdbc1Connection.java:1593) org.postgresql.core.Field.getSQLType(Field.java:111) org.postgresql.jdbc2.AbstractJdbc2ResultSet.getObject(AbstractJdbc2ResultSet.java:105) JDBCclient.getResults(JDBCclient.java:30) 11 5.39% 90.65% 1600000 100000 1600000 100000 796 <Unknown_class> org.postgresql.jdbc1.AbstractJdbc1ResultSet.next(AbstractJdbc1ResultSet.java:152) JDBCclient.getResults(JDBCclient.java:28) JDBCclient.doQuery(JDBCclient.java:15) JDBCclient.main(JDBCclient.java:95) 12 4.41% 95.06% 1310864 14 1310864 14 785 java.lang.Object java.util.Vector.ensureCapacityHelper(Vector.java:222) java.util.Vector.addElement(Vector.java:584) org.postgresql.core.QueryExecutor.receiveTupleV2(QueryExecutor.java:403) org.postgresql.core.QueryExecutor.executeV2(QueryExecutor.java:247) 13 1.35% 96.41% 400016 1 400016 1 795 <Unknown_class> JDBCclient.getResults(JDBCclient.java:26) JDBCclient.doQuery(JDBCclient.java:15) JDBCclient.main(JDBCclient.java:95) Of these: #1 is a profiling artifact. #2 and #3 are unavoidable without changing the client. #4, #5, #7 are unavoidable without using setFetchSize() or a major resultset overhaul (resultsets are internally storedas a byte[][]) #8 is directly client-generated. #9, #10 could be fixed (temporary Integer objects used only to key the oid/type caches) #11 could be fixed (row_buffer is reallocated on every row even if it has not changed size.. and is always copied into evenif no updates are ever done) #12 is unavoidable unless the backend protocol provides a row count ahead of time (I don't think it does?) #13 is directly client-generated. So some lowhanging fruit there, but not much. -O
> This can obviously still be improved on, but I think some of the blame for > slow execution can be pinned on your heap settings not being tuned for the > object allocation the client does. > Interesting. I wouldn't have though that expanding the heap was such a slow operation. I wrote this test so I could compare JDBC and psql side by side. My real application uses a cursor and the memory doesn't get so big. > Incidentally, your use of getFetchSize() to size the return array in > getResults() seems buggy .. I believe the "correct" way is to do > ResultSet.last() then ResultSet.getRow(). That's what I used to do, but it makes the code cleaner to use getFetchSize(). Seems to work all the time for me.