Re: JDBC slow performance on resultSet iteration? - Mailing list pgsql-jdbc

From Edson Richter
Subject Re: JDBC slow performance on resultSet iteration?
Date
Msg-id BLU0-SMTP209E84ED19AFFF79EEDFC5FCFF20@phx.gbl
In response to Re: JDBC slow performance on resultSet iteration?  (Rui Pedro Leal <rui.pedro.leal@gmail.com>)
List pgsql-jdbc
Hi, Rui,

I'm very interested in seeing your results.
Please share your progress with us...

Regards,

Edson

On 06/11/2013 01:46, Rui Pedro Leal wrote:
Hi Kevin, 

I've tested the "-Dorg.postgresql.forcebinary=true" option, with no apparent benefit.

Just recapping ...

Using Postgres 9.1.2 + PostGIS, with the 9.2-1003-jdbc4 driver.

For the sake of consistent results, I'm now focused on running a plain createStatement / executeQuery with no cursor or fetchSize (as per Dave's suggestion).
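
For clarity, this is roughly the shape of the test I'm running now (a minimal sketch only: the connection URL and credentials are placeholders, and the query is the one from my original post):

-------------------- code --------------------

// Rough sketch of the plain-statement test: no setFetchSize(), autocommit left
// on, so pgjdbc should fetch the whole result set into memory on executeQuery().
// The connection URL and credentials are placeholders.
import java.sql.*;

public class PlainQueryTiming {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT _id, _spatial_type, ST_AsBinary(_geometry), _attribute, "
                + "_count, _references, countpersons, countfatals, countdrunks, density "
                + "FROM accidents_5 "
                + "WHERE (_geometry && ST_MakeEnvelope(-126.60644531250001,26.43122806450644,"
                + "-63.369140625,52.96187505907603, 4326) ) "
                + "ORDER BY _pk_id";

        long t0 = System.currentTimeMillis();
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement st = con.createStatement()) {

            ResultSet rs = st.executeQuery(sql);
            System.out.println("executed in " + (System.currentTimeMillis() - t0) + "ms.");

            int rows = 0;
            while (rs.next()) {
                for (int i = 1; i <= 10; i++) {
                    rs.getObject(i); // touch every column so the text-to-object conversion cost is included
                }
                rows++;
            }
            rs.close();
            System.out.println("iterated " + rows + " rows, total "
                    + (System.currentTimeMillis() - t0) + "ms.");
        }
    }
}

-------------------- end code --------------------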

As shown in my previous code, I'm measuring the process from beginning to end (always accumulating time): connection creation, statement creation (I was using a PreparedStatement previously), query execution, and result set iteration + resultSet.getXXX.

So currently, I'm getting the following results (last execution after getting home):

[QUERY 12] - connection in 8ms.
[QUERY 12] - statement in 8ms.
[QUERY 12] - done/execution in 3035ms.
[QUERY 12] - iterated in 4381ms.

If I recall correctly, since I'm not using fetchSize, all the results (around 190k of them) should be loaded into memory, if there is space.

Compare this to pgAdmin, where executing the query and presenting all the results takes ~2663ms (last execution after getting home).

So, here are my conclusions so far, on which I welcome feedback (or someone pointing out my utter stupidity ;))

1) The performance difference on the execution itself is, to me, quite expected; it's not that abnormal and is almost negligible (based on my previous experience with JDBC on Oracle and MySQL).
2) However, the iteration/gets on the resultSet (190k rows), which is mostly resultSet.getXXX calls (please check my original code / post), add another 1.3s. If the query results are in fact already in memory, that is strange and puzzling to me.

In the end, I'm getting an additional ~1.7s performance hit vs. simply running the query and returning the results in pgAdmin.

Tomorrow I will follow Dave Cramer's suggestion, turn on log_duration on the server, and check the server logs in more detail. I will report back.
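
In case it is useful to anyone reproducing this, here is a minimal sketch of enabling duration logging from the JDBC session itself. It assumes a superuser connection (log_duration is a superuser-settable parameter; otherwise it has to be enabled in postgresql.conf and the configuration reloaded), and the URL, credentials and the trivial query are placeholders:

-------------------- code --------------------

// Illustrative only: turn on per-statement duration logging for this session
// so the server-side time can be compared with the client-side measurements.
// Requires a superuser connection; URL, credentials and the query are placeholders.
import java.sql.*;

public class LogDurationSession {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "postgres", "password");
             Statement st = con.createStatement()) {
            st.execute("SET log_duration = on");              // server logs "duration: ... ms" per statement
            st.execute("SET log_min_duration_statement = 0"); // also logs the statement text
            try (ResultSet rs = st.executeQuery("SELECT 1")) { // substitute the real query here
                while (rs.next()) { /* iterate as usual */ }
            }
        }
    }
}

-------------------- end code --------------------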

Any additional feedback or suggestion?

Thanks everyone for the prompt feedback ...

Regards, 

Rui Leal

[] Rui Pedro Leal

On 2013/11/05, at 18:45, Kevin Wooten wrote:

Well yes and no.  The driver may be transferring your data in binary already, but we should work out whether that is actually true.  I believe the option "-Dorg.postgresql.forcebinary=boolean" still works and forces an Extended Query, which will allow binary transfers.

Try running with that option and see if it changes the timings.
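
For completeness, here is a rough sketch of how that property might be supplied, either as a JVM flag or programmatically before the first connection is opened; whether the 9.2-1003 driver still reads it is exactly what needs verifying, and the URL/credentials below are placeholders:

-------------------- code --------------------

// Illustrative only: the property name comes from the mail above; whether the
// 9.2-1003 driver still reads it as a system property is an open question.
// Equivalent to launching with:  java -Dorg.postgresql.forcebinary=true ...
import java.sql.Connection;
import java.sql.DriverManager;

public class ForceBinaryCheck {
    public static void main(String[] args) throws Exception {
        System.setProperty("org.postgresql.forcebinary", "true"); // set before the first connection is opened
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) { // placeholder URL/credentials
            System.out.println("connected; re-run the timing test with this property set");
        }
    }
}

-------------------- end code --------------------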

On Nov 5, 2013, at 11:27 AM, Rui Pedro Leal <rui.pedro.leal@gmail.com> wrote:

Hello Kevin, 

That's exactly my immediate (and hopefully wrong) conclusion: the getXXX calls are really costly!

So, you are proposing using the binary protocol. I've never known the existence of that. Is this it: http://wiki.postgresql.org/wiki/JDBC-BinaryTransfer ?

Thanks.

[] Rui Pedro Leal

On 2013/11/05, at 18:14, Kevin Wooten wrote:

Well I can speak from experience…

When I was comparing the performance of my implementation with pgjdbc I was very disheartened at the beginning as pgjdbc was seemingly much faster.  It was only after I switched the test to take into account the getXXX calls that the true cost of using the driver was discovered.  If you just execute a query and ignore the results pgjdbc is essentially just receiving a stream of text.  It ignores all of the work of turning that text into Java objects/primitives.

My implementation uses binary protocol for this reason and that removes a lot of the overhead that pgjdbc has.  If pgAdmin is using the binary protocol as well this could very well be the difference that is being seen, because it is a significant amount of time.
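
On the pgjdbc side, the driver's connection-parameter documentation lists a binaryTransfer property aimed at the same thing; whether the 9.2-1003 build honors it would need checking, but a rough sketch of requesting it looks like this (host, database, credentials and the trivial query are placeholders):

-------------------- code --------------------

// Illustrative only: binaryTransfer and prepareThreshold are documented pgjdbc
// connection parameters, but whether this particular driver build acts on
// binaryTransfer should be verified.  prepareThreshold=1 encourages a
// server-side prepare on the first execution, which is when binary transfer
// has historically kicked in.
import java.sql.*;

public class BinaryTransferSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/mydb"
                + "?binaryTransfer=true&prepareThreshold=1"; // placeholder host/database
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement("SELECT 1")) { // substitute the real query
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) { /* time the getXXX calls here */ }
            }
        }
    }
}

-------------------- end code --------------------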

On Nov 5, 2013, at 11:03 AM, Dave Cramer <pg@fastcrypt.com> wrote:

I do not know; the best way to figure this out is to use EXPLAIN ANALYZE in pgAdmin.

What I am getting at is that the Java code is using a cursor; pgAdmin is not.

Try it without setFetchSize()

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Nov 5, 2013 at 1:00 PM, Rui Pedro Leal <rui.pedro.leal@gmail.com> wrote:
Dave,

I understand what you are saying, but I must get the results, right? :)

I think in pgAdmin the total query time, as shown in the query execution window, also includes the rendering / presentation of the query results, so this should be comparable ... right?

[] Rui Pedro Leal

On 2013/11/05, at 17:34, Dave Cramer wrote:

Rui Leal,

You shouldn't see that much of a penalty, but I question how you are measuring it, since you are timing the loop, which is using cursors.

Just run a simple query and time it without iterating through the result set. That would be equivalent.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Nov 5, 2013 at 12:16 PM, Rui Pedro Leal <rui.pedro.leal@gmail.com> wrote:
Hello Dave, thanks for the quick reply.

I'm just executing the same query in pgAdmin's query interface, as a way to measure the performance of the query + getting the results vs. the JDBC way.

I'm using fetchSize in order to get faster results when iterating the resultSet. I've tried not using fetchSize/cursor, but the performance is naturally slower (and worse than the simple pgAdmin query).

What I'm asking is: is the performance penalty I'm seeing with the Postgres JDBC driver normal and within what's expected? Am I doing something wrong? Can I do it in any other way?

Regards, 

Rui Leal

[] Rui Pedro Leal


On 2013/11/05, at 16:43, Dave Cramer wrote:

How can you measure the same query in pgAdmin? JDBC will use a cursor when you set the fetchSize.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Nov 5, 2013 at 11:12 AM, Rui Pedro Leal <rui.pedro.leal@gmail.com> wrote:
Just a quick info update: I'm using the postgresql-9.2-1003-jdbc4 driver.

On 2013/11/05, at 16:09, Rui Pedro Leal wrote:

> Hello everyone,
>
> I'm seeing slow JDBC performance when iterating a resultSet obtained from a somewhat simple query.
>
> The query, although using PostGIS functions, is pretty straightforward:
>
> SELECT _id, _spatial_type, ST_AsBinary(_geometry), _attribute, _count, _references, countpersons, countfatals, countdrunks, density
> FROM accidents_5
> WHERE (_geometry && ST_MakeEnvelope(-126.60644531250001,26.43122806450644,-63.369140625,52.96187505907603, 4326) )
> ORDER BY _pk_id
>
> The results are the following:
> - Executing the query directly from pgAdmin: ~2807ms
> - Executing from JVM + JDBC: 4184ms
>
> The code I'm executing is pretty much standard:
>
> -------------------- code --------------------
>
>       public static void main(final String[] args) throws Exception {
>
>               String sql = "SELECT _id, _spatial_type, ST_AsBinary(_geometry), _attribute, _count, _references, countpersons, countfatals, countdrunks, density";
>               sql += " FROM accidents_5";
>               sql += " WHERE (_geometry && ST_MakeEnvelope(-126.60644531250001,26.43122806450644,-63.369140625,52.96187505907603, 4326) )";
>               sql += " ORDER BY _pk_id";
>
>               System.out.println(sql);
>
>               Long time = System.currentTimeMillis();
>               try {
>                       Connection connection = DataStoreInfo.getDataStores().get(0);
>                       connection.setAutoCommit(false);
>
>                       System.out.println("[QUERY " + Thread.currentThread().getId()
>                                       + "] - connection in "
>                                       + (System.currentTimeMillis() - time) + "ms.");
>
>                       Statement st = connection.createStatement(
>                                       ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY,
>                                       ResultSet.CLOSE_CURSORS_AT_COMMIT);
>                       st.setFetchSize(250);
>
>                       System.out.println("[QUERY " + Thread.currentThread().getId()
>                                       + "] - statement in " + (System.currentTimeMillis() - time)
>                                       + "ms.");
>
>                       ResultSet resultSet = st.executeQuery(sql);
>
>                       System.out.println("[QUERY " + Thread.currentThread().getId()
>                                       + "] - done in " + (System.currentTimeMillis() - time)
>                                       + "ms.");
>
>                       GeometryFactory geofact = new GeometryFactory(new PrecisionModel(),4326);
>                       WKBReader wkbReader = new WKBReader(geofact);
>
>                       String id;
>                       String spatialType;
>                       Geometry geometry;
>                       String attribute;
>                       Long count;
>                       String reference;
>                       Map<String, Double> properties;
>                       String granularSynthString = "GranularSynthesis";
>
>                       while (resultSet.next()) {
>                               id = resultSet.getString(1);
>                               spatialType = resultSet.getString(2);
>
>                               // geometry = wkbReader.read(resultSet.getBytes(3)); // ignored the WKBReader and the results are about the same.
>                               attribute = resultSet.getString(4);
>                               count = resultSet.getLong(5);
>                               reference = resultSet.getString(6);
>
>                               properties = new HashMap<String, Double>();
>                               Double aux = resultSet.getDouble(7);
>                               properties.put("countpersons", aux);
>                               aux = resultSet.getDouble(8);
>                               properties.put("countfatals", aux);
>                               aux = resultSet.getDouble(9);
>                               properties.put("countdrunks", aux);
>                               aux = resultSet.getDouble(10);
>                               properties.put("density", aux);
>                       }
>
>                       System.out.println("[QUERY " + Thread.currentThread().getId()
>                                       + "] - done & iterated in "
>                                       + (System.currentTimeMillis() - time) + "ms.");
>
>                       resultSet.close();
>                       st.close();
>                       connection.commit();
>                       connection.close();
>               } catch (SQLException exception) {
>                       exception.printStackTrace();
>               // } catch (ParseException exception) {  // ignored from WKBreader
>               // exception.printStackTrace();
>               }
>
>               System.out.println("[End " + Thread.currentThread().getId()
>                               + "] - done in " + (System.currentTimeMillis() - time) + "ms.");
>       }
>
> -------------------- end code --------------------
>
> Although I'm executing this on a slow MacBook Pro (2.6 Core Duo, 2 GB RAM, but SSD) running Postgres 9.1.2, I've also tested this on a recent Retina MBP and the ratio between pgAdmin and JDBC execution times is similar.
>
> Is this expected? Can someone point out if I'm doing something terribly wrong?
>
> I'm not concerned about the query performance per se (I know it CAN be optimized), but the differences just from using JDBC and iterating the resultSet are really annoying.
>
> Thanks in advance for any help.
>
> Kind regards,
>
> Rui Leal
>
>


