Thread: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

Steven Schlansker

Date:

09 February 2012, 20:19:08

Hi,

There is a bug report and associated mailing list thread
[JDBC] [BUGS] BUG #6293: JDBC driver performance, dated Nov 15 2011

>>>
>>> The following bug has been logged online:
>>>
>>> Bug reference:      6293
>>> PostgreSQL version: 9.1
>>> Description:        JDBC driver performance
>>> Details:
>>
>> The 9.1 JDBC driver was changed to try and fetch all metadata for the
>> entire resultset in one query instead of potentially issuing multiple
>> queries for each column.  So this change was supposed to improve things.
>>
>> Looking at the code, the caching pattern has changed slightly, so now it's
>> important to hold onto the same ResultSetMetaData instance.  That is you
>> need to do:

I have a proposed fix available as a pull request on GitHub.  The commit itself is here:
https://github.com/NessComputing/pgjdbc/commit/15dee25198c0a7a4d3bdeca2193a003d552fac2f

and the pull request complete with an in-progress discussion is here:
https://github.com/pgjdbc/pgjdbc/pull/1

I requested guidance on the mailing list last week on the best way to approach this problem,
but there were no responses, so I have changed the ResultSet to cache the MetaData instances.

As best as I can tell the MetaData is immutable, so hopefully there will be no ill effects from
caching instances.

I saw some discussion about licensing re: GitHub on the mailing list the other day, so to be
perfectly clear I am releasing this code to the pgsql-jdbc project under whatever terms they
so choose, or the public domain if that is the appropriate choice.

I hope this will be an example of how moving to GitHub pull requests can be a positive change :-)


I believe this fixes the referenced bug, and I've asked the original submitter to test out my change
to see if it fixes it for him.

Regards,
Steven Schlansker
Ness Computing

Re: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

Dave Cramer

Date:

09 February 2012, 23:00:22

Steven,

I think this http://www.javaworld.com/javaworld/jw-02-2001/jw-0209-double.html
applies to your implementation ?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



On Thu, Feb 9, 2012 at 7:01 PM, Steven Schlansker <steven@likeness.com> wrote:
> Hi,
>
> There is a bug report and associated mailing list thread
> [JDBC] [BUGS] BUG #6293: JDBC driver performance, dated Nov 15 2011
>
>>>>
>>>> The following bug has been logged online:
>>>>
>>>> Bug reference:      6293
>>>> PostgreSQL version: 9.1
>>>> Description:        JDBC driver performance
>>>> Details:
>>>
>>> The 9.1 JDBC driver was changed to try and fetch all metadata for the
>>> entire resultset in one query instead of potentially issuing multiple
>>> queries for each column.  So this change was supposed to improve things.
>>>
>>> Looking at the code, the caching pattern has changed slightly, so now it's
>>> important to hold onto the same ResultSetMetaData instance.  That is you
>>> need to do:
>
> I have a proposed fix available as a pull request on GitHub.  The commit itself is here:
> https://github.com/NessComputing/pgjdbc/commit/15dee25198c0a7a4d3bdeca2193a003d552fac2f
>
> and the pull request complete with an in-progress discussion is here:
> https://github.com/pgjdbc/pgjdbc/pull/1
>
> I requested guidance on the mailing list last week on the best way to approach this problem,
> but there were no responses, so I have changed the ResultSet to cache the MetaData instances.
>
> As best as I can tell the MetaData is immutable, so hopefully there will be no ill effects from
> caching instances.
>
> I saw some discussion about licensing re: GitHub on the mailing list the other day, so to be
> perfectly clear I am releasing this code to the pgsql-jdbc project under whatever terms they
> so choose, or the public domain if that is the appropriate choice.
>
> I hope this will be an example of how moving to GitHub pull requests can be a positive change :-)
>
>
> I believe this fixes the referenced bug, and I've asked the original submitter to test out my change
> to see if it fixes it for him.
>
> Regards,
> Steven Schlansker
> Ness Computing
>
>
> --
> Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-jdbc

Re: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

Steven Schlansker

Date:

10 February 2012, 00:20:27

On Feb 9, 2012, at 6:59 PM, Dave Cramer wrote:

> Steven,
>
> I think this http://www.javaworld.com/javaworld/jw-02-2001/jw-0209-double.html
> applies to your implementation ?
>

Hi,

I'm not sure how locking applies here.  There is no locking code that I see in any of the ResultSet classes, nor did I
addany, and it's my understanding that ResultSet instances themselves are not to be shared amongst threads.  So this is
notrelevant. 

Let me know if I've misunderstood,
Steven

>
>
> On Thu, Feb 9, 2012 at 7:01 PM, Steven Schlansker <steven@likeness.com> wrote:
>> Hi,
>>
>> There is a bug report and associated mailing list thread
>> [JDBC] [BUGS] BUG #6293: JDBC driver performance, dated Nov 15 2011
>>
>>>>>
>>>>> The following bug has been logged online:
>>>>>
>>>>> Bug reference:      6293
>>>>> PostgreSQL version: 9.1
>>>>> Description:        JDBC driver performance
>>>>> Details:
>>>>
>>>> The 9.1 JDBC driver was changed to try and fetch all metadata for the
>>>> entire resultset in one query instead of potentially issuing multiple
>>>> queries for each column.  So this change was supposed to improve things.
>>>>
>>>> Looking at the code, the caching pattern has changed slightly, so now it's
>>>> important to hold onto the same ResultSetMetaData instance.  That is you
>>>> need to do:
>>
>> I have a proposed fix available as a pull request on GitHub.  The commit itself is here:
>> https://github.com/NessComputing/pgjdbc/commit/15dee25198c0a7a4d3bdeca2193a003d552fac2f
>>
>> and the pull request complete with an in-progress discussion is here:
>> https://github.com/pgjdbc/pgjdbc/pull/1
>>
>> I requested guidance on the mailing list last week on the best way to approach this problem,
>> but there were no responses, so I have changed the ResultSet to cache the MetaData instances.
>>
>> As best as I can tell the MetaData is immutable, so hopefully there will be no ill effects from
>> caching instances.
>>
>> I saw some discussion about licensing re: GitHub on the mailing list the other day, so to be
>> perfectly clear I am releasing this code to the pgsql-jdbc project under whatever terms they
>> so choose, or the public domain if that is the appropriate choice.
>>
>> I hope this will be an example of how moving to GitHub pull requests can be a positive change :-)
>>
>>
>> I believe this fixes the referenced bug, and I've asked the original submitter to test out my change
>> to see if it fixes it for him.
>>
>> Regards,
>> Steven Schlansker
>> Ness Computing
>>
>>
>> --
>> Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-jdbc

Re: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

Till Toenges

Date:

10 February 2012, 01:40:22

On 2012-02-10 05:20, Steven Schlansker wrote:
> I'm not sure how locking applies here.  There is no locking code that
> I see in any of the ResultSet classes, nor did I add any, and it's my
> understanding that ResultSet instances themselves are not to be
> shared amongst threads.  So this is not relevant.
>
> Let me know if I've misunderstood, Steven

I don't know about sharing the result set. But the code is thread safe
as it is. At worst, it would issue a few unneccessary extra requests to
the server. But since the write to metaData is atomic, and it never gets
set to null anywhere, eventually every thread would have a valid
instance, just not neccessarily the same. Unless it is absolutely
required that every call getMetaData() returns the same instance, that
will be good enough.

Till

--
Kyon, Till Toenges, tt@kyon.de, http://kyon.de
Obergplatz 14, 47804 Krefeld, +49-2151-3620334

Re: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

Steven Schlansker

Date:

10 February 2012, 03:48:24

(Switching to my non-corporate email address which hopefully is approved to post to this list without moderator
intervention)

On Feb 9, 2012, at 9:39 PM, Till Toenges wrote:

> On 2012-02-10 05:20, Steven Schlansker wrote:
>> I'm not sure how locking applies here.  There is no locking code that
>> I see in any of the ResultSet classes, nor did I add any, and it's my
>> understanding that ResultSet instances themselves are not to be
>> shared amongst threads.  So this is not relevant.
>>
>> Let me know if I've misunderstood, Steven
>
> I don't know about sharing the result set. But the code is thread safe as it is. At worst, it would issue a few
unneccessaryextra requests to the server. But since the write to metaData is atomic, and it never gets set to null
anywhere,eventually every thread would have a valid instance, just not neccessarily the same. Unless it is absolutely
requiredthat every call getMetaData() returns the same instance, that will be good enough. 

I am reading over AbstractJdbc2ResultSet, and it seems that the code as it exists today is very much *not* thread safe.
For a simple example, the instance variable this_row is modified extensively with no coordination or synchronization
whatsoever. So sharing a ResultSet across threads is already a very bad idea, it will be about as close to undefined
behavioras Java allows. 

So IMO whether getMetaData() is thread safe or not is irrelevant, as nothing else in the class is.  Making the entire
classthread safe would probably not be worth the effort (sharing a ResultSet between threads just sounds like a bad
idea,and the increased synchronization overhead would punish everyone who does it "right")

Re: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

Kris Jurka

Date:

10 February 2012, 05:09:32

On Thu, 9 Feb 2012, Steven Schlansker wrote:

> So IMO whether getMetaData() is thread safe or not is irrelevant, as
> nothing else in the class is.  Making the entire class thread safe would
> probably not be worth the effort (sharing a ResultSet between threads
> just sounds like a bad idea, and the increased synchronization overhead
> would punish everyone who does it "right")

I concur that thread safety is not an issue here.  I have committed a
variant of your fix that moves the caching into AbstractJdbc2ResultSet
instead of having each concrete implementation class have its own copy of
that code.

Kris Jurka

Re: Patch to fix bug #6293 - regression in driver performance with regards to ResultSetMetaData-heavy workloads

From

th76

Date:

10 February 2012, 06:28:44

On 10 feb, 01:01, ste...@likeness.com (Steven Schlansker) wrote:
> Hi,
>
> There is a bug report and associated mailing list thread
> [JDBC] [BUGS] BUG #6293: JDBC driver performance, dated Nov 15 2011
>
>
>
> >>> The following bug has been logged online:
>
> >>> Bug reference:      6293
> >>> PostgreSQL version: 9.1
> >>> Description:        JDBC driver performance
> >>> Details:
>
> >> The 9.1 JDBC driver was changed to try and fetch all metadata for the
> >> entire resultset in one query instead of potentially issuing multiple
> >> queries for each column.  So this change was supposed to improve things.
>
> >> Looking at the code, the caching pattern has changed slightly, so now it's
> >> important to hold onto the same ResultSetMetaData instance.  That is you
> >> need to do:
>
> I have a proposed fix available as a pull request on GitHub.  The commit itself is
here:https://github.com/NessComputing/pgjdbc/commit/15dee25198c0a7a4d3bdec...
>
> and the pull request complete with an in-progress discussion is here:https://github.com/pgjdbc/pgjdbc/pull/1
>
> I requested guidance on the mailing list last week on the best way to approach this problem,
> but there were no responses, so I have changed the ResultSet to cache the MetaData instances.
>
> As best as I can tell the MetaData is immutable, so hopefully there will be no ill effects from
> caching instances.
>
> I saw some discussion about licensing re: GitHub on the mailing list the other day, so to be
> perfectly clear I am releasing this code to the pgsql-jdbc project under whatever terms they
> so choose, or the public domain if that is the appropriate choice.
>
> I hope this will be an example of how moving to GitHub pull requests can be a positive change :-)
>
> I believe this fixes the referenced bug, and I've asked the original submitter to test out my change
> to see if it fixes it for him.
>
> Regards,
> Steven Schlansker
> Ness Computing
>
> --
> Sent via pgsql-jdbc mailing list (pgsql-j...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-jdbc

Hi Steven,

I've tested your patch and it works great!

Thanks,

Teun Hoogendoorn