Thread: Re: [PATCHES] Re: Fixes and enhancements to JDBC driver (take 2)

Re: [PATCHES] Re: Fixes and enhancements to JDBC driver (take 2)

From: Barry Lind
Gunnar,

Do your fixes address the concerns I raised on 12/21?  (I have included
that email to the list below).  To summarize, the three major
concerns/bugs were:
1) Code incorrectly deallocates when a new statement is executed, even
though the byte[]s are still being used.
2) The cache isn't limited in size, resulting in essentially memory
leaks for long lived connections in a connection pool.
3) The implementation is limited to a max 256-byte byte[], whereas my
queries have many values that exceed this size, and the current
implementation doesn't lend itself well (because of #2) to caching things
up to 8K in size.


I have worked some on coming up with a different implementation that
addresses all of the above issues, however I am not sure it is the best
either.  I can share that with you if you are interested.

After working on it for a while, I have now come to the conclusion that
perhaps the best solution here is to avoid using byte[]s at all, and
instead use Strings.  Since for the 90% case all of the column results
are converted to Strings eventually anyway, why keep around a binary
representation?  Why not convert them to Strings immediately as the data
is read from the socket?  Of course this won't work for binary cursors,
but that could be special cased.  If I have time this weekend, I may
pursue this.
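Roughly, the idea is to do the String conversion at the point where the field is read off the wire. A minimal sketch (receiveString, the length-prefixed framing, and the charset are all illustrative assumptions, not the driver's actual code):

```java
// Sketch: convert each field to a String as it is read from the
// socket, so no reusable byte[] ever outlives the read.  All names
// here are hypothetical, not the actual driver API.
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Read one length-prefixed field and convert it immediately; the
    // temporary byte[] becomes garbage right away and is never shared
    // with later statements.
    static String receiveString(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a wire buffer holding one 5-byte field: "hello".
        byte[] wire = {0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o'};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        System.out.println(receiveString(in));  // hello
    }
}
```

Binary cursors would bypass such a conversion and keep raw bytes; that is the special case mentioned above.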

thanks,
--Barry



I recently pulled and built the 7.1beta1 code and ran our regression
tests against this version.  Our app is java based and uses the JDBC
drivers for client access.  Almost all of our regression tests failed. 
In looking into this further, I found some significant problems with
the latest JDBC code.

Specifically, the code that has been added to provide caching of byte[]'s
doesn't work correctly.  This code makes an assumption that is not
valid: it assumes that when a new SQL statement is executed, all
previously used byte[]'s are no longer in use and can be reused.  See the
pg_stream.deallocate() call that appears in the Connection.ExecSQL()
method:
           // Deallocate all resources in the stream associated
           // with a previous request.
           // This will let the driver reuse byte arrays that have
           // already been allocated instead of allocating new ones in order
           // to gain performance improvements.
           pg_stream.deallocate();

The assumption here is invalid.  For example if code does the following:

1) execute a query A
2) for each row of the result set execute a different query B

When query B is executed for the first row of query A, the
deallocate() method gets called, which causes *all* byte[]'s originating
from the pg_stream object (which includes the row data in the ResultSet
object) to be reclaimed and reused, including those associated with the
subsequent rows of query A.

So what ends up happening is that the data in query A's result set
starts to get corrupted as the underlying byte[]'s get reused by
subsequent queries.

Believe me, this causes all sorts of problems.
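Reduced to a toy pool, the failure mode looks like this (every name here is made up for illustration; this is not the driver's code):

```java
// Toy reproduction of the bug: a pool that hands back every
// previously issued byte[] on deallocate(), even though callers
// still hold references to them.  All names are illustrative.
import java.util.ArrayList;
import java.util.List;

public class PoolBug {
    static class BytePool {
        private final List<byte[]> issued = new ArrayList<>();
        private final List<byte[]> free = new ArrayList<>();

        byte[] allocate(int size) {
            // Toy simplification: all arrays here are the same size.
            byte[] b = free.isEmpty() ? new byte[size] : free.remove(free.size() - 1);
            issued.add(b);
            return b;
        }

        // The flawed assumption: every array handed out so far is no
        // longer in use once a new statement starts.
        void deallocate() {
            free.addAll(issued);
            issued.clear();
        }
    }

    public static void main(String[] args) {
        BytePool pool = new BytePool();
        byte[] rowFromQueryA = pool.allocate(4);
        rowFromQueryA[0] = 'A';          // result set A still holds this

        pool.deallocate();               // query B begins: ExecSQL() deallocates

        byte[] rowFromQueryB = pool.allocate(4);
        rowFromQueryB[0] = 'B';          // same array! A's data is now corrupt

        System.out.println(rowFromQueryA == rowFromQueryB);  // true
        System.out.println((char) rowFromQueryA[0]);         // B, not A
    }
}
```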

A similar problem can be found with the getBytes() method of result
set.  This method returns the underlying byte[] to the caller.  The
caller could keep this object around for well past the next ExecSQL call
and find that the value gets corrupted as it is being reused in
subsequent queries.
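One obvious guard, sketched here with invented names, would be for getBytes() to return a defensive copy rather than the pooled array itself:

```java
// Sketch: returning a copy from getBytes() decouples the caller from
// the pooled buffer.  The field and class names are illustrative,
// not the driver's actual ResultSet implementation.
import java.util.Arrays;

public class DefensiveCopy {
    private final byte[] pooledField;   // buffer owned by the pool

    DefensiveCopy(byte[] pooledField) {
        this.pooledField = pooledField;
    }

    // Safe: the caller may keep this past the next ExecSQL() call.
    public byte[] getBytes() {
        return pooledField == null ? null : Arrays.copyOf(pooledField, pooledField.length);
    }

    public static void main(String[] args) {
        byte[] pooled = {1, 2, 3};
        DefensiveCopy rs = new DefensiveCopy(pooled);
        byte[] kept = rs.getBytes();
        pooled[0] = 99;                  // the pool reuses the buffer...
        System.out.println(kept[0]);     // ...but the copy still reads 1
    }
}
```

The copy costs an extra allocation per call, which of course eats into whatever the pooling was supposed to save.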


In addition to the above bugs with this caching code, there are other
significant problems as well.  The code as written never frees any of
these cached objects.  This causes significant problems when connections
are long lived (i.e. in a connection pool), resulting in essentially
a memory leak.  To illustrate the problem, let's say a connection
performed the following queries:

select 10_byte_char_column from foo;
select 11_byte_char_column from bar;
select 12_byte_char_column from foo_bar;
select 13_byte_char_column from foo_foo;
...

Further assume that each query returns 1000 rows.
After the first query runs the cache will contain 1000 byte[10] arrays =
10K.
After the second query runs the cache will contain 1000 byte[10] arrays
+ 1000 byte[11] arrays = 21K.
After the third query runs the cache will contain 1000 byte[10] arrays +
1000 byte[11] + 1000 byte[12] arrays = 33K.
...

As you can see, over time the amount of memory eaten up can easily become
large, especially if many queries are run that return thousands of rows
of differing sized values.  While it is true that the example above is
worst case (i.e. it doesn't show any reuse of what is in the cache), this
worst case makes the entire caching feature as currently written a
problem for me: my connection pool logic will keep these connections open
for long periods of time, during which they will perform a wide variety
of differing SQL statements, many of which will return large numbers of
rows.

Finally, my app does a lot of queries that select values that are > 256
bytes long.  The current cache implementation doesn't help in these
cases at all (as it only caches arrays up to 256 bytes), yet it is with
these large byte[]'s that the overhead of allocating and then garbage
collecting them is greatest, and where caching could help the most.

I have looked at the source code and thought about how to fix the above
problems, and I have not been able to come up with anything that works
well.  While it certainly would be possible to code something that
solves these problems, I fear that the result would actually be slower
than the current situation in 7.0, where no cache exists and lots of
arrays are allocated and then reclaimed by the garbage collector.

I would suggest that the cache code be pulled out for 7.1beta2, until a
better overall solution is found.

thanks,
--Barry

Gunnar Rønning wrote:
> 
> Richard Bullington-McGuire <rbulling@microstate.com> writes:
> 
> > * A serious off-by-one error existed in both of the BytePoolDim classes in
> > the org.postgresql.PG_Stream class. On a request for a byte pool for the
> > maximum size, the old code would generate an
> > ArrayIndexOutOfBoundsException. I fixed this by increasing the size of
> > the array by one more than the maximum buffer size, and altering the
> > supporting methods to initialize and clean up the expanded array.
> 
> This BytePoolDim code has other problems as well. I posted another
> implementation of the same pooling idea back in October, but it never got
> in to replace the current buggy code. As I wrote in pgsql-general earlier
> today, I will look into creating a new patch with another pooling
> implementation that should work a lot better.
> 
> regards,
> 
>         Gunnar


Re: [PATCHES] Re: Fixes and enhancements to JDBC driver (take 2)

From: Gunnar Rønning
Barry Lind <barry@xythos.com> writes:

> Gunnar,
> 
> Do your fixes address the concerns I raised on 12/21?  (I have included
> that email to the list below).  To summarize, the three major
> concerns/bugs were:
> 1) Code incorrectly deallocates when a new statement is executed, even
> though the byte[]s are still being used.
> 2) The cache isn't limited in size, resulting in essentially memory
> leaks for long lived connections in a connection pool.
> 3) The implementation is limited to a max 256-byte byte[], whereas my
> queries have many values that exceed this size, and the current
> implementation doesn't lend itself well (because of #2) to caching things
> up to 8K in size.

The original patch that I supplied was a proof of concept of the kind of
performance improvements that could be made by reusing byte arrays. This
was unfortunately committed before anybody but me had done any testing at
all on it.

The most serious problem with this code was your issue 1). Numbers 2) and
3) should be easy to handle as config parameters. The reason for
hardcoding 3) to 256 was simply that I found this to be the optimal value
for the web application I was doing the testing on.

Eventually, it should be configurable whether to use the byte[] caching
implementation or not, as the performance of memory allocation may vary
greatly depending on VM and OS implementations.
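Such a switch could be wired to a connection property along these lines (a sketch only; the property name "byteArrayPooling" and all class names are invented, not an existing driver option):

```java
// Sketch: choose between a pooling allocator and plain allocation
// based on a connection property.  "byteArrayPooling" is a
// hypothetical name, not a real driver parameter.
import java.util.ArrayDeque;
import java.util.Properties;

public class AllocatorConfig {
    interface Allocator {
        byte[] allocate(int size);
    }

    // Default: plain allocation, letting the VM's garbage collector
    // do the work (faster on some VM/OS combinations).
    static final Allocator PLAIN = size -> new byte[size];

    // Minimal pooling allocator (single free list, illustration only).
    static class Pooling implements Allocator {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();

        public byte[] allocate(int size) {
            byte[] b = free.peek();
            return (b != null && b.length == size) ? free.poll() : new byte[size];
        }

        void release(byte[] b) {
            free.push(b);
        }
    }

    static Allocator fromProperties(Properties props) {
        return Boolean.parseBoolean(props.getProperty("byteArrayPooling", "false"))
                ? new Pooling() : PLAIN;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("byteArrayPooling", "true");
        Allocator a = fromProperties(props);
        System.out.println(a instanceof Pooling);  // true
    }
}
```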

If you go back to the October archives of pgsql-general you will find a
pointer to my second shot at an implementation - this one fixes your issue 1)
but not the others.

I would like to see what you have been working on as well, so we can come
up with a best-of-breed solution.

Regards, 
Gunnar


Re: [PATCHES] Re: Fixes and enhancements to JDBC driver (take 2)

From: Peter T Mount
Quoting Gunnar Rønning <gunnar@candleweb.no>:

> Barry Lind <barry@xythos.com> writes:
> 
> > Gunnar,
> > 
> > Do your fixes address the concerns I raised on 12/21?  (I have included
> > that email to the list below).  To summarize, the three major
> > concerns/bugs were:
> > 1) Code incorrectly deallocates when a new statement is executed, even
> > though the byte[]s are still being used.
> > 2) The cache isn't limited in size, resulting in essentially memory
> > leaks for long lived connections in a connection pool.
> > 3) The implementation is limited to a max 256-byte byte[], whereas my
> > queries have many values that exceed this size, and the current
> > implementation doesn't lend itself well (because of #2) to caching things
> > up to 8K in size.
> 
> The original patch that I supplied was a proof of concept of the kind of
> performance improvements that could be made by reusing byte arrays. This
> was unfortunately committed before anybody but me had done any testing at
> all on it.
> 
> The most serious problem with this code was your issue 1). Numbers 2) and
> 3) should be easy to handle as config parameters. The reason for
> hardcoding 3) to 256 was simply that I found this to be the optimal value
> for the web application I was doing the testing on.
> 
> Eventually, it should be configurable whether to use the byte[] caching
> implementation or not, as the performance of memory allocation may vary
> greatly depending on VM and OS implementations.

Now that we use ANT, this is extremely easy to do. Also, I have made some
changes: your supplied classes have moved to a new package
org.postgresql.core, and extend an interface ObjectPool. I did this so
that it would be easier to change the implementations without having to
hack the main code too much.
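An abstraction along those lines might look roughly like this (a sketch only; the method names on ObjectPool here are my guesses, not necessarily what is in org.postgresql.core):

```java
// Sketch of a pluggable pool abstraction in the spirit of
// org.postgresql.core.ObjectPool; the method names are assumptions.
import java.util.ArrayDeque;

public class PoolDemo {
    interface ObjectPool {
        void add(Object o);      // return an object to the pool
        Object remove();         // take an object out, or null if empty
        int size();
    }

    // One concrete implementation; swapping in a different one should
    // not require touching the main driver code.
    static class SimplePool implements ObjectPool {
        private final ArrayDeque<Object> items = new ArrayDeque<>();

        public void add(Object o) { items.push(o); }
        public Object remove()    { return items.poll(); }
        public int size()         { return items.size(); }
    }

    public static void main(String[] args) {
        ObjectPool pool = new SimplePool();
        pool.add(new byte[16]);
        System.out.println(pool.size());   // 1
        pool.remove();
        System.out.println(pool.size());   // 0
    }
}
```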

Although your patch is in there, I've disabled the part that frees the arrays 
(as that was the bit causing problems). Hopefully...

> 
> If you go back to the October archives of pgsql-general you will find a
> pointer to my second shot at an implementation - this one fixes your
> issue 1) but not the others.
> 
> I would like to see what you have been working on as well, so we can
> come up with a best-of-breed solution.

In CVS as of yesterday...

Peter

-- 
Peter Mount peter@retep.org.uk
PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/
RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/


Re: Re: [PATCHES] Re: Fixes and enhancements to JDBC driver (take 2)

From: Bruce Momjian
Gunnar, everyone: can people check the CVS or a snapshot and make sure you
are happy?  This has been a confusing item for me, and I want to make
sure we have it working as intended.

Thanks.

> Quoting Gunnar Rønning <gunnar@candleweb.no>:
> 
> > Barry Lind <barry@xythos.com> writes:
> > 
> > > Gunnar,
> > > 
> > > Do your fixes address the concerns I raised on 12/21?  (I have included
> > > that email to the list below).  To summarize, the three major
> > > concerns/bugs were:
> > > 1) Code incorrectly deallocates when a new statement is executed, even
> > > though the byte[]s are still being used.
> > > 2) The cache isn't limited in size, resulting in essentially memory
> > > leaks for long lived connections in a connection pool.
> > > 3) The implementation is limited to a max 256-byte byte[], whereas my
> > > queries have many values that exceed this size, and the current
> > > implementation doesn't lend itself well (because of #2) to caching things
> > > up to 8K in size.
> > 
> > The original patch that I supplied was a proof of concept of the kind of
> > performance improvements that could be made by reusing byte arrays. This
> > was unfortunately committed before anybody but me had done any testing at
> > all on it.
> > 
> > The most serious problem with this code was your issue 1). Numbers 2) and
> > 3) should be easy to handle as config parameters. The reason for
> > hardcoding 3) to 256 was simply that I found this to be the optimal value
> > for the web application I was doing the testing on.
> > 
> > Eventually, it should be configurable whether to use the byte[] caching
> > implementation or not, as the performance of memory allocation may vary
> > greatly depending on VM and OS implementations.
> 
> Now that we use ANT, this is extremely easy to do. Also, I have made some
> changes: your supplied classes have moved to a new package
> org.postgresql.core, and extend an interface ObjectPool. I did this so
> that it would be easier to change the implementations without having to
> hack the main code too much.
> 
> Although your patch is in there, I've disabled the part that frees the
> arrays (as that was the bit causing problems). Hopefully...
> 
> > 
> > If you go back to the October archives of pgsql-general you will find a
> > pointer to my second shot at an implementation - this one fixes your
> > issue 1) but not the others.
> > 
> > I would like to see what you have been working on as well, so we can
> > come up with a best-of-breed solution.
> 
> In CVS as of yesterday...
> 
> Peter
> 
> -- 
> Peter Mount peter@retep.org.uk
> PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/
> RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/
> 


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Re: [PATCHES] Re: Fixes and enhancements to JDBC driver (take 2)

From: Gunnar Rønning
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Gunnar, everyone, can people check the CVS or snapshot and make sure you
> are happy.  This has been a confusing item for me, and I want to make
> sure we have it working as intended.
> 

The code in CVS is the correct version - including some rearrangements by
Peter. I still think we should have the byte[] pooling as a config option,
because it is not obvious that it will help performance on all platforms -
it depends on the JVM implementation you use. 

regards, 
Gunnar