Re: Bugs in 7.1beta JDBC driver - Mailing list pgsql-interfaces
From: Barry Lind
Subject: Re: Bugs in 7.1beta JDBC driver
Msg-id: 3A5FC0B8.DCD5980E@xythos.com
In response to: Re: Bugs in 7.1beta JDBC driver (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-interfaces
In looking at the current code from CVS I don't see any changes that
relate to my original bug report. It appears that this is still broken.
What files were patched, and at what versions?

thanks,
--Barry

Bruce Momjian wrote:
>
> Yesterday, I applied a patch to jdbc that fixes this particular area.
> Can you grab the current snapshot and let me know if this is still a
> problem?
>
> ---------------------------------------------------------------------------
>
> I recently pulled and built the 7.1beta1 code and ran our regression
> tests against this version. Our app is Java based and uses the JDBC
> driver for client access. Almost all of our regression tests failed.
> Looking into this further, there are some significant problems with
> the latest JDBC code.
>
> Specifically, the code that has been added to provide caching of
> byte[]'s doesn't work correctly. It makes an assumption that is not
> valid: that when a new SQL statement is executed, all previously used
> byte[]'s are no longer in use and can be reused. See the
> pg_stream.deallocate() call that appears in the Connection.ExecSQL()
> method:
>
>     // Deallocate all resources in the stream associated
>     // with a previous request.
>     // This will let the driver reuse byte arrays that has already
>     // been allocated instead of allocating new ones in order
>     // to gain performance improvements.
>     pg_stream.deallocate();
>
> That assumption is invalid. For example, suppose the code does the
> following:
>
>   1) execute a query A
>   2) for each row of the result set, execute a different query B
>
> When query B is executed for the first row of query A, the
> deallocate() method gets called, which causes *all* byte[]'s
> originating from the pg_stream object (including the row data held by
> query A's ResultSet object) to be reclaimed and reused, including
> those associated with the subsequent rows of query A.
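The reuse hazard described above can be shown in miniature with a standalone sketch. The class, the pool, and the method names here are hypothetical, not the actual driver code; the sketch only models the invalid "everything handed out so far is now free" assumption:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Standalone model (NOT the actual driver code) of the aliasing hazard:
// a pool that hands back "deallocated" byte[]'s while earlier result
// data still references them.
public class ReuseHazard {
    // Simplified stand-in for the driver's byte[] cache.
    static final ArrayDeque<byte[]> pool = new ArrayDeque<>();

    // Fill a buffer with an incoming value, drawing from the pool if possible.
    static byte[] receive(String value) {
        byte[] buf = pool.isEmpty() ? new byte[value.length()] : pool.poll();
        byte[] src = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, buf, 0, src.length);
        return buf; // the result set keeps a reference to this buffer
    }

    // Analogue of pg_stream.deallocate(): assumes every previously
    // handed-out buffer is free. That is the invalid assumption.
    static void deallocate(byte[]... handedOut) {
        for (byte[] b : handedOut) pool.add(b);
    }

    public static void main(String[] args) {
        byte[] rowFromQueryA = receive("AAAA");   // query A, row 1
        deallocate(rowFromQueryA);                // new statement "frees" it
        byte[] rowFromQueryB = receive("BBBB");   // query B reuses the array
        // Query A's row is now corrupted: both names alias one array.
        System.out.println(new String(rowFromQueryA, StandardCharsets.US_ASCII));
        System.out.println(rowFromQueryA == rowFromQueryB);
    }
}
```

Running this prints "BBBB" and "true": the pool handed query B the very array that query A's result set still holds, which is exactly the corruption pattern reported.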
>
> So what ends up happening is that the data in query A's result set
> gets corrupted as the underlying byte[]'s are reused by subsequent
> queries.
>
> Believe me, this causes all sorts of problems.
>
> A similar problem exists with the getBytes() method of the result
> set. This method returns the underlying byte[] to the caller. The
> caller could keep that array around well past the next ExecSQL() call
> and find that its contents get corrupted as the array is reused by
> subsequent queries.
>
> In addition to the above bugs, the caching code has another
> significant problem: as written, it never frees any of the cached
> objects. This causes significant trouble when connections are long
> lived (i.e. in a connection pool) and amounts to a memory leak. To
> illustrate, say a connection performed the following queries:
>
>   select 10_byte_char_column from foo;
>   select 11_byte_char_column from bar;
>   select 12_byte_char_column from foo_bar;
>   select 13_byte_char_column from foo_foo;
>   ...
>
> Further assume that each query returns 1000 rows.
>
> After the first query runs, the cache contains 1000 byte[10] arrays
> = 10K.
> After the second query runs, it contains 1000 byte[10] arrays + 1000
> byte[11] arrays = 21K.
> After the third query runs, it contains 1000 byte[10] arrays + 1000
> byte[11] arrays + 1000 byte[12] arrays = 33K.
> ...
>
> As you can see, over time the amount of memory eaten up can easily
> become large, especially if lots of queries are run that return
> thousands of rows of differing-sized values. While the example above
> is admittedly the worst case (it doesn't show any reuse of what is in
> the cache), that worst case makes the entire caching feature, as
> currently written, a problem for me: my connection pool logic keeps
> connections open for long periods of time, during which they perform
> a wide variety of SQL statements, many of which return large numbers
> of rows.
>
> Finally, my app does a lot of queries that select values that are
> more than 256 bytes long. The current cache implementation doesn't
> help in these cases at all (it only caches arrays up to 256 bytes),
> yet it is with these large byte[]'s that the overhead of allocating
> and then garbage collecting is greatest and caching could help most.
>
> I have looked at the source code and thought about how to fix the
> above problems, and I have not been able to come up with anything
> that works well. While it certainly would be possible to code
> something that solves these problems, I fear that the result would
> actually be slower than the situation in 7.0, where no cache exists
> and lots of arrays are allocated and then reclaimed by the garbage
> collector.
>
> I would suggest that the cache code be pulled out for 7.1beta2, until
> a better overall solution is found.
>
> thanks,
> --Barry
>
> --
> Bruce Momjian                   |  http://candle.pha.pa.us
> pgman@candle.pha.pa.us          |  (610) 853-3000
> + If your life is a hard drive, |  830 Blythe Avenue
> + Christ can be your backup.    |  Drexel Hill, Pennsylvania 19026
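The growth arithmetic quoted above can be checked with a small sketch. The map below is a hypothetical model of a size-keyed cache, not the driver's actual data structure; it just reproduces the worst-case accounting from the report:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model of the unbounded, size-keyed cache growth described in the
// report: each distinct column width adds another 1000 cached arrays
// that are never released.
public class CacheGrowth {
    public static void main(String[] args) {
        Map<Integer, Integer> cache = new LinkedHashMap<>(); // width -> count
        int rowsPerQuery = 1000;
        for (int width = 10; width <= 13; width++) { // 10..13-byte columns
            cache.merge(width, rowsPerQuery, Integer::sum);
            long totalBytes = 0;
            for (Map.Entry<Integer, Integer> e : cache.entrySet())
                totalBytes += (long) e.getKey() * e.getValue();
            System.out.println("after width-" + width + " query: "
                    + totalBytes / 1000 + "K cached");
        }
    }
}
```

This reproduces the 10K, 21K, 33K progression from the report and shows it reaching 46K after the fourth query, with nothing ever evicted.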