Re: Bugs in 7.1beta JDBC driver - Mailing list pgsql-interfaces
From: Barry Lind
Subject: Re: Bugs in 7.1beta JDBC driver
Msg-id: 3A5FC0B8.DCD5980E@xythos.com
In response to: Re: Bugs in 7.1beta JDBC driver (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-interfaces
In looking at the current code from CVS I don't see any changes that
relate to my original bug report. It appears that this is still broken.
What files were patched, and at what versions?

thanks,
--Barry

Bruce Momjian wrote:
>
> Yesterday, I applied a patch to jdbc that fixes this particular area.
> Can you grab the current snapshot and let me know if this is still a
> problem?
>
> ---------------------------------------------------------------------------
>
> I recently pulled and built the 7.1beta1 code and ran our regression
> tests against this version. Our app is Java based and uses the JDBC
> driver for client access. Almost all of our regression tests failed.
> Looking into this further, there are some significant problems with
> the latest JDBC code.
>
> Specifically, the code that has been added to provide caching of
> byte[]'s doesn't work correctly. It makes an assumption that is not
> valid: that when a new SQL statement is executed, all previously used
> byte[]'s are no longer in use and can be reused. See the
> pg_stream.deallocate() call that appears in the Connection.ExecSQL()
> method:
>
>     // Deallocate all resources in the stream associated
>     // with a previous request.
>     // This will let the driver reuse byte arrays that has already
>     // been allocated instead of allocating new ones in order
>     // to gain performance improvements.
>     pg_stream.deallocate();
>
> That assumption is invalid. For example, suppose the code does the
> following:
>
>   1) execute a query A
>   2) for each row of the result set, execute a different query B
>
> When query B is executed for the first row of query A, the
> deallocate() method gets called, which causes *all* byte[]'s
> originating from the pg_stream object (including the row data held by
> query A's ResultSet object) to be reclaimed and reused, including
> those associated with the subsequent rows of query A.
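The reuse hazard described above can be shown in miniature with a standalone sketch. The class, the pool, and the method names here are hypothetical, not the actual driver code; the sketch only models the invalid "everything handed out so far is now free" assumption:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Standalone model (NOT the actual driver code) of the aliasing hazard:
// a pool that hands back "deallocated" byte[]'s while earlier result
// data still references them.
public class ReuseHazard {
    // Simplified stand-in for the driver's byte[] cache.
    static final ArrayDeque<byte[]> pool = new ArrayDeque<>();

    // Fill a buffer with an incoming value, drawing from the pool if possible.
    static byte[] receive(String value) {
        byte[] buf = pool.isEmpty() ? new byte[value.length()] : pool.poll();
        byte[] src = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, buf, 0, src.length);
        return buf; // the result set keeps a reference to this buffer
    }

    // Analogue of pg_stream.deallocate(): assumes every previously
    // handed-out buffer is free. That is the invalid assumption.
    static void deallocate(byte[]... handedOut) {
        for (byte[] b : handedOut) pool.add(b);
    }

    public static void main(String[] args) {
        byte[] rowFromQueryA = receive("AAAA");   // query A, row 1
        deallocate(rowFromQueryA);                // new statement "frees" it
        byte[] rowFromQueryB = receive("BBBB");   // query B reuses the array
        // Query A's row is now corrupted: both names alias one array.
        System.out.println(new String(rowFromQueryA, StandardCharsets.US_ASCII));
        System.out.println(rowFromQueryA == rowFromQueryB);
    }
}
```

Running this prints "BBBB" and "true": the pool handed query B the very array that query A's result set still holds, which is exactly the corruption pattern reported.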
>
> So what ends up happening is that the data in query A's result set
> gets corrupted as the underlying byte[]'s are reused by subsequent
> queries.
>
> Believe me, this causes all sorts of problems.
>
> A similar problem exists with the getBytes() method of the result
> set. This method returns the underlying byte[] to the caller. The
> caller could keep that array around well past the next ExecSQL() call
> and find that its contents get corrupted as the array is reused by
> subsequent queries.
>
> In addition to the above bugs, the caching code has another
> significant problem: as written, it never frees any of the cached
> objects. This causes significant trouble when connections are long
> lived (i.e. in a connection pool) and amounts to a memory leak. To
> illustrate, say a connection performed the following queries:
>
>   select 10_byte_char_column from foo;
>   select 11_byte_char_column from bar;
>   select 12_byte_char_column from foo_bar;
>   select 13_byte_char_column from foo_foo;
>   ...
>
> Further assume that each query returns 1000 rows.
>
> After the first query runs, the cache contains 1000 byte[10] arrays
> = 10K.
> After the second query runs, it contains 1000 byte[10] arrays + 1000
> byte[11] arrays = 21K.
> After the third query runs, it contains 1000 byte[10] arrays + 1000
> byte[11] arrays + 1000 byte[12] arrays = 33K.
> ...
>
> As you can see, over time the amount of memory eaten up can easily
> become large, especially if lots of queries are run that return
> thousands of rows of differing-sized values. While the example above
> is admittedly the worst case (it doesn't show any reuse of what is in
> the cache), that worst case makes the entire caching feature, as
> currently written, a problem for me: my connection pool logic keeps
> connections open for long periods of time, during which they perform
> a wide variety of SQL statements, many of which return large numbers
> of rows.
>
> Finally, my app does a lot of queries that select values that are
> more than 256 bytes long. The current cache implementation doesn't
> help in these cases at all (it only caches arrays up to 256 bytes),
> yet it is with these large byte[]'s that the overhead of allocating
> and then garbage collecting is greatest and caching could help most.
>
> I have looked at the source code and thought about how to fix the
> above problems, and I have not been able to come up with anything
> that works well. While it certainly would be possible to code
> something that solves these problems, I fear that the result would
> actually be slower than the situation in 7.0, where no cache exists
> and lots of arrays are allocated and then reclaimed by the garbage
> collector.
>
> I would suggest that the cache code be pulled out for 7.1beta2, until
> a better overall solution is found.
>
> thanks,
> --Barry
>
> --
> Bruce Momjian                   |  http://candle.pha.pa.us
> pgman@candle.pha.pa.us          |  (610) 853-3000
> + If your life is a hard drive, |  830 Blythe Avenue
> + Christ can be your backup.    |  Drexel Hill, Pennsylvania 19026
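The growth arithmetic quoted above can be checked with a small sketch. The map below is a hypothetical model of a size-keyed cache, not the driver's actual data structure; it just reproduces the worst-case accounting from the report:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model of the unbounded, size-keyed cache growth described in the
// report: each distinct column width adds another 1000 cached arrays
// that are never released.
public class CacheGrowth {
    public static void main(String[] args) {
        Map<Integer, Integer> cache = new LinkedHashMap<>(); // width -> count
        int rowsPerQuery = 1000;
        for (int width = 10; width <= 13; width++) { // 10..13-byte columns
            cache.merge(width, rowsPerQuery, Integer::sum);
            long totalBytes = 0;
            for (Map.Entry<Integer, Integer> e : cache.entrySet())
                totalBytes += (long) e.getKey() * e.getValue();
            System.out.println("after width-" + width + " query: "
                    + totalBytes / 1000 + "K cached");
        }
    }
}
```

This reproduces the 10K, 21K, 33K progression from the report and shows it reaching 46K after the fourth query, with nothing ever evicted.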