Thread: ResultSet construction and byte arrays
I can see from the archives that this has come up more than once, but I'd still like to take a stab at it. Construction of ResultSets based on searches with many results, or on large tables, has some issues, the most noticeable one being that everything found is loaded into memory as byte arrays.

I'm currently using PostgreSQL in the version packaged by dbExperts (7.3.2), with a Win2k database server and a Java client application. I experienced some serious performance problems when running such searches with somewhat large result sets, so I decided to use a profiler and the latest driver source from CVS. The testing showed two major pain points: garbage collection and data loading from the server.

Looking at the source, it turns out the problems come from two PGStream methods:

    public byte[][] ReceiveTupleV2(int nf, boolean bin)
    public byte[] Receive(int siz)

The first creates two-dimensional byte arrays, the second one-dimensional. A huge number of these is created: in a test on a table with 67 columns and ~15,000 tuples, nearly 2 million byte arrays are allocated (the operation takes 20+ seconds). All of them are put into a Vector, which in turn ends up in a ResultSet implementation, I believe an AbstractJdbc1ResultSet.

This is a lot of data to keep in memory. Since it's possible to have multiple search results open (e.g. in a Swing client view), these arrays become a major problem until ResultSet.close() is called, which nulls out the vector. Once that happens the garbage collector kicks in and again lowers performance, albeit for a short while.

Based on the source I've tried a few things to improve this, for instance adding a byte array pool for both the one- and two-dimensional arrays. While reusing arrays instead of the new operator certainly speeds up loading and reduces GC activity, it doesn't help with memory consumption, which eventually crashes the VM.

Are there any ideas on how to fix this? Are there plans to look into it?

On Oracle (same client, slightly different code) I had a major problem with ResultSet.absolute(), since Oracle natively has only forward cursors: going backwards with absolute() makes the cursor scroll forward behind the scenes, wrap around, and scroll forward again. For instance, being at row 10,000 and going back to 9,999 would make the cursor scroll forward, wrap, and scroll forward once more. This created a huge amount of temporary objects, besides the operation just being painfully slow.

To fix this I created wrappers for several of the java.sql interfaces, making sure I had full control over which ResultSet implementation was being used. The idea was to use an expression similar to this one (taken from an article at OTN):

    select *
    from (
      select name.*, rownum rnum
      from ( <everything in a query, including order by clause> ) name
      where rownum <= MAX_ROWS
    )
    where rnum >= MIN_ROWS

With the wrappers I could make sure only a certain number of rows were fetched at a time, by maintaining some state about the current min and max rows (a rough sketch follows below). The downside of this solution is the additional queries, but it is still a lot faster than the cursor movement.

If it's possible to write an expression like the above in Postgres, I think that could also be the solution to not loading everything into a result set: the result set should simply re-query the database once a row beyond its current max is requested. So, how would I write such an expression?

As a last alternative, are there any commercial drivers available that aren't just wrappers around the standard one? The ones I can find at Sun all seem to be.
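To make the wrapper idea concrete, here is a rough sketch of the windowing logic (class name, window size, and statement handling are simplified placeholders, not the actual production code):

    import java.sql.*;

    // Sketch: keeps only a small window of rows in memory and re-queries
    // the database when absolute() moves outside that window.
    public class WindowedResultSet {
        private static final int WINDOW_SIZE = 500; // rows held at a time

        private final Connection conn;
        private final String innerQuery; // full query, including order by
        private ResultSet window;        // rows windowStart .. windowStart+WINDOW_SIZE-1
        private int windowStart;

        public WindowedResultSet(Connection conn, String innerQuery)
                throws SQLException {
            this.conn = conn;
            this.innerQuery = innerQuery;
            fetchWindow(1);
        }

        // Re-run the OTN-style ROWNUM expression for a new window.
        private void fetchWindow(int firstRow) throws SQLException {
            int lastRow = firstRow + WINDOW_SIZE - 1;
            String sql =
                "select * from (" +
                " select name.*, rownum rnum" +
                " from ( " + innerQuery + " ) name" +
                " where rownum <= " + lastRow +
                ") where rnum >= " + firstRow;
            if (window != null) window.close();
            // A scrollable ResultSet is cheap here because the window is small.
            window = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                          ResultSet.CONCUR_READ_ONLY)
                         .executeQuery(sql);
            windowStart = firstRow;
        }

        // Only re-queries when the requested row falls outside the current
        // window; within the window it delegates to the small ResultSet.
        public boolean absolute(int row) throws SQLException {
            if (row < windowStart || row > windowStart + WINDOW_SIZE - 1) {
                fetchWindow(row);
            }
            return window.absolute(row - windowStart + 1);
        }
    }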
Best regards
Jan Oksfeldt Jonasen
On Fri, 2004-02-06 at 07:25, Jan Oksfeldt Jonasen wrote:
> Construction of ResultSets based on searches with many results, or on
> large tables, has some issues, the most noticeable one being that
> everything found is loaded into memory as byte arrays.
> [...]
> Are there any ideas on how to fix this? Are there plans to look into it?

Jan,

Have you tried the latest 7.4 driver? It implements setFetchSize, which will scroll a cursor behind the scenes, though I'm not sure it can move backward. As far as your question about the windowing query goes, have a look at the Postgres LIMIT clause.
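Something like this should trigger the cursor-based fetching in the 7.4 driver (untested sketch; the table name and fetch size are just placeholders, and as far as I know the cursor is only used with autocommit off and a forward-only ResultSet):

    conn.setAutoCommit(false); // the backing cursor must live in a transaction
    Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                        ResultSet.CONCUR_READ_ONLY);
    st.setFetchSize(100);      // hint: pull 100 rows from the cursor at a time
    ResultSet rs = st.executeQuery("SELECT * FROM bigtable");
    while (rs.next()) {
        // only about one fetchSize worth of rows is in memory at any point
    }
    rs.close();
    st.close();
    conn.commit();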
Dave

--
Dave Cramer
519 939 0336
ICQ # 14675561
On Fri, 6 Feb 2004, Jan Oksfeldt Jonasen wrote:

> I can see from the archives that this has come up more than once, but I'd
> still like to take a stab at it. Construction of ResultSets based on
> searches with many results, or on large tables, has some issues, the most
> noticeable one being that everything found is loaded into memory as byte
> arrays.
> [...]
> Are there any ideas on how to fix this? Are there plans to look into it?

The 7.4 driver (which can be used against older dbs) has support for using cursors behind the scenes to retrieve large ResultSets when hinted to do so via setFetchSize. This only works for ResultSets of TYPE_FORWARD_ONLY, though. I have noted the potential obstacles to making this functionality available for scrollable result sets here:

http://archives.postgresql.org/pgsql-jdbc/2004-01/msg00288.php

> To fix this I created wrappers for several of the java.sql interfaces,
> making sure I had full control over which ResultSet implementation was
> being used.
> [...]
> If it's possible to write an expression like the above in Postgres, I
> think that could also be the solution to not loading everything into a
> result set. So, how would I write such an expression?

See the LIMIT/OFFSET clause:

http://www.postgresql.org/docs/current/static/queries-limit.html

> As a last alternative, are there any commercial drivers available that
> aren't just wrappers around the standard one? The ones I can find at Sun
> all seem to be.

There is another JDBC driver available here:

http://sourceforge.net/projects/jxdbcon/

but I have never used it.

Kris Jurka
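P.S. To make the LIMIT/OFFSET suggestion concrete: through JDBC, the Postgres equivalent of your ROWNUM window would look roughly like this (table, ordering column, and window bounds are placeholders):

    // Fetch rows minRow .. maxRow (1-based) of the ordered query.
    int minRow = 9901, maxRow = 10000;
    PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM bigtable ORDER BY id LIMIT ? OFFSET ?");
    ps.setInt(1, maxRow - minRow + 1); // number of rows in the window
    ps.setInt(2, minRow - 1);          // rows to skip; OFFSET is zero-based
    ResultSet rs = ps.executeQuery();

One caveat from the docs page above: the rows skipped by OFFSET still have to be computed inside the server, so a large OFFSET is not free.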