Thread: ResultSet construction and byte arrays
I can see from the archives that this has come up more than once, but I'd still like to take a stab at it. Construction of ResultSets based on searches with many results, or on large tables, has some issues, the most noticeable one being that everything found is loaded into memory as byte arrays.

I'm currently using PostgreSQL in the version packaged by dbExperts (7.3.2), with a Win2k database server and a Java client application. I experienced some serious performance problems when running such searches with somewhat large result sets, so I decided to use a profiler and the latest driver source from CVS. The testing showed two major pain points: garbage collection and data loading from the server.

Looking at the source, it turns out the problems come from two PGStream methods:

    public byte[][] ReceiveTupleV2(int nf, boolean bin)
    public byte[] Receive(int siz)

The first creates two-dimensional byte arrays, the second one-dimensional. A huge number of these is created: in a test on a table with 67 columns and ~15,000 tuples, nearly 2 million byte arrays are allocated (the operation takes 20+ seconds). All of them are put into a Vector, which in turn ends up in a ResultSet implementation, I believe an AbstractJdbc1ResultSet.

This is a lot of data to keep in memory. Since it's possible to have multiple search results open (e.g. in a Swing client view), these arrays become a major problem until ResultSet.close() is called, which nulls out the vector. Once that happens the garbage collector kicks in and again lowers performance, albeit for a short while.

Based on the source I've tried a few things to improve this, for instance adding a byte array pool for both the one- and two-dimensional arrays. While reusing arrays instead of the new operator certainly speeds up loading and reduces GC activity, it doesn't help with memory consumption, which eventually crashes the VM.

Are there any ideas on how to fix this? Are there plans to look into it?

On Oracle (same client, slightly different code) I had a major problem with ResultSet.absolute(), since Oracle natively has only forward cursors: going backwards with absolute() makes the cursor scroll forward behind the scenes, wrap around, and scroll forward again. For instance, being at row 10,000 and going back to 9,999 would make the cursor scroll forward, wrap, and scroll forward once more. This created a huge amount of temporary objects, besides the operation just being painfully slow.

To fix this I created wrappers for several of the java.sql interfaces, making sure I had full control over which ResultSet implementation was being used. The idea was to use an expression similar to this one (taken from an article at OTN):

    select *
    from (
      select name.*, rownum rnum
      from ( <everything in a query, including order by clause> ) name
      where rownum <= MAX_ROWS
    )
    where rnum >= MIN_ROWS

With the wrappers I could make sure only a certain number of rows were fetched at a time, by maintaining some state about the current min and max rows (a rough sketch follows below). The downside of this solution is the additional queries, but it is still a lot faster than the cursor movement.

If it's possible to write an expression like the above in Postgres, I think that could also be the solution to not loading everything into a result set: the result set should simply re-query the database once a row beyond its current max is requested. So, how would I write such an expression?

As a last alternative, are there any commercial drivers available that aren't just wrappers around the standard one? The ones I can find at Sun all seem to be.
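To make the wrapper idea concrete, here is a rough sketch of the windowing logic (class name, window size, and statement handling are simplified placeholders, not the actual production code):

    import java.sql.*;

    // Sketch: keeps only a small window of rows in memory and re-queries
    // the database when absolute() moves outside that window.
    public class WindowedResultSet {
        private static final int WINDOW_SIZE = 500; // rows held at a time

        private final Connection conn;
        private final String innerQuery; // full query, including order by
        private ResultSet window;        // rows windowStart .. windowStart+WINDOW_SIZE-1
        private int windowStart;

        public WindowedResultSet(Connection conn, String innerQuery)
                throws SQLException {
            this.conn = conn;
            this.innerQuery = innerQuery;
            fetchWindow(1);
        }

        // Re-run the OTN-style ROWNUM expression for a new window.
        private void fetchWindow(int firstRow) throws SQLException {
            int lastRow = firstRow + WINDOW_SIZE - 1;
            String sql =
                "select * from (" +
                " select name.*, rownum rnum" +
                " from ( " + innerQuery + " ) name" +
                " where rownum <= " + lastRow +
                ") where rnum >= " + firstRow;
            if (window != null) window.close();
            // A scrollable ResultSet is cheap here because the window is small.
            window = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                          ResultSet.CONCUR_READ_ONLY)
                         .executeQuery(sql);
            windowStart = firstRow;
        }

        // Only re-queries when the requested row falls outside the current
        // window; within the window it delegates to the small ResultSet.
        public boolean absolute(int row) throws SQLException {
            if (row < windowStart || row > windowStart + WINDOW_SIZE - 1) {
                fetchWindow(row);
            }
            return window.absolute(row - windowStart + 1);
        }
    }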
Best regards
Jan Oksfeldt Jonasen
On Fri, 2004-02-06 at 07:25, Jan Oksfeldt Jonasen wrote:
> Construction of ResultSets based on searches with many results, or on
> large tables, has some issues, the most noticeable one being that
> everything found is loaded into memory as byte arrays.
> [...]
> Are there any ideas on how to fix this? Are there plans to look into it?

Jan,

Have you tried the latest 7.4 driver? It implements setFetchSize, which will scroll a cursor behind the scenes, though I'm not sure it can move backward. As far as your question about the windowing query goes, have a look at the Postgres LIMIT clause.
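Something like this should trigger the cursor-based fetching in the 7.4 driver (untested sketch; the table name and fetch size are just placeholders, and as far as I know the cursor is only used with autocommit off and a forward-only ResultSet):

    conn.setAutoCommit(false); // the backing cursor must live in a transaction
    Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                        ResultSet.CONCUR_READ_ONLY);
    st.setFetchSize(100);      // hint: pull 100 rows from the cursor at a time
    ResultSet rs = st.executeQuery("SELECT * FROM bigtable");
    while (rs.next()) {
        // only about one fetchSize worth of rows is in memory at any point
    }
    rs.close();
    st.close();
    conn.commit();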
Dave

--
Dave Cramer
519 939 0336
ICQ # 14675561
On Fri, 6 Feb 2004, Jan Oksfeldt Jonasen wrote:

> I can see from the archives that this has come up more than once, but I'd
> still like to take a stab at it. Construction of ResultSets based on
> searches with many results, or on large tables, has some issues, the most
> noticeable one being that everything found is loaded into memory as byte
> arrays.
> [...]
> Are there any ideas on how to fix this? Are there plans to look into it?

The 7.4 driver (which can be used against older dbs) has support for using cursors behind the scenes to retrieve large ResultSets when hinted to do so via setFetchSize. This only works for ResultSets of TYPE_FORWARD_ONLY, though. I have noted the potential obstacles to making this functionality available for scrollable result sets here:

http://archives.postgresql.org/pgsql-jdbc/2004-01/msg00288.php

> To fix this I created wrappers for several of the java.sql interfaces,
> making sure I had full control over which ResultSet implementation was
> being used.
> [...]
> If it's possible to write an expression like the above in Postgres, I
> think that could also be the solution to not loading everything into a
> result set. So, how would I write such an expression?

See the LIMIT/OFFSET clause:

http://www.postgresql.org/docs/current/static/queries-limit.html

> As a last alternative, are there any commercial drivers available that
> aren't just wrappers around the standard one? The ones I can find at Sun
> all seem to be.

There is another JDBC driver available here:

http://sourceforge.net/projects/jxdbcon/

but I have never used it.

Kris Jurka
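P.S. To make the LIMIT/OFFSET suggestion concrete: through JDBC, the Postgres equivalent of your ROWNUM window would look roughly like this (table, ordering column, and window bounds are placeholders):

    // Fetch rows minRow .. maxRow (1-based) of the ordered query.
    int minRow = 9901, maxRow = 10000;
    PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM bigtable ORDER BY id LIMIT ? OFFSET ?");
    ps.setInt(1, maxRow - minRow + 1); // number of rows in the window
    ps.setInt(2, minRow - 1);          // rows to skip; OFFSET is zero-based
    ResultSet rs = ps.executeQuery();

One caveat from the docs page above: the rows skipped by OFFSET still have to be computed inside the server, so a large OFFSET is not free.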