Re: java.lang.OutOfMemoryError - Mailing list pgsql-jdbc

From Matt Mello
Subject Re: java.lang.OutOfMemoryError
Date
Msg-id 3E3F475A.1050302@spaceship.com
In response to java.lang.OutOfMemoryError  (Paul Cullum <pcullum@N-able.com>)
Responses Re: java.lang.OutOfMemoryError  (Matt Mello <alien@spaceship.com>)
Re: java.lang.OutOfMemoryError  (Matt Mello <alien@spaceship.com>)
List pgsql-jdbc
Paul Cullum wrote:
> What is the proper method for using JDBC to query result sets which
> are potentially large?

I had a similar problem, but I never got an exception.  I will bring
that up in another thread.

I didn't see anyone more knowledgeable respond to this, so I will give
it a shot.

From what I have read of the JDBC driver source, it pulls the entire
result set across the socket at execution time and caches it locally in
a vector of rows [or something to that effect].

I, too, have used a driver that fetched rows on demand [Informix]
rather than all at execution time.  It was very nice.  :)

From what I have seen on this mailing list, some developers may be
working on enhancing the driver to allow fetch-on-demand [however, I'm
not sure whether the backend supports it].

Anyway, I believe the current way to handle this is to use the
LIMIT/OFFSET clauses in your PostgreSQL SELECT statement.  See:
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/sql-select.html#SQL-LIMIT

With those clauses you can control exactly which slice of the results
each query returns [note that you need a stable ORDER BY for the slices
to be deterministic].
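
Here is a minimal sketch of the paging loop, assuming a hypothetical
table "big_table" with a unique "id" column (the connection details are
made up too):

    import java.sql.*;

    public class ChunkedQuery {
        public static void main(String[] args) throws SQLException {
            final int CHUNK = 1000;
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "pass");
            PreparedStatement ps = con.prepareStatement(
                    "SELECT id, payload FROM big_table " +
                    "ORDER BY id LIMIT ? OFFSET ?");
            int offset = 0;
            while (true) {
                ps.setInt(1, CHUNK);
                ps.setInt(2, offset);
                ResultSet rs = ps.executeQuery();
                int n = 0;
                while (rs.next()) {
                    n++;
                    // process rs.getInt("id"), rs.getString("payload")...
                }
                rs.close();
                if (n < CHUNK) break;   // last (partial) chunk reached
                offset += CHUNK;
            }
            ps.close();
            con.close();
        }
    }

This way only CHUNK rows are ever buffered by the driver at once.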

Problems with this?  Possibly...

I'm thinking that since every "chunk" you ask for is actually a
separate query, your results might not be contiguous between queries.
For example, someone could insert or delete a record that falls, or
would have fallen, in the middle of the data you are paging through.
An insert by someone else could make your next "chunk" repeat a record
from the previous chunk, and a deletion could make it skip one.  It may
be possible to control this situation by doing all chunk queries in a
single transaction, but I'm not sure.  Perhaps someone more
knowledgeable can talk about that.
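
If a single transaction does give a stable snapshot, the chunk loop
above would just be wrapped like this (a sketch, not something I have
verified against the backend):

    con.setAutoCommit(false);
    con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
    try {
        // ... run the LIMIT/OFFSET chunk loop from above here ...
        con.commit();
    } catch (SQLException e) {
        con.rollback();
        throw e;
    } finally {
        con.setAutoCommit(true);
    }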

For those of us converting from other JDBC drivers that DID fetch on
demand, this is truly troublesome.  We may have built laxer systems
that let users run queries capable of crashing our entire VM under
PG-JDBC!  [I have a screen that lets the user enter several variables
for a search query, so at times I cannot know in advance how large the
result set will be.]  If you give your users ways to query your system,
then with PG you have to be very careful how much power they are given.
Sometimes it is very hard to keep a large query from squeaking by, and
that makes a read-everything-at-execution JDBC driver a deadly weapon.

Anybody have any ideas about what we can do to speed up the development
of a fetch-on-demand JDBC driver?

Can we patch / extend the driver to let us set an absolute maximum
result set length on a query, and have it alert us when that limit
would have been exceeded, WITHOUT causing our system to go haywire /
crash / hit an OOME?
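
For what it's worth, the standard JDBC API already defines
Statement.setMaxRows(), which is close but not quite it: the limit is
applied silently (no alert that rows were dropped), and I don't know
whether our driver enforces it before buffering the whole result.  A
sketch of using it anyway, with a hypothetical cap and table:

    Statement st = con.createStatement();
    st.setMaxRows(10000);   // hard cap on rows returned
    ResultSet rs = st.executeQuery("SELECT * FROM big_table");
    int rows = 0;
    while (rs.next()) {
        rows++;
        // ... process the row ...
    }
    if (rows == 10000) {
        // cap probably hit; warn the user the result is truncated
    }
    rs.close();
    st.close();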

--
Matt Mello

