BLOB is read into memory instead of streaming (bug?)

From: Andrew Perepelytsya
Date: Wed, 30 Apr 2008

PostgreSQL 8.3.1
Win XP SP2
JDK 6
JDBC drivers: 8.3.603.<tried them all from this release>

I'm trying to stream a BLOB from the database, and expected to get it
via rs.getBinaryStream(1), but the execution fails without reaching
this point:

org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
Exception: java.lang.OutOfMemoryError: Java heap space
Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at org.postgresql.core.PGStream.ReceiveTupleV3(PGStream.java:349)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1306)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:254)
    at org.mule.galaxy.JDBCBlobTest.testRetrieveBlob(JDBCBlobTest.java:42)

The test method is trivial:

public void testRetrieveBlob() throws Exception
{
    System.out.println("Retrieving BLOB");

    Class.forName("org.postgresql.Driver");
    Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/test", "postgres", "postgres");

    PreparedStatement ps = conn.prepareStatement(
            "SELECT binval_data FROM jackrabbit_x0020_core_binval WHERE binval_id = ?");
    ps.setString(1, "some_file_name");
    ResultSet rs = ps.executeQuery(); // <<<<<< fails here with OOME
    assertTrue(rs.next());

    InputStream is = rs.getBinaryStream(1);
    assertTrue(is.available() > 0);

    rs.close();
    ps.close();
}

Uploading through the driver was fine - I got a 70MB file in without
problems (I just can't get it back out of the db now).

The table structure is generated by a JCR implementation (Jackrabbit
1.4.1); here's the reverse-engineered script according to pgAdmin III:

CREATE TABLE jackrabbit_x0020_core_binval
(
  binval_id character varying NOT NULL,
  binval_data bytea NOT NULL
)
WITH (OIDS=FALSE);


I also noticed that a SELECT on the blob column behaves the same way in
pgAdmin (reads it all into memory), but that could be the same
programming error/limitation.

The expected behavior is to execute the statement, get a reference to
the blob's stream, and read it from there, but unfortunately that
doesn't work yet.

Any thoughts are more than welcome.

Thanks,
Andrew

Re: BLOB is read into memory instead of streaming (bug?)

From: Kris Jurka
Date:

On Wed, 30 Apr 2008, Andrew Perepelytsya wrote:

> I'm trying to stream a BLOB from the database, and expected to get it
> via rs.getBinaryStream(1), but the execution fails without reaching
> this point:
>
> org.postgresql.util.PSQLException: Ran out of memory retrieving query
> results.
> Exception: java.lang.OutOfMemoryError: Java heap space
> Stack Trace:
> java.lang.OutOfMemoryError: Java heap space
>     at org.postgresql.core.PGStream.ReceiveTupleV3(PGStream.java:349)
>
> CREATE TABLE jackrabbit_x0020_core_binval
> (
>  binval_id character varying NOT NULL,
>  binval_data bytea NOT NULL
> )
> WITH (OIDS=FALSE);
>
>
> The expected behavior is to execute a statement and get a ref to the
> blob's stream, read it from there, which doesn't work yet
> unfortunately.
>

There are two methods of storing binary data in pg, and they have
different access methods and performance characteristics.  Bytea data
is expected to be shorter and is returned whole in the ResultSet by the
server.  For larger data you want large objects, where the result holds
a pointer (oid) to the actual data, which you can then stream from the
server at will.

This page describes some of the differences between the two and
demonstrates using a pg-specific API to access large objects, but
getBlob/setBlob will work just fine:

http://jdbc.postgresql.org/documentation/83/binary-data.html
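
For example, here is a minimal sketch of the standard-JDBC read path via
getBlob(), assuming a hypothetical table images(name text, data oid); the
oid column is what makes the value a large object, and the table/column
names here are made up for illustration:

import java.io.InputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LargeObjectReadSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");
        conn.setAutoCommit(false); // large object access must be in a transaction

        PreparedStatement ps = conn.prepareStatement(
                "SELECT data FROM images WHERE name = ?");
        ps.setString(1, "some_file_name");
        ResultSet rs = ps.executeQuery(); // row carries only the oid, not the data
        if (rs.next()) {
            Blob blob = rs.getBlob(1);
            InputStream is = blob.getBinaryStream(); // streams from the server
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) {
                // process buf[0..n) without ever holding the whole value
            }
            is.close();
        }
        rs.close();
        ps.close();
        conn.commit();
        conn.close();
    }
}

The difference from the bytea case is that executeQuery() never has to
buffer the value itself; bytes move only as the stream is read.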

Kris Jurka

Re: BLOB is read into memory instead of streaming (bug?)

From: robert engels
Date:

This seems like a very bad impl - at least for JDBC.

Why are the details of this access not hidden in the JDBC driver? The
column type is the only thing that a user should be concerned with.

Why would someone want to code proprietary Postgres code just to
access BLOBs?

The JDBC Blob API is very good, using either the Blob/locator
interface or getBinaryStream().

Oracle used to do a similar thing, and finally got with the program
and uses strict JDBC in their latest driver releases.


Re: BLOB is read into memory instead of streaming (bug?)

From: Kris Jurka
Date: May 1, 2008

robert engels wrote:
> This seems like a very bad impl - at least for JDBC.
>
> Why are the details of this access not hidden in the JDBC driver? The
> column type is the only thing that a user should be concerned with.
>
> Why would someone want to code proprietary Postgres code just to access
> BLOBs?
>
> The JDBC Blob API is very good, using either the Blob/locator
> interface or getBinaryStream().
>

I think you've misunderstood me.  The documentation shows using a
proprietary API, but get/setBlob works just fine.  I pointed to the
documentation because it explains some of the differences between the
bytea and large object datatypes.  It's really that the documentation
needs an additional example for the standard blob usage.
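
Something like the following, perhaps; this is only a sketch of the
standard write path via setBinaryStream(), reusing the hypothetical
images(name text, data oid) table from above rather than anything in the
current docs:

import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LargeObjectWriteSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");
        conn.setAutoCommit(false); // large object writes are transactional too

        File file = new File("some_file.bin");
        FileInputStream fis = new FileInputStream(file);
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO images (name, data) VALUES (?, ?)");
        ps.setString(1, file.getName());
        // On an oid column the driver stores the parameter as a large
        // object, streaming from the file instead of buffering it.
        ps.setBinaryStream(2, fis, (int) file.length());
        ps.executeUpdate();
        ps.close();
        fis.close();

        conn.commit();
        conn.close();
    }
}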

Kris Jurka


Re: BLOB is read into memory instead of streaming (bug?)

From: robert engels
Date:
That's good to know.

The spec allows BLOBs to be read using getBytes() and getBinaryStream()
as well.

getBinaryStream should allow bytea to be read without allocating an
array to hold all of the data.

BUT, the low-level db protocol would need to support reading the
column in chunks.
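
Lacking that, one way to approximate chunked reads over bytea today is
to do the slicing in SQL with substring(); a rough sketch against the
table from the original post, with the slice size and loop shape purely
illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ByteaChunkSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");
        // Read the bytea value in server-side slices so no single query
        // materializes the whole value inside the driver.
        PreparedStatement ps = conn.prepareStatement(
                "SELECT substring(binval_data from ? for ?) "
                + "FROM jackrabbit_x0020_core_binval WHERE binval_id = ?");
        final int chunk = 64 * 1024;        // illustrative slice size
        for (int off = 1; ; off += chunk) { // SQL substring is 1-based
            ps.setInt(1, off);
            ps.setInt(2, chunk);
            ps.setString(3, "some_file_name");
            ResultSet rs = ps.executeQuery();
            byte[] part = rs.next() ? rs.getBytes(1) : null;
            rs.close();
            if (part == null || part.length == 0) break; // past end of value
            // ... hand part off to whatever consumes the stream ...
            if (part.length < chunk) break; // last, short slice
        }
        ps.close();
        conn.close();
    }
}

Each slice costs a round trip and re-evaluates the WHERE clause, so it
is no substitute for protocol-level chunking, but peak heap usage stays
at one slice.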

On May 1, 2008, at 6:44 PM, Kris Jurka wrote:

>
>
> robert engels wrote:
>> This seems like a very bad impl - at least for JDBC.
>> Why are the details of this access not hidden in the JDBC driver?
>> The column type is the only thing that a user should be concerned
>> with.
>> Why would someone want to code proprietary Postgres code just to
>> access BLOBs?
>> The JDBC Blob API is very good, using either the Blob/locator
>> interface or getBinaryStream().
>
> I think you've misunderstood me.  The documentation shows using a
> proprietary API, but get/setBlob works just fine.  I pointed to the
> documentation because it explains some of the differences between
> the bytea and large object datatypes.  It's really that the
> documentation needs an additional example for the standard blob usage.
>
> Kris Jurka
>