Thread: BLOB is read into memory instead of streaming (bug?)
PostgreSQL 8.3.1, Win XP SP2, JDK 6, JDBC drivers: 8.3.603 <tried them all from this release>

I'm trying to stream a BLOB from the database, and expected to get it via rs.getBinaryStream(1), but the execution fails without reaching this point:

org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
Exception: java.lang.OutOfMemoryError: Java heap space
Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at org.postgresql.core.PGStream.ReceiveTupleV3(PGStream.java:349)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1306)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:254)
    at org.mule.galaxy.JDBCBlobTest.testRetrieveBlob(JDBCBlobTest.java:42)

The test method is trivial:

    public void testRetrieveBlob() throws Exception {
        System.out.println("Retrieving BLOB");
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");
        PreparedStatement ps = conn.prepareStatement(
                "SELECT binval_data FROM jackrabbit_x0020_core_binval WHERE binval_id = ?");
        ps.setString(1, "some_file_name");
        ResultSet rs = ps.executeQuery();   // <<<<<< Fails here with OOME
        assertTrue(rs.next());
        InputStream is = rs.getBinaryStream(1);
        assertTrue(is.available() > 0);
        rs.close();
        ps.close();
    }

Upload through the driver was fine - I got a 70MB file in without problems (I just can't get it out of the db now).
The table structure is generated by a JCR implementation (Jackrabbit 1.4.1); here's a reverse-engineered script according to pgAdmin III:

    CREATE TABLE jackrabbit_x0020_core_binval
    (
        binval_id character varying NOT NULL,
        binval_data bytea NOT NULL
    )
    WITH (OIDS=FALSE);

I also noticed that a SELECT on the blob column behaves the same way in pgAdmin (it reads it all into memory), though that could be the same programming error/limitation. The expected behavior is to execute a statement, get a reference to the blob's stream, and read it from there - which unfortunately doesn't work yet. Any thoughts are more than welcome.

Thanks,
Andrew
On Wed, 30 Apr 2008, Andrew Perepelytsya wrote:

> I'm trying to stream a BLOB from the database, and expected to get it
> via rs.getBinaryStream(1), but the execution fails without reaching
> this point:
>
> org.postgresql.util.PSQLException: Ran out of memory retrieving query
> results.
> Exception: java.lang.OutOfMemoryError: Java heap space
> Stack Trace:
> java.lang.OutOfMemoryError: Java heap space
>     at org.postgresql.core.PGStream.ReceiveTupleV3(PGStream.java:349)
>
> CREATE TABLE jackrabbit_x0020_core_binval
> (
>     binval_id character varying NOT NULL,
>     binval_data bytea NOT NULL
> )
> WITH (OIDS=FALSE);
>
> The expected behavior is to execute a statement and get a ref to the
> blob's stream, read it from there, which doesn't work yet
> unfortunately.

There are two methods to store binary data in pg, and they have different access methods and performance characteristics. Bytea data is expected to be shorter and is returned whole in a ResultSet by the server. For larger data you want to use large objects, which return a pointer (oid) to the actual data which you can then stream from the server at will.

This page describes some of the differences between the two and demonstrates using a pg-specific API to access large objects, but getBlob/setBlob will work just fine:

http://jdbc.postgresql.org/documentation/83/binary-data.html

Kris Jurka
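[Editor's note: the large-object access Kris describes can be sketched with plain JDBC as below. This is a hypothetical example, not from the thread: the table `files` with an `oid` column `data` is invented for illustration, and large-object access in PostgreSQL requires running inside a transaction. Only the oid travels in the ResultSet; the bytes are streamed on demand.]

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BlobStreamExample {

    // Drain an InputStream in fixed-size chunks so only one buffer
    // is ever held in memory at a time; returns the total byte count.
    static long copyLength(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical schema: CREATE TABLE files (name varchar, data oid);
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");
        conn.setAutoCommit(false); // large objects must be used in a transaction

        PreparedStatement ps =
                conn.prepareStatement("SELECT data FROM files WHERE name = ?");
        ps.setString(1, "some_file_name");
        ResultSet rs = ps.executeQuery();
        if (rs.next()) {
            Blob blob = rs.getBlob(1);               // only the oid is fetched here
            InputStream is = blob.getBinaryStream(); // bytes streamed on demand
            System.out.println("read " + copyLength(is) + " bytes");
            is.close();
        }
        rs.close();
        ps.close();
        conn.commit();
        conn.close();
    }
}
```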
This seems like a very bad impl - at least for JDBC.

Why are the details of this access not hidden in the JDBC driver? The column type is the only thing that a user should be concerned with.

Why would someone want to code proprietary Postgres code just to access BLOBs? The JDBC blob API is very good, using either the Blob/locator interface or getBinaryStream().

Oracle used to do a similar thing, and finally got with the program and uses strict JDBC in their latest driver releases.
robert engels wrote:
> This seems like a very bad impl - at least for JDBC.
>
> Why are the details of this access not hidden in the JDBC driver? The
> column type is the only thing that a user should be concerned with.
>
> Why would someone want to code proprietary Postgres code just to access
> BLOBs?
>
> The JDBC blob API is very good, using either the Blob/locator interface
> or getBinaryStream().

I think you've misunderstood me. The documentation shows using a proprietary API, but get/setBlob works just fine. I pointed to the documentation because it explains some of the differences between the bytea and large object datatypes. It's really that the documentation needs an additional example for the standard blob usage.

Kris Jurka
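[Editor's note: the "standard blob usage" Kris mentions also permits random access; the following is a hypothetical sketch, again assuming an invented `files` table with an `oid` column. The java.sql.Blob interface defines `length()` and `getBytes(pos, length)`, with positions 1-based per the JDBC spec, so a large value can be pulled in bounded slices.]

```java
import java.sql.Blob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BlobRandomAccess {

    // Size of the next slice to request: at most max, never past the end.
    static int nextChunk(long remaining, int max) {
        return (int) Math.min(remaining, (long) max);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical schema: CREATE TABLE files (name varchar, data oid);
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");
        conn.setAutoCommit(false); // large objects need a transaction

        PreparedStatement ps =
                conn.prepareStatement("SELECT data FROM files WHERE name = ?");
        ps.setString(1, "some_file_name");
        ResultSet rs = ps.executeQuery();
        if (rs.next()) {
            Blob blob = rs.getBlob(1);
            long remaining = blob.length();
            long pos = 1; // Blob positions are 1-based per the JDBC spec
            while (remaining > 0) {
                byte[] chunk = blob.getBytes(pos, nextChunk(remaining, 64 * 1024));
                pos += chunk.length;
                remaining -= chunk.length;
                // ... process chunk ...
            }
        }
        rs.close();
        ps.close();
        conn.commit();
        conn.close();
    }
}
```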
That's good to know.

The spec allows BLOBs to be read using getBytes() and getBinaryStream() as well. getBinaryStream() should allow bytea to be read without allocating an array to hold all of the data. BUT, the low-level db protocol would need to support reading the column in chunks.

On May 1, 2008, at 6:44 PM, Kris Jurka wrote:
>
> robert engels wrote:
>> This seems like a very bad impl - at least for JDBC.
>> Why are the details of this access not hidden in the JDBC driver?
>> The column type is the only thing that a user should be concerned
>> with.
>> Why would someone want to code proprietary Postgres code just to
>> access BLOBs?
>> The JDBC blob API is very good, using either the Blob/locator
>> interface or getBinaryStream().
>
> I think you've misunderstood me. The documentation shows using a
> proprietary API, but get/setBlob works just fine. I pointed to the
> documentation because it explains some of the differences between
> the bytea and large object datatypes. It's really that the
> documentation needs an additional example for the standard blob usage.
>
> Kris Jurka
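[Editor's note: without protocol support for chunked column reads, an existing bytea column like Andrew's can still be read in slices from the client side, because PostgreSQL's substring() and octet_length() functions operate on bytea. This workaround is not proposed in the thread; it is a sketch using the table and column names from Andrew's schema, trading one query per slice for bounded memory use.]

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedByteaRead {

    // Number of chunks of size chunkSize needed to cover totalLen bytes.
    static int chunkCount(long totalLen, int chunkSize) {
        return (int) ((totalLen + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) throws Exception {
        int chunkSize = 256 * 1024; // 256 KB per round trip
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "postgres");

        // First ask the server how big the value is.
        PreparedStatement lenPs = conn.prepareStatement(
                "SELECT octet_length(binval_data) FROM jackrabbit_x0020_core_binval"
                + " WHERE binval_id = ?");
        lenPs.setString(1, "some_file_name");
        ResultSet lenRs = lenPs.executeQuery();
        lenRs.next();
        long totalLen = lenRs.getLong(1);
        lenRs.close();
        lenPs.close();

        // substring() on bytea is 1-based; fetch one slice per query so the
        // driver only ever materializes chunkSize bytes at a time.
        // (int offsets are fine here: bytea values are capped well below 2GB.)
        PreparedStatement ps = conn.prepareStatement(
                "SELECT substring(binval_data FROM ? FOR ?)"
                + " FROM jackrabbit_x0020_core_binval WHERE binval_id = ?");
        for (int i = 0; i < chunkCount(totalLen, chunkSize); i++) {
            ps.setInt(1, i * chunkSize + 1);
            ps.setInt(2, chunkSize);
            ps.setString(3, "some_file_name");
            ResultSet rs = ps.executeQuery();
            rs.next();
            byte[] chunk = rs.getBytes(1);
            // ... write chunk to an OutputStream ...
            rs.close();
        }
        ps.close();
        conn.close();
    }
}
```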