Thread: Selecting large objects stored as bytea
Hello,

I have the following problem accessing postgres (version: 8.2.X) from java 1.6_03 using the JDBC driver (currently version: postgresql-8.2-508.jdbc4.jar) and hope someone can help me. I have created a table 'files' with the columns 'file_id', 'filename' and 'file_content'. 'file_content' is of type bytea and should store large binary files. A method for storing files in that table also exists and works fine.

Now I am trying to select a stored 'file_content' from the table and write it directly to the filesystem. Because the files can be very large, I don't want to hold them in main memory -- so I am using input and output streams (see source code below).

When I select a file_content of only 50 MByte (according to the pgsql documentation, values of up to 1 GByte are supported), running the source code below results in an out-of-memory error at the line 'ResultSet rs = statement.executeQuery(selectClause);'. After increasing the Java heap it works, but Java then needs much more memory than I would expect. I have also tried substituting this line with a prepared statement, 'PreparedStatement ps = this.dbConnection.prepareStatement(selectClause);', with no improvement.

Does anyone have a solution for this problem? Thanks for any help.

Regards,
Ludger Zachewitz

SOURCE CODE:

public void selectLargeObject(String selectClause, File file)
        throws SQLException, IOException {
    InputStream in = null;
    this.dbConnection.setAutoCommit(false);
    Statement statement = this.dbConnection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    ResultSet rs = statement.executeQuery(selectClause);
    if (rs != null) {
        while (rs.next()) {
            in = rs.getBinaryStream("file_content");
        }
        FileOutputStream fos = new FileOutputStream(file);
        int readBytes = 0;
        byte[] value = new byte[1024];
        while ((readBytes = in.read(value)) > 0) {
            fos.write(value, 0, readBytes);
        }
        fos.flush();
        fos.close();
        rs.close();
    }
    statement.close();
}
Ludger Zachewitz wrote:
> I have the following problem accessing postgres (version: 8.2.X) from
> java 1.6_03 using the JDBC driver (currently version:
> postgresql-8.2-508.jdbc4.jar) and hope someone can help me. I have
> created a table 'files' with the columns 'file_id', 'filename' and
> 'file_content'. 'file_content' is of type bytea and should store large
> binary files. A method for storing files in that table also exists and
> works fine.
>
> Now I am trying to select a stored 'file_content' from the table and
> write it directly to the filesystem. Because the files can be very
> large, I don't want to hold them in main memory -- so I am using input
> and output streams (see source code below).

If you don't want the whole field at once, you don't want a bytea. The large-object interface offers lo_read/lo_write to let you grab data in smaller chunks. See the manuals for details; I'm afraid I'm not sure of the procedure for accessing lo_read from JDBC.

--
  Richard Huxton
  Archonet Ltd
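For the archives, reading a large object through the JDBC driver's org.postgresql.largeobject API might look roughly like the sketch below. It assumes the table stores the object's oid in a hypothetical column (e.g. 'file_oid') rather than a bytea; the method signatures follow the driver documentation, and the 8.2-era driver may expect an int oid rather than a long. An untested sketch, not a verified recipe:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.SQLException;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectDownload {

    // Streams a large object, identified by its oid, to 'out' in 8 KB
    // pieces, so the whole file is never held in memory at once.
    public static void download(Connection conn, long oid, OutputStream out)
            throws SQLException, IOException {
        // Large-object operations must run inside a transaction.
        conn.setAutoCommit(false);
        LargeObjectManager lom = ((PGConnection) conn).getLargeObjectAPI();
        LargeObject obj = lom.open(oid, LargeObjectManager.READ);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = obj.read(buf, 0, buf.length)) > 0) {
                out.write(buf, 0, n);
            }
        } finally {
            obj.close();
        }
        conn.commit();
    }
}
```

This is the JDBC-level equivalent of the lo_read loop Richard describes: each read() call corresponds to one lo_read of at most 8 KB.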
Ludger Zachewitz wrote:
> 'ResultSet rs = statement.executeQuery(selectClause);'
>
> After increasing the Java heap it works, but Java then needs much
> more memory than I would expect. I have also tried substituting this
> line with a prepared statement like 'PreparedStatement ps =
> this.dbConnection.prepareStatement(selectClause);'
>
> Does anyone have a solution for this problem?

You could use the function below, which breaks a bytea value into pieces of 'chunksize' length and returns them as a set of rows.

Syntax of the call:

SELECT * FROM chunks((SELECT subquery that returns one bytea column), 1024*1024)

CREATE OR REPLACE FUNCTION chunks(contents bytea, chunksize int)
RETURNS SETOF bytea AS $$
DECLARE
  length int;
  current int;
  chunk bytea;
BEGIN
  IF contents IS NULL THEN
    RETURN NEXT NULL;
    RETURN;
  END IF;
  SELECT octet_length(contents) INTO length;
  current := 1;
  LOOP
    SELECT substring(contents FROM current FOR chunksize) INTO chunk;
    RETURN NEXT chunk;
    current := current + chunksize;
    EXIT WHEN current >= length;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE 'plpgsql';

Another option would be not to use that function, but instead to implement its logic in your client-side code (multiple SELECTs in a loop). I expect this would lessen the server-side memory consumption.

Best regards,
--
  Daniel
  PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
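Daniel's last suggestion (driving the substring() SELECTs from the client) could be sketched in JDBC as below. The table and column names come from the original post; the class name, the 1 MB chunk size, and the chunkCount() helper are illustrative assumptions, and the database-facing part is untested:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ChunkedDownload {

    static final int CHUNK_SIZE = 1024 * 1024; // 1 MB per round trip

    // Number of SELECT round trips needed for a value of the given size.
    static long chunkCount(long totalBytes, int chunkSize) {
        return (totalBytes + chunkSize - 1) / chunkSize;
    }

    // Fetches file_content piecewise via substring(), so at most one
    // chunk is held in client memory at a time.
    public static void download(Connection conn, long fileId, OutputStream out)
            throws SQLException, IOException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT substring(file_content FROM ? FOR ?) "
                + "FROM files WHERE file_id = ?");
        try {
            int offset = 1; // substring() on bytea is 1-based
            while (true) {
                ps.setInt(1, offset);
                ps.setInt(2, CHUNK_SIZE);
                ps.setLong(3, fileId);
                ResultSet rs = ps.executeQuery();
                byte[] chunk = rs.next() ? rs.getBytes(1) : null;
                rs.close();
                if (chunk == null || chunk.length == 0) {
                    break; // no row, or read past the end of the value
                }
                out.write(chunk);
                if (chunk.length < CHUNK_SIZE) {
                    break; // short chunk: end of the value
                }
                offset += CHUNK_SIZE;
            }
        } finally {
            ps.close();
        }
    }
}
```

For the 50 MB file from the original post and a 1 MB chunk, this issues 50 SELECTs, each of which materializes only one chunk on the client.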
On Fri, 22 Feb 2008, Daniel Verite wrote:

> You could use the function below that breaks a bytea value into
> pieces of 'chunksize' length and returns them as a set of rows.

When you do this, make sure that this column has external storage:

alter column [column_name] set storage external;

Then getting these chunks will be fast. But be aware that external storage does not compress data, so you will need more disk space for the database if your data is compressible. Also, you would need to dump, truncate, and restore this table, as the storage option is only honored for new rows. Maybe just "cluster [table_name]" will do, I don't know.

Regards
  Tometzky
--
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were...
                                                      Winnie the Pooh