Thread: Selecting large objects stored as bytea

Selecting large objects stored as bytea

From
"Ludger Zachewitz"
Date:
Hello,

I have the following problem accessing PostgreSQL (version 8.2.x) from Java 1.6.0_03 using the JDBC driver (currently
postgresql-8.2-508.jdbc4.jar) and hope someone can help me. I have created a table 'files' with the columns 'file_id',
'filename' and 'file_content'. 'file_content' is of type bytea and is meant to store large binary files. A method for
storing files in that table also exists and works fine.

Now I am trying to select a stored 'file_content' from the table and write it directly to the filesystem. Because the
files can be very large, I don't want to hold them in main memory, so I am using input and output streams (see the
source code below).

When I select a file_content of only 50 MB (according to the PostgreSQL documentation, bytea values of up to 1 GB are
supported), running the source code below results in an OutOfMemoryError at the line

'ResultSet rs = statement.executeQuery(selectClause);'

After increasing the Java heap size it works, but the JVM then uses far more memory than I would expect. I have also
tried replacing this line with a prepared statement:
'PreparedStatement ps = this.dbConnection.prepareStatement(selectClause);'

Does anyone have a solution for this problem?

Thanks for any help.

Regards,
Ludger Zachewitz


SOURCE CODE:


    public void selectLargeObject(String selectClause, File file)
            throws SQLException, IOException {

        this.dbConnection.setAutoCommit(false);
        Statement statement = this.dbConnection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);

        ResultSet rs = statement.executeQuery(selectClause);

        // executeQuery() never returns null, so test for a row instead
        if (rs.next()) {
            // the column name must be a String ("..."), not a char literal ('...')
            InputStream in = rs.getBinaryStream("file_content");

            FileOutputStream fos = new FileOutputStream(file);

            int readBytes;
            byte[] value = new byte[1024];

            // copy the stream to the file in 1 KB blocks
            while ((readBytes = in.read(value)) > 0) {
                fos.write(value, 0, readBytes);
            }
            fos.flush();
            fos.close();
            in.close();
        }
        rs.close();
        statement.close();
    }



Re: Selecting large objects stored as bytea

From
Richard Huxton
Date:
Ludger Zachewitz wrote:
> Hello,
>
> I have the following problem accessing PostgreSQL (version 8.2.x) from
> Java 1.6.0_03 using the JDBC driver (currently
> postgresql-8.2-508.jdbc4.jar) and hope someone can help me. I have
> created a table 'files' with the columns 'file_id', 'filename' and
> 'file_content'. 'file_content' is of type bytea and is meant to store
> large binary files. A method for storing files in that table also
> exists and works fine.
>
> Now I am trying to select a stored 'file_content' from the table and
> write it directly to the filesystem. Because the files can be very
> large, I don't want to hold them in main memory, so I am using input
> and output streams (see the source code below).

If you don't want the whole field at once, you don't want a bytea. The
large-object interface offers lo_read/lo_write to let you grab data in
smaller chunks.

See the manuals for details; I'm not sure off-hand of the exact
procedure for accessing lo_read from JDBC, but the driver does expose a
large-object API.
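
Something along these lines should work (an untested sketch: it assumes
the file is re-stored through the large-object facility, with its OID
kept in a made-up 'file_oid' column of 'files'):

    // imports needed: org.postgresql.PGConnection,
    //                 org.postgresql.largeobject.*
    conn.setAutoCommit(false);   // LOs must be read inside a transaction

    // look up the OID of the stored large object
    Statement st = conn.createStatement();
    ResultSet rs = st.executeQuery(
            "SELECT file_oid FROM files WHERE file_id = 1");
    rs.next();
    int oid = rs.getInt(1);
    rs.close();
    st.close();

    // open it through the driver's large-object API and stream it out
    LargeObjectManager lom = ((PGConnection) conn).getLargeObjectAPI();
    LargeObject lo = lom.open(oid, LargeObjectManager.READ);

    FileOutputStream fos = new FileOutputStream(file);
    byte[] buf = new byte[8192];
    int n;
    while ((n = lo.read(buf, 0, buf.length)) > 0) {
        fos.write(buf, 0, n);
    }
    fos.close();
    lo.close();
    conn.commit();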

--
   Richard Huxton
   Archonet Ltd

Re: Selecting large objects stored as bytea

From
"Daniel Verite"
Date:
    Ludger Zachewitz wrote:

> 'ResultSet rs = statement.executeQuery(selectClause);'
>
> After increasing the Java heap size it works, but the JVM
> then uses far more memory than I would expect. I have also
> tried replacing this line with a prepared statement:
> 'PreparedStatement ps =
> this.dbConnection.prepareStatement(selectClause);'
>
> Does anyone have a solution for this problem?

You could use the function below, which breaks a bytea value into pieces
of 'chunksize' length and returns them as a set of rows.
Syntax of the call:
SELECT * FROM chunks((SELECT subquery that returns one bytea column),
1024*1024)

CREATE OR REPLACE FUNCTION chunks (contents bytea, chunksize int)
 RETURNS SETOF bytea AS $$
DECLARE
 length int;
 current int;
 chunk bytea;
BEGIN
  IF contents IS NULL THEN
    RETURN NEXT NULL;
    RETURN;
  END IF;
  length := octet_length(contents);
  current := 1;
  LOOP
    chunk := substring(contents FROM current FOR chunksize);
    RETURN NEXT chunk;
    current := current + chunksize;
    -- '>' rather than '>=': with '>=' the final chunk is dropped when
    -- exactly one byte remains past the last full chunk
    EXIT WHEN current > length;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;
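
From JDBC you can then stream the chunk rows one at a time; a rough,
untested sketch (it assumes the 'files' table from the original post,
and relies on the driver using a cursor when autocommit is off, the
statement is forward-only and a fetch size is set):

    conn.setAutoCommit(false);      // required for cursor-based fetching
    PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM chunks((SELECT file_content FROM files"
            + " WHERE file_id = ?), 1024*1024)");
    ps.setFetchSize(1);             // fetch one chunk row at a time
    ps.setInt(1, fileId);
    ResultSet rs = ps.executeQuery();

    FileOutputStream fos = new FileOutputStream(file);
    while (rs.next()) {
        fos.write(rs.getBytes(1));  // each row holds one bytea chunk
    }
    fos.close();
    rs.close();
    ps.close();
    conn.commit();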

Another option would be not to use that function, but instead to
implement its logic in your client-side code (multiple SELECTs in a
loop, as sketched below). I expect this would lessen the server-side
memory consumption.
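
A rough sketch of that client-side variant (untested, and again
assuming the 'files' table from the original post):

    PreparedStatement ps = conn.prepareStatement(
            "SELECT substring(file_content FROM ? FOR ?)"
            + " FROM files WHERE file_id = ?");
    int chunkSize = 1024 * 1024;
    int pos = 1;                      // substring positions are 1-based
    FileOutputStream fos = new FileOutputStream(file);

    while (true) {
        ps.setInt(1, pos);
        ps.setInt(2, chunkSize);
        ps.setInt(3, fileId);
        ResultSet rs = ps.executeQuery();
        rs.next();
        byte[] chunk = rs.getBytes(1);
        rs.close();
        if (chunk == null || chunk.length == 0) {
            break;                    // past the end of the value
        }
        fos.write(chunk);
        if (chunk.length < chunkSize) {
            break;                    // short chunk: end of the value
        }
        pos += chunkSize;
    }
    fos.close();
    ps.close();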

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

Re: Selecting large objects stored as bytea

From
Tomasz Ostrowski
Date:
On Fri, 22 Feb 2008, Daniel Verite wrote:

> You could use the function below that breaks a bytea value into
> pieces of 'chunksize' length and returns them as a set of rows.

When you do this, make sure the column has external storage:
    ALTER TABLE [table_name] ALTER COLUMN [column_name] SET STORAGE EXTERNAL;
Getting the chunks will then be fast, because substring() on an
uncompressed external value only reads the slices it needs.

But be aware that external storage does not compress the data, so the
database will need more space if your data is compressible. Also, the
storage option is only honored for newly written rows, so you would
need to dump, truncate and restore this table. Maybe just "CLUSTER
[table_name]" will do, I don't know.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh