Re: setBlob loop performance? - Mailing list pgsql-jdbc

From David Wall
Subject Re: setBlob loop performance?
Date
Msg-id 004201c24e4a$c191cee0$3201a8c0@expertrade.com
Whole thread Raw
In response to setBlob loop performance?  ("David Wall" <dwall@Yozons.com>)
List pgsql-jdbc
> Both of these issues look like bugs to me.  Please submit a patch.  I
> agree that using a 4 or 8k buffer for reading/writing between the
> streams is much better than how the code is currently implemented.
>  Also, setBlob should be calling close on the input stream when it is
done.

Oddly enough, I don't have the capability to test this just yet, but I
thought I'd share the code as I think would work better for
org.postgresql.jdbc2.PreparedStatement.  Of course, if anybody sees anything
screwy, please let me know.  The main changes are the closing of the two
streams in the finally block so that even if an exception is thrown, the
streams are closed, and the primary loop attempts to read/write 4k blocks.
I hope to be able to test shortly, but we're in the process of converting
the database to 7.2.2 and the sources from getBytes/setBytes to
getBlob/setBlob (we thought this would be easier than converting existing
OIDs to bytea in the db).

David


 public void setBlob(int i, Blob x) throws SQLException
 {
  InputStream l_inStream = x.getBinaryStream();
  LargeObjectManager lom = connection.getLargeObjectAPI();
  int oid = lom.create();
  LargeObject lob = lom.open(oid);
  OutputStream los = lob.getOutputStream();
  byte[] buf = new byte[4096];
  try
  {
   // could be buffered, but then the OutputStream returned by LargeObject
   // is buffered internally anyhow, so there would be no performance
   // boost gained, if anything it would be worse!
   int bytesRemaining = (int)x.length();
   int numRead = l_inStream.read(buf,0,Math.min(buf.length,bytesRemaining));
   while (numRead != -1 && bytesRemaining > 0)
   {
      bytesRemaining -= numRead;
      los.write(buf,0,numRead);
      numRead = l_inStream.read(buf,0,Math.min(buf.length,bytesRemaining));
   }
  }
  catch (IOException se)
  {
   throw new PSQLException("postgresql.unusual", se);
  }
  finally
  {
      try
      {
         los.close();
         l_inStream.close();
      }
      catch( Exception e ) {}
  }
  // lob is closed by the stream so don't call lob.close()
  setInt(i, oid);
 }


> Is there anything else that can be done to improve this.  I have always
> found the jdbc spec for Blobs to be limited in the blob creation area.
>  Is there anything you would like to see in the driver to make this
> easier for you?

I don't think so, but it's interesting to see the API and wonder if anybody
actually coded around it.  For example, assume you have a nice InputStream
to a blob on disk.  Getting that into a Blob field would be a pain since
you'd have to wrap the InputStream in your object that implements Blob and
then have that interface drive the blob insertion.  If you already have a
byte array, you still have to wrap it insert it as a blob.  I suppose a
really smart JDBC could use setBytes()/getBytes() and simply handle the
conversions internally if the data field was actually a blob and not a
longvarbinary.

It's also interesting that the interface returns all numbers as a long,
including the length, but the read requires it to be an int, meaning you can
attempt to suck in an entire blob in a single call to getBytes().  Oh
well...

David

> David Wall wrote:
>
> >In the 7.2.2 codeset, PreparedStatement.setBlob() shows a loop as it
reads a
> >byte from the input stream (the blob) and writes it to the output stream
> >(PG's LO routines).
> >
> >This seems highly inefficient since most large objects are, well,
large...
> >So if I want to insert a 1MB image, this will loop a million times.  Is
> >there a reason it's not read in chunks (even a 4096 sized array would
reduce
> >such a loop down to 250 iterations)?
> >
> >This is much worse than the 7.1 code which simply took my byte array and
> >wrote it all to the LargeObject stream in one call.
> >
> >+++
> >        public void setBlob(int i, Blob x) throws SQLException
> >        {
> >                InputStream l_inStream = x.getBinaryStream();
> >                int l_length = (int) x.length();
> >                LargeObjectManager lom = connection.getLargeObjectAPI();
> >                int oid = lom.create();
> >                LargeObject lob = lom.open(oid);
> >                OutputStream los = lob.getOutputStream();
> >                try
> >                {
> >                        // could be buffered, but then the OutputStream
> >returned by LargeObject
> >                        // is buffered internally anyhow, so there would
be
> >no performance
> >                        // boost gained, if anything it would be worse!
> >                        int c = l_inStream.read();
> >                        int p = 0;
> >                        while (c > -1 && p < l_length)
> >                        {
> >                                los.write(c);
> >                                c = l_inStream.read();
> >                                p++;
> >                        }
> >                        los.close();
> >                }
> >                catch (IOException se)
> >                {
> >                        throw new PSQLException("postgresql.unusual",
se);
> >                }
> >                // lob is closed by the stream so don't call lob.close()
> >                setInt(i, oid);
> >        }
> >
> >+++
> >
> >Since the getBinaryStream() returns an InputStream, should this routine
> >close that inputstream once it's done, or does the Blob itself have to
> >somehow know that a stream it creates can be closed and discarded (and if
> >so, how?)?


pgsql-jdbc by date:

Previous
From: chris@creatop.com.au
Date:
Subject: Re: JDBC Problem with Timestamptz
Next
From: Barry Lind
Date:
Subject: Re: setBlob loop performance?