Re: setBlob loop performance? - Mailing list pgsql-jdbc

From Barry Lind
Subject Re: setBlob loop performance?
Date
Msg-id 3D6C87E0.6000502@xythos.com
In response to setBlob loop performance?  ("David Wall" <dwall@Yozons.com>)
List pgsql-jdbc
David,

Could you resend this in diff -c format?  Since I don't know the exact
version you have, it is difficult to pick out the changes.  A diff would
make that much easier.

thanks,
--Barry

David Wall wrote:

>>Both of these issues look like bugs to me.  Please submit a patch.  I
>>agree that using a 4 or 8k buffer for reading/writing between the
>>streams is much better than how the code is currently implemented.
>>Also, setBlob should be calling close on the input stream when it is
>>done.
>
>Oddly enough, I don't have the capability to test this just yet, but I
>thought I'd share the code that I think would work better for
>org.postgresql.jdbc2.PreparedStatement.  Of course, if anybody sees anything
>screwy, please let me know.  The main changes are that the two streams are
>now closed in the finally block, so they get closed even if an exception is
>thrown, and that the primary loop reads/writes in 4k blocks.
>I hope to be able to test shortly, but we're in the process of converting
>the database to 7.2.2 and the sources from getBytes/setBytes to
>getBlob/setBlob (we thought this would be easier than converting existing
>OIDs to bytea in the db).
>
>David
>
>
> public void setBlob(int i, Blob x) throws SQLException
> {
>  InputStream l_inStream = x.getBinaryStream();
>  LargeObjectManager lom = connection.getLargeObjectAPI();
>  int oid = lom.create();
>  LargeObject lob = lom.open(oid);
>  OutputStream los = lob.getOutputStream();
>  byte[] buf = new byte[4096];
>  try
>  {
>   // could be buffered, but then the OutputStream returned by LargeObject
>   // is buffered internally anyhow, so there would be no performance
>   // boost gained, if anything it would be worse!
>   int bytesRemaining = (int)x.length();
>   int numRead = l_inStream.read(buf,0,Math.min(buf.length,bytesRemaining));
>   while (numRead != -1 && bytesRemaining > 0)
>   {
>      bytesRemaining -= numRead;
>      los.write(buf,0,numRead);
>      numRead = l_inStream.read(buf,0,Math.min(buf.length,bytesRemaining));
>   }
>  }
>  catch (IOException se)
>  {
>   throw new PSQLException("postgresql.unusual", se);
>  }
>  finally
>  {
>      try
>      {
>         los.close();
>         l_inStream.close();
>      }
>      catch( Exception e ) {}
>  }
>  // lob is closed by the stream so don't call lob.close()
>  setInt(i, oid);
> }
>
>
>
>
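As a usage sketch for the setBlob() above: the DAO class, table, and column
names here are made up, and the "data" column is assumed to be of type oid
since setBlob() stores the large object's OID via setInt().  Note that
PostgreSQL's large-object API only works inside a transaction, so autocommit
has to be off.

import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative caller only: ImageDao, the "images" table, and its columns
// are hypothetical.
public class ImageDao
{
    public static void storeImage(Connection con, String name, Blob blob)
            throws SQLException
    {
        // PostgreSQL large objects can only be accessed inside a transaction,
        // so autocommit must be off before setBlob() runs.
        con.setAutoCommit(false);
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO images (name, data) VALUES (?, ?)");
        try
        {
            ps.setString(1, name);
            ps.setBlob(2, blob);   // "data" is an oid column; the driver writes the LO's OID
            ps.executeUpdate();
            con.commit();
        }
        finally
        {
            ps.close();
        }
    }
}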
>>Is there anything else that can be done to improve this?  I have always
>>found the JDBC spec for Blobs to be limited in the blob creation area.
>>Is there anything you would like to see in the driver to make this
>>easier for you?
>>
>>
>
>I don't think so, but it's interesting to see the API and wonder if anybody
>actually coded around it.  For example, assume you have a nice InputStream
>to a blob on disk.  Getting that into a Blob field would be a pain, since
>you'd have to wrap the InputStream in your own object that implements Blob
>and then have that interface drive the blob insertion.  If you already have
>a byte array, you still have to wrap it to insert it as a blob.  I suppose a
>really smart JDBC driver could use setBytes()/getBytes() and simply handle
>the conversions internally if the data field was actually a blob and not a
>longvarbinary.
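A minimal sketch of the kind of wrapper the Blob API forces on the caller
might look like the following.  The class name ByteArrayBlob is made up, it
targets the JDBC 2.0 version of java.sql.Blob (later JDK versions add further
methods that would also need stubs), and the position() searches are left
unimplemented:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.SQLException;

// Hypothetical helper: wraps an in-memory byte array so it can be handed to
// PreparedStatement.setBlob().  Written against the JDBC 2.0 Blob interface.
public class ByteArrayBlob implements Blob
{
    private final byte[] data;

    public ByteArrayBlob(byte[] data)
    {
        this.data = data;
    }

    public long length() throws SQLException
    {
        return data.length;
    }

    public InputStream getBinaryStream() throws SQLException
    {
        return new ByteArrayInputStream(data);
    }

    public byte[] getBytes(long pos, int length) throws SQLException
    {
        // JDBC Blob positions are 1-based; clamp the request to the data we have
        int offset = (int) (pos - 1);
        int len = Math.min(length, data.length - offset);
        byte[] result = new byte[len];
        System.arraycopy(data, offset, result, 0, len);
        return result;
    }

    public long position(byte[] pattern, long start) throws SQLException
    {
        throw new SQLException("position() not implemented");
    }

    public long position(Blob pattern, long start) throws SQLException
    {
        throw new SQLException("position() not implemented");
    }
}

With something like that in place the insert itself is just
setBlob(n, new ByteArrayBlob(bytes)), which is a fair amount of ceremony
compared to a plain setBytes() call.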
>
>It's also interesting that the interface returns all numbers as a long,
>including the length, but the read requires an int, meaning you can only
>attempt to suck in an entire blob in a single call to getBytes() by casting
>the length down to an int.  Oh well...
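To illustrate the long/int mismatch, a reader that cannot assume the blob
fits in an int has to chunk the getBytes() calls itself.  The class and
method names below are illustrative only, and the 4k chunk size is arbitrary:

import java.io.IOException;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.SQLException;

// Sketch only: copy a Blob to an OutputStream in fixed-size chunks, since
// Blob.length() returns a long but Blob.getBytes() takes an int length.
public class BlobCopy
{
    public static void copyBlob(Blob blob, OutputStream out)
            throws SQLException, IOException
    {
        final int CHUNK = 4096;
        long remaining = blob.length();
        long pos = 1;                        // JDBC Blob positions are 1-based
        while (remaining > 0)
        {
            int len = (int) Math.min(CHUNK, remaining);
            byte[] chunk = blob.getBytes(pos, len);
            if (chunk.length == 0)
            {
                break;                       // defensive: avoid looping forever
            }
            out.write(chunk);
            pos += chunk.length;
            remaining -= chunk.length;
        }
    }
}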
>
>David
>
>
>
>>David Wall wrote:
>>
>>
>>
>>>In the 7.2.2 codeset, PreparedStatement.setBlob() shows a loop as it reads a
>>>byte from the input stream (the blob) and writes it to the output stream
>>>(PG's LO routines).
>>>
>>>This seems highly inefficient since most large objects are, well, large...
>>>
>>>So if I want to insert a 1MB image, this will loop a million times.  Is
>>>there a reason it's not read in chunks (even a 4096 sized array would reduce
>>>such a loop down to 250 iterations)?
>>>
>>>This is much worse than the 7.1 code which simply took my byte array and
>>>wrote it all to the LargeObject stream in one call.
>>>
>>>+++
>>>       public void setBlob(int i, Blob x) throws SQLException
>>>       {
>>>               InputStream l_inStream = x.getBinaryStream();
>>>               int l_length = (int) x.length();
>>>               LargeObjectManager lom = connection.getLargeObjectAPI();
>>>               int oid = lom.create();
>>>               LargeObject lob = lom.open(oid);
>>>               OutputStream los = lob.getOutputStream();
>>>               try
>>>               {
>>>                       // could be buffered, but then the OutputStream returned by LargeObject
>>>                       // is buffered internally anyhow, so there would be no performance
>>>                       // boost gained, if anything it would be worse!
>>>                       int c = l_inStream.read();
>>>                       int p = 0;
>>>                       while (c > -1 && p < l_length)
>>>                       {
>>>                               los.write(c);
>>>                               c = l_inStream.read();
>>>                               p++;
>>>                       }
>>>                       los.close();
>>>               }
>>>               catch (IOException se)
>>>               {
>>>                       throw new PSQLException("postgresql.unusual", se);
>>>               }
>>>               // lob is closed by the stream so don't call lob.close()
>>>               setInt(i, oid);
>>>       }
>>>
>>>+++
>>>
>>>Since getBinaryStream() returns an InputStream, should this routine
>>>close that InputStream once it's done, or does the Blob itself have to
>>>somehow know that a stream it creates can be closed and discarded (and if
>>>so, how?)?
>>>
>>>
>
>
>
>


