Re: setBlob loop performance? - Mailing list pgsql-jdbc
| From | Barry Lind |
|---|---|
| Subject | Re: setBlob loop performance? |
| Date | |
| Msg-id | 3D6C87E0.6000502@xythos.com |
| In response to | setBlob loop performance? ("David Wall" <dwall@Yozons.com>) |
| List | pgsql-jdbc |
David,

Could you resend this in a diff -c format? Since I don't know the exact version you have, it is difficult to pick out the changes. A diff would make that much easier.

thanks,
--Barry

David Wall wrote:
>> Both of these issues look like bugs to me. Please submit a patch. I
>> agree that using a 4 or 8k buffer for reading/writing between the
>> streams is much better than how the code is currently implemented.
>> Also, setBlob should be calling close on the input stream when it is
>> done.
>
> Oddly enough, I don't have the capability to test this just yet, but I
> thought I'd share the code that I think would work better for
> org.postgresql.jdbc2.PreparedStatement. Of course, if anybody sees
> anything screwy, please let me know. The main changes are the closing
> of the two streams in the finally block, so that even if an exception
> is thrown the streams are closed, and the primary loop attempts to
> read/write 4k blocks. I hope to be able to test shortly, but we're in
> the process of converting the database to 7.2.2 and the sources from
> getBytes/setBytes to getBlob/setBlob (we thought this would be easier
> than converting existing OIDs to bytea in the db).
>
> David
>
>     public void setBlob(int i, Blob x) throws SQLException
>     {
>         InputStream l_inStream = x.getBinaryStream();
>         LargeObjectManager lom = connection.getLargeObjectAPI();
>         int oid = lom.create();
>         LargeObject lob = lom.open(oid);
>         OutputStream los = lob.getOutputStream();
>         byte[] buf = new byte[4096];
>         try
>         {
>             // could be buffered, but then the OutputStream returned by LargeObject
>             // is buffered internally anyhow, so there would be no performance
>             // boost gained, if anything it would be worse!
>             int bytesRemaining = (int) x.length();
>             int numRead = l_inStream.read(buf, 0, Math.min(buf.length, bytesRemaining));
>             while (numRead != -1 && bytesRemaining > 0)
>             {
>                 bytesRemaining -= numRead;
>                 los.write(buf, 0, numRead);
>                 numRead = l_inStream.read(buf, 0, Math.min(buf.length, bytesRemaining));
>             }
>         }
>         catch (IOException se)
>         {
>             throw new PSQLException("postgresql.unusual", se);
>         }
>         finally
>         {
>             try
>             {
>                 los.close();
>                 l_inStream.close();
>             }
>             catch (Exception e) {}
>         }
>         // lob is closed by the stream so don't call lob.close()
>         setInt(i, oid);
>     }
>
>> Is there anything else that can be done to improve this? I have always
>> found the jdbc spec for Blobs to be limited in the blob creation area.
>> Is there anything you would like to see in the driver to make this
>> easier for you?
>
> I don't think so, but it's interesting to see the API and wonder if
> anybody actually coded around it. For example, assume you have a nice
> InputStream to a blob on disk. Getting that into a Blob field would be
> a pain since you'd have to wrap the InputStream in your object that
> implements Blob and then have that interface drive the blob insertion.
> If you already have a byte array, you still have to wrap it to insert
> it as a blob. I suppose a really smart JDBC driver could use
> setBytes()/getBytes() and simply handle the conversions internally if
> the data field was actually a blob and not a longvarbinary.
>
> It's also interesting that the interface returns all numbers as a long,
> including the length, but the read requires it to be an int, meaning
> you can't suck in an entire blob in a single call to getBytes(). Oh
> well...
>
> David
>
>
>> David Wall wrote:
>>
>>> In the 7.2.2 codeset, PreparedStatement.setBlob() shows a loop as it
>>> reads a byte from the input stream (the blob) and writes it to the
>>> output stream (PG's LO routines).
>>>
>>> This seems highly inefficient since most large objects are, well,
>>> large... So if I want to insert a 1MB image, this will loop a million
>>> times. Is there a reason it's not read in chunks (even a 4096 sized
>>> array would reduce such a loop down to 250 iterations)?
>>>
>>> This is much worse than the 7.1 code which simply took my byte array
>>> and wrote it all to the LargeObject stream in one call.
>>>
>>> +++
>>>     public void setBlob(int i, Blob x) throws SQLException
>>>     {
>>>         InputStream l_inStream = x.getBinaryStream();
>>>         int l_length = (int) x.length();
>>>         LargeObjectManager lom = connection.getLargeObjectAPI();
>>>         int oid = lom.create();
>>>         LargeObject lob = lom.open(oid);
>>>         OutputStream los = lob.getOutputStream();
>>>         try
>>>         {
>>>             // could be buffered, but then the OutputStream returned by LargeObject
>>>             // is buffered internally anyhow, so there would be no performance
>>>             // boost gained, if anything it would be worse!
>>>             int c = l_inStream.read();
>>>             int p = 0;
>>>             while (c > -1 && p < l_length)
>>>             {
>>>                 los.write(c);
>>>                 c = l_inStream.read();
>>>                 p++;
>>>             }
>>>             los.close();
>>>         }
>>>         catch (IOException se)
>>>         {
>>>             throw new PSQLException("postgresql.unusual", se);
>>>         }
>>>         // lob is closed by the stream so don't call lob.close()
>>>         setInt(i, oid);
>>>     }
>>> +++
>>>
>>> Since the getBinaryStream() returns an InputStream, should this
>>> routine close that InputStream once it's done, or does the Blob
>>> itself have to somehow know that a stream it creates can be closed
>>> and discarded (and if so, how?)?
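David's point above about having to wrap a byte array (or an InputStream) in your own object that implements Blob can be made concrete with a small sketch. This is not part of the PostgreSQL driver: the class name ByteArrayBlob is invented here, and it targets only the JDBC 2.0 java.sql.Blob interface of this era (later JDBC revisions add further methods that would also need to be implemented).

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.sql.Blob;
    import java.sql.SQLException;

    // Hypothetical helper: wraps an in-memory byte array so it can be
    // passed to PreparedStatement.setBlob().
    public class ByteArrayBlob implements Blob
    {
        private final byte[] data;

        public ByteArrayBlob(byte[] data)
        {
            this.data = data;
        }

        public long length() throws SQLException
        {
            return data.length;
        }

        public InputStream getBinaryStream() throws SQLException
        {
            return new ByteArrayInputStream(data);
        }

        public byte[] getBytes(long pos, int length) throws SQLException
        {
            // JDBC Blob positions are 1-based, hence the -1.
            int offset = (int) pos - 1;
            int n = Math.min(length, data.length - offset);
            byte[] out = new byte[n];
            System.arraycopy(data, offset, out, 0, n);
            return out;
        }

        public long position(byte[] pattern, long start) throws SQLException
        {
            throw new SQLException("position() not supported");
        }

        public long position(Blob pattern, long start) throws SQLException
        {
            throw new SQLException("position() not supported");
        }
    }

With something like this, inserting an existing byte array becomes ps.setBlob(1, new ByteArrayBlob(myBytes)), and the driver's setBlob() pulls the data back out through getBinaryStream() -- exactly the round trip David describes.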
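Separately, the pattern David's patch converges on -- copy in roughly 4 KB chunks and close both streams in a finally block -- can be read in isolation as a generic helper. The sketch below is only an illustration of that pattern; the class and method names are made up here and are not driver API.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Illustrative helper showing the chunked copy-and-close pattern.
    public final class StreamCopy
    {
        private StreamCopy() {}

        // Copies up to 'remaining' bytes from in to out in 4 KB chunks.
        // For a 1 MB blob this is roughly 256 writes instead of ~1,000,000
        // single-byte writes, and both streams are closed even on failure.
        public static void copyAndClose(InputStream in, OutputStream out, int remaining)
            throws IOException
        {
            byte[] buf = new byte[4096];
            try
            {
                while (remaining > 0)
                {
                    int numRead = in.read(buf, 0, Math.min(buf.length, remaining));
                    if (numRead == -1)
                        break;    // source ended before 'remaining' bytes were copied
                    out.write(buf, 0, numRead);
                    remaining -= numRead;
                }
            }
            finally
            {
                try { out.close(); } catch (IOException ignored) {}
                try { in.close(); } catch (IOException ignored) {}
            }
        }
    }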