Re: Inserting large BLOBs via JDBC - OutOfMemoryError - Mailing list pgsql-jdbc

From Barry Lind
Subject Re: Inserting large BLOBs via JDBC - OutOfMemoryError
Date
Msg-id 3D5BCEC9.70007@xythos.com
In response to Inserting large BLOBs via JDBC - OutOfMemoryError  (hhaag@gmx.de)
List pgsql-jdbc
Holger,

There certainly are areas where the driver can be improved.  And this is
probably one of them.  But like many things the solution isn't as easy
as it first appears.

hhaag@gmx.de wrote:

>mainly toPGString seems to be programmed inefficiently in terms of memory
>consumption, e.g.
>
>the string buffer allocation
>
>    public static String toPGString(byte[] p_buf) throws SQLException
>    {
>        if (p_buf == null)
>            return null;
>        StringBuffer l_strbuf = new StringBuffer();
>
>
>should contain a sensible initial size
>
>
>    public static String toPGString(byte[] p_buf) throws SQLException
>    {
>        if (p_buf == null)
>            return null;
>        StringBuffer l_strbuf = new StringBuffer(p_buf.length);
>
>or even
>
>        StringBuffer l_strbuf = new StringBuffer(p_buf.length*2);
>
>because of special characters
>
>
>to avoid a repeated re-allocation of the internal char[] array in the
>stringbuffer.
>
>
>
>
While "new StringBuffer(p_buf.length)" is probably an improvement, it is
difficult to predict what size buffer you will really need.  This is
because, depending on the data, you will see between zero and four times
data expansion.  Because the protocol postgres uses to talk between the
client and server is string based, the binary data needs to be encoded
in an ASCII-safe way.  The encoding for the bytea datatype is to use
\OOO octal escaping.  Therefore each byte of data may take up to four
bytes in the output.  However, if the data is mostly printable 7-bit
ASCII bytes, there will be little expansion.
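To make the expansion concrete, here is a standalone sketch of the kind
of \OOO octal escaping described above (the class and method names are
illustrative, not the driver's actual code; the real toPGString also
deals with SQLException and driver specifics):

```java
// Hypothetical sketch of \OOO octal escaping for bytea values.
// Printable 7-bit ASCII passes through unchanged (1 byte -> 1 char);
// everything else becomes a 4-character escape like \377.
public class ByteaEscape {
    public static String toPGString(byte[] buf) {
        if (buf == null)
            return null;
        // Initialize to buf.length as suggested in the thread; the
        // buffer may still grow, up to 4x for all-binary data.
        StringBuffer sb = new StringBuffer(buf.length);
        for (byte b : buf) {
            int v = b & 0xFF;
            if (v < 0x20 || v > 0x7E || v == '\'' || v == '\\') {
                sb.append('\\');
                // (v | 0x200) guarantees a 4-digit octal string,
                // e.g. 1 -> "1001"; drop the leading "1" to get "001".
                sb.append(Integer.toOctalString(v | 0x200).substring(1));
            } else {
                sb.append((char) v);
            }
        }
        return sb.toString();
    }
}
```

So "abc" stays 3 characters, while three NUL bytes become the 12-character
string \000\000\000 -- which is why no single initial capacity is right
for all inputs.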

I think your idea of initializing the buffer to be the size of the
byte[] is a good idea.  I will apply that change unless someone has a
better suggestion.

thanks,
--Barry

PS.  It should also be possible to not prebuild the entire sql statement
in memory (which includes the entire escaped binary value), but to
stream the value back to the server.  Thus instead of building the
following String in memory:
insert into table foo (bar, baz, boo) values (1000, '\123\432\543\567...
continuing on for possibly megabytes...', 'boo value')
The driver should be able to just send back the pieces and stream the
bytea value from the inputstream:
send -> "insert into table foo (bar, baz, boo) values ("
send -> 1000            (value for column bar)
send -> ", "
send -> (value for column baz by reading from the input stream here,
encoding the results byte by byte and sending them)
send -> ", "
send -> "'boo value'"  (value for column boo)
send -> ")"
I am not sure how much work this will be, but it is probably the correct
long term solution.  Any volunteers for looking into this and submitting
a patch?
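For anyone considering such a patch, the streaming idea might look
roughly like the following sketch.  The names (sendInsert, writeEscaped)
and the SQL shape are illustrative only, taken from the example above;
the real driver would write to its backend connection stream rather than
an arbitrary Writer:

```java
import java.io.*;

// Hypothetical sketch: stream the escaped bytea value to the server
// piece by piece instead of building the whole SQL string in memory.
public class StreamingInsert {
    // Read the bytea value from the InputStream and write it out
    // escaped, one byte at a time, so it is never held in memory whole.
    static void writeEscaped(InputStream in, Writer out) throws IOException {
        int b;
        while ((b = != -1) {
            if (b < 0x20 || b > 0x7E || b == '\'' || b == '\\') {
                out.write('\\');
                out.write(Integer.toOctalString(b | 0x200).substring(1));
            } else {
                out.write(b);
            }
        }
    }

    static void sendInsert(Writer out, int bar, InputStream baz, String boo)
            throws IOException {
        out.write("insert into table foo (bar, baz, boo) values (");
        out.write(Integer.toString(bar));      // value for column bar
        out.write(", '");
        writeEscaped(baz, out);                // bytea streamed, never buffered
        out.write("', '");
        out.write(boo);                        // assumed already escaped here
        out.write("')");
        out.flush();
    }
}
```

The peak memory use then depends only on the stream's read buffer, not
on the size of the bytea value.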


