Re: Inserting large BLOBs via JDBC - OutOfMemoryError - Mailing list pgsql-jdbc

From hhaag@gmx.de
Subject Re: Inserting large BLOBs via JDBC - OutOfMemoryError
Msg-id 24814.1029484864@www9.gmx.net
In response to Inserting large BLOBs via JDBC - OutOfMemoryError  (hhaag@gmx.de)
List pgsql-jdbc
>While "new StringBuffer(p_buf.length)" is probably an improvement, it is
>difficult to predict what size buffer you will really need.  This is
>because depending on the data you will see between zero and four times
>data expansion.  Because the protocol postgres uses to talk between the
>client and server is string based, the binary data needs to be encoded
>in an ASCII-safe way.  The encoding for the bytea datatype is to use
>\OOO octal escaping.  Therefore each byte of data may take up to four
>bytes in the output.  However if the data is mostly printable 7-bit ASCII
>bytes then there will be little expansion.
>
>I think your idea of initializing the buffer to be the size of the
>byte[] is a good idea.  I will apply that change unless someone has a
>better suggestion.

I think it's at least better than initializing the StringBuffer with the
default capacity, which is 16 characters. As long as the StringBuffer is used
only internally (as a local variable) in a private method, no other part of
the code should be affected. Of course, you cannot predict the final size of
the created string.
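
To illustrate the expansion discussed above, here is a minimal sketch of
\OOO octal escaping with a buffer presized to the input length. The class and
method names are hypothetical and this is not the driver's actual code; it
also ignores the extra quoting a real SQL literal would need.

```java
// Hypothetical helper for illustration only, not the driver's implementation.
final class ByteaEscapeSketch {
    // Printable ASCII other than backslash is copied as-is (1 char per byte);
    // every other byte becomes a backslash plus three octal digits (4 chars).
    static String escape(byte[] buf) {
        // Start at the input length; the buffer still grows if bytes need escaping.
        StringBuffer sb = new StringBuffer(buf.length);
        for (int i = 0; i < buf.length; i++) {
            int b = buf[i] & 0xFF; // treat the byte as unsigned
            if (b >= 32 && b <= 126 && b != '\\') {
                sb.append((char) b);
            } else {
                sb.append('\\');
                sb.append((char) ('0' + ((b >> 6) & 3)));
                sb.append((char) ('0' + ((b >> 3) & 7)));
                sb.append((char) ('0' + (b & 7)));
            }
        }
        return sb.toString();
    }
}
```

With mostly printable input the initial capacity is close to the final size;
with arbitrary binary data the buffer can still grow to four times that.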

There are also other places where StringBuffer usage could be improved in my
opinion:

(1) org.postgresql.jdbc1.AbstractJdbc1Statement#setString()

    // Some performance caches
    private StringBuffer sbuf = new StringBuffer();
...

current:

    public void setString(int parameterIndex, String x) throws SQLException {
                           ....
          synchronized (sbuf) {
                              sbuf.setLength(0);


proposed:

          StringBuffer sbuf = new StringBuffer(x.length());

--> Use a local, non-synchronized variable, and initialize the StringBuffer
with a smart capacity.

Please note that I have not fully explored the usage of synchronized and the
reuse of the StringBuffer. But as the synchronized keyword indicates, this
variable is only accessed by one thread at a time. Additionally, the contents
of the StringBuffer are always discarded at the beginning of the method. So a
local variable should be fine - and faster than a synchronized instance
variable.
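
A minimal sketch of the local-variable approach follows. The class and method
names are hypothetical, and the quoting rules shown (backslash-escaping of
backslashes and single quotes) are an assumption for illustration, not a copy
of what setString() does.

```java
// Hypothetical helper for illustration only.
final class LocalBufferSketch {
    // Wraps x in single quotes, escaping backslashes and single quotes.
    static String quote(String x) {
        // The buffer is local, so no other thread can see it and no
        // synchronized block is needed. The capacity covers the text plus
        // the two quotes; escaped characters may still make it grow.
        StringBuffer sbuf = new StringBuffer(x.length() + 2);
        sbuf.append('\'');
        for (int i = 0; i < x.length(); i++) {
            char c = x.charAt(i);
            if (c == '\\' || c == '\'') {
                sbuf.append('\\');
            }
            sbuf.append(c);
        }
        sbuf.append('\'');
        return sbuf.toString();
    }
}
```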


(2) org.postgresql.jdbc1.AbstractJdbc1Statement#compileQuery()


protected synchronized String compileQuery()
throws SQLException
{
    sbuf.setLength(0);
    int i;

    if (isFunction && !returnTypeSet)
        throw new PSQLException("postgresql.call.noreturntype");
    if (isFunction) { // set entry 1 to dummy entry..
        inStrings[0] = ""; // dummy entry which ensured that no one overrode
        // and calls to setXXX (2,..) really went to first arg in a function call..
    }

    for (i = 0 ; i < inStrings.length ; ++i)
    {
        if (inStrings[i] == null)
            throw new PSQLException("postgresql.prep.param", new Integer(i + 1));
        sbuf.append (templateStrings[i]).append (inStrings[i]);
    }
    sbuf.append(templateStrings[inStrings.length]);
    return sbuf.toString();
}


In this case, too, the StringBuffer should be initialized with a smart
capacity, something like the sum of the lengths of all strings to be
appended. I'm in a bit of a rush today, but I'll try to find an algorithm in
the next few days.
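
One possible way to compute such a capacity is to sum the lengths in a first
pass and append in a second. This is only a sketch: the class and method are
hypothetical, the arrays are passed as parameters instead of being instance
fields, and IllegalArgumentException stands in for the driver's PSQLException.

```java
// Hypothetical helper for illustration only.
final class CompileQuerySketch {
    // templateStrings must have exactly one more entry than inStrings.
    static String compile(String[] templateStrings, String[] inStrings) {
        // First pass: add up the lengths of everything that will be appended.
        int capacity = 0;
        for (int i = 0; i < templateStrings.length; i++) {
            capacity += templateStrings[i].length();
        }
        for (int i = 0; i < inStrings.length; i++) {
            if (inStrings[i] == null) {
                throw new IllegalArgumentException("parameter " + (i + 1) + " is not set");
            }
            capacity += inStrings[i].length();
        }

        // Second pass: append into a local buffer that never needs to grow.
        StringBuffer sbuf = new StringBuffer(capacity);
        for (int i = 0; i < inStrings.length; i++) {
            sbuf.append(templateStrings[i]).append(inStrings[i]);
        }
        sbuf.append(templateStrings[inStrings.length]);
        return sbuf.toString();
    }
}
```

The extra pass only reads lengths, so it should be cheap compared with the
repeated reallocation and copying it avoids for large parameters.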
