bytea memory improvement - Mailing list pgsql-jdbc

From Luis Vilar Flores
Subject bytea memory improvement
Date
Msg-id 448EB28B.2020301@evolute.pt
Whole thread Raw
Responses Re: bytea memory improvement
List pgsql-jdbc
   The current postgresql driver has some memory issues when reading
bytea fields from the backend.

   The problem is in the class org.postgresql.util.PGbytea, in method
public static byte[] toBytes(byte[] s).
   This method (as I understood it) translates from the wire protocol to
the java byte[] for the user. The current implementation uses 2 buffers
(the receiving buffer and one temp) with the wire size plus one smaller
buffer (the final translated buffer).
   For big files (bytea fields can be as big as 2GB) this is too
expensive in RAM (to download a field of 100MB I need at least 300MB
free in the client).
     One workaround could be to translate inplace on the receive buffer
(I think it is not used for anything else), and then copy to the right
size final buffer, but this would imply to alter the the receiving
buffer, not very elegant.

   My solution is to have a threshold (I think 1MB is a balanced value)
and below that execute as always, 3 buffers. Above, do an extra cycle
through the incoming buffer, compute the final buffer size, and then
behave as befoe (but skip the last part - we already have the right size
of the buffer, so we don't need to do the last copy).

   I've implemented the code (it's very simple), and tested it
(comparing the old function and the new) and it look ok.

   The overhead for passing one more time in the initial buffer is about
30ms for each 5MB in a Celeron M 1.6GHz.

     I hope this code (or some improved version of it) could make it's
way into the driver, I need to work with lots of big bytea fields and
the memory constraints are very hard to meet.

    Thanks for the nice work,
--

Luis Flores

Analista de Sistemas

*Evolute* - Consultoria Informática

<http://www.evolute.pt> Email: lflores@evolute.pt
<mailto:lflores@evolute.pt>

Tel: (+351) 212949689


AVISO DE CONFIDENCIALIDADE
Esta mensagem de correio electrónico e eventuais ficheiros anexos são
confidenciais e destinados apenas à(s) pessoa(s) ou entidade(s) acima
referida(s), podendo conter informação privilegiada e confidencial, a
qual não poderá ser divulgada, copiada, gravada ou distribuída nos
termos da lei vigente. Caso não seja o destinatário da mensagem, ou se
ela lhe foi enviada por engano, agradecemos que não faça uso ou
divulgação da mesma. A distribuição ou utilização da informação nela
contida é interdita. Se recebeu esta mensagem por engano, por favor
notifique o remetente e apague este e-mail do seu sistema. Obrigado.

CONFIDENTIALITY NOTICE
This e-mail transmission and eventual attached files are intended only
for the use of the individual(s) or entity(ies) named above and may
contain information that is both privileged and confidential and is
exempt from disclosure under applicable law. If you are not the intended
recipient, you are hereby notified that any disclosure, copying,
distribution or use of any of the information contained in this
transmission is strictly restricted. If by any means you have received
this transmission in error, please immediately notify the sender and
delete this e-mail from your system. Thank you.
/*-------------------------------------------------------------------------
*
* Copyright (c) 2003-2005, PostgreSQL Global Development Group
*
* IDENTIFICATION
*   $PostgreSQL: pgjdbc/org/postgresql/util/PGbytea.java,v 1.12 2005/01/11 08:25:49 jurka Exp $
*
*-------------------------------------------------------------------------
*/
package org.postgresql.util;

import java.sql.*;

/**
 * Converts to and from the postgresql bytea datatype used by the backend.
 */
public class PGbytea
{
    private static final int MAX_3_BUFF_SIZE = 1024*1024;

    /*
     * Converts a PG bytea raw value (i.e. the raw binary representation
     * of the bytea data type) into a java byte[]
     */
    public static byte[] toBytes(byte[] s) throws SQLException
    {
        if (s == null)
            return null;
        final int slength = s.length;
        byte[] buf = null;
        int correctSize = slength;
        if (slength > MAX_3_BUFF_SIZE)
        {
//            long l = System.currentTimeMillis();
            // count / * //
            for (int i = 0; i < slength; ++i)
            {
                byte current = s[i];
                if (current == '\\')
                {
                    byte next = s[ ++i ];
                    if (next == '\\')
                    {
                        --correctSize;
                    }
                    else
                    {
                        correctSize -= 3;
                    }
                }
            }
//System.out.println( "TOOK: " + ( System.currentTimeMillis() - l ) + "ms - SAVED " + ( slength - correctSize ) + " ON
SECONDBUFFER" ); 
            buf = new byte[correctSize];
        }
        else
        {
            buf = new byte[slength];
        }
        int bufpos = 0;
        int thebyte;
        byte nextbyte;
        byte secondbyte;
        for (int i = 0; i < slength; i++)
        {
            nextbyte = s[i];
            if (nextbyte == (byte)'\\')
            {
                secondbyte = s[++i];
                if (secondbyte == (byte)'\\')
                {
                    //escaped \
                    buf[bufpos++] = (byte)'\\';
                }
                else
                {
                    thebyte = (secondbyte - 48) * 64 + (s[++i] - 48) * 8 + (s[++i] - 48);
                    if (thebyte > 127)
                        thebyte -= 256;
                    buf[bufpos++] = (byte)thebyte;
                }
            }
            else
            {
                buf[bufpos++] = nextbyte;
            }
        }
        if (bufpos == correctSize)
        {
//System.out.println( "SKIPPED LAST BUFFER" );
            return buf;
        }
        byte[] l_return = new byte[bufpos];
        System.arraycopy(buf, 0, l_return, 0, bufpos);
        return l_return;
    }

    /*
     * Converts a java byte[] into a PG bytea string (i.e. the text
     * representation of the bytea data type)
     */
    public static String toPGString(byte[] p_buf) throws SQLException
    {
        if (p_buf == null)
            return null;
        StringBuffer l_strbuf = new StringBuffer(2 * p_buf.length);
        for (int i = 0; i < p_buf.length; i++)
        {
            int l_int = (int)p_buf[i];
            if (l_int < 0)
            {
                l_int = 256 + l_int;
            }
            //we escape the same non-printable characters as the backend
            //we must escape all 8bit characters otherwise when convering
            //from java unicode to the db character set we may end up with
            //question marks if the character set is SQL_ASCII
            if (l_int < 040 || l_int > 0176)
            {
                //escape charcter with the form \000, but need two \\ because of
                //the parser
                l_strbuf.append("\\");
                l_strbuf.append((char)(((l_int >> 6) & 0x3) + 48));
                l_strbuf.append((char)(((l_int >> 3) & 0x7) + 48));
                l_strbuf.append((char)((l_int & 0x07) + 48));
            }
            else if (p_buf[i] == (byte)'\\')
            {
                //escape the backslash character as \\, but need four \\\\ because
                //of the parser
                l_strbuf.append("\\\\");
            }
            else
            {
                //other characters are left alone
                l_strbuf.append((char)p_buf[i]);
            }
        }
        return l_strbuf.toString();
    }
}

pgsql-jdbc by date:

Previous
From: Stef
Date:
Subject: Re: Column description in ResultSetMetaData
Next
From: "Stefan Arentz"
Date:
Subject: XA Status