bytea memory improvement - Mailing list pgsql-jdbc
From | Luis Vilar Flores |
---|---|
Subject | bytea memory improvement |
Date | |
Msg-id | 448EB28B.2020301@evolute.pt Whole thread Raw |
Responses |
Re: bytea memory improvement
|
List | pgsql-jdbc |
The current postgresql driver has some memory issues when reading bytea fields from the backend. The problem is in the class org.postgresql.util.PGbytea, in method public static byte[] toBytes(byte[] s). This method (as I understood it) translates from the wire protocol to the java byte[] for the user. The current implementation uses 2 buffers (the receiving buffer and one temp) with the wire size plus one smaller buffer (the final translated buffer). For big files (bytea fields can be as big as 2GB) this is too expensive in RAM (to download a field of 100MB I need at least 300MB free in the client). One workaround could be to translate inplace on the receive buffer (I think it is not used for anything else), and then copy to the right size final buffer, but this would imply to alter the the receiving buffer, not very elegant. My solution is to have a threshold (I think 1MB is a balanced value) and below that execute as always, 3 buffers. Above, do an extra cycle through the incoming buffer, compute the final buffer size, and then behave as befoe (but skip the last part - we already have the right size of the buffer, so we don't need to do the last copy). I've implemented the code (it's very simple), and tested it (comparing the old function and the new) and it look ok. The overhead for passing one more time in the initial buffer is about 30ms for each 5MB in a Celeron M 1.6GHz. I hope this code (or some improved version of it) could make it's way into the driver, I need to work with lots of big bytea fields and the memory constraints are very hard to meet. Thanks for the nice work, -- Luis Flores Analista de Sistemas *Evolute* - Consultoria Informática <http://www.evolute.pt> Email: lflores@evolute.pt <mailto:lflores@evolute.pt> Tel: (+351) 212949689 AVISO DE CONFIDENCIALIDADE Esta mensagem de correio electrónico e eventuais ficheiros anexos são confidenciais e destinados apenas à(s) pessoa(s) ou entidade(s) acima referida(s), podendo conter informação privilegiada e confidencial, a qual não poderá ser divulgada, copiada, gravada ou distribuída nos termos da lei vigente. Caso não seja o destinatário da mensagem, ou se ela lhe foi enviada por engano, agradecemos que não faça uso ou divulgação da mesma. A distribuição ou utilização da informação nela contida é interdita. Se recebeu esta mensagem por engano, por favor notifique o remetente e apague este e-mail do seu sistema. Obrigado. CONFIDENTIALITY NOTICE This e-mail transmission and eventual attached files are intended only for the use of the individual(s) or entity(ies) named above and may contain information that is both privileged and confidential and is exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of any of the information contained in this transmission is strictly restricted. If by any means you have received this transmission in error, please immediately notify the sender and delete this e-mail from your system. Thank you. /*------------------------------------------------------------------------- * * Copyright (c) 2003-2005, PostgreSQL Global Development Group * * IDENTIFICATION * $PostgreSQL: pgjdbc/org/postgresql/util/PGbytea.java,v 1.12 2005/01/11 08:25:49 jurka Exp $ * *------------------------------------------------------------------------- */ package org.postgresql.util; import java.sql.*; /** * Converts to and from the postgresql bytea datatype used by the backend. */ public class PGbytea { private static final int MAX_3_BUFF_SIZE = 1024*1024; /* * Converts a PG bytea raw value (i.e. the raw binary representation * of the bytea data type) into a java byte[] */ public static byte[] toBytes(byte[] s) throws SQLException { if (s == null) return null; final int slength = s.length; byte[] buf = null; int correctSize = slength; if (slength > MAX_3_BUFF_SIZE) { // long l = System.currentTimeMillis(); // count / * // for (int i = 0; i < slength; ++i) { byte current = s[i]; if (current == '\\') { byte next = s[ ++i ]; if (next == '\\') { --correctSize; } else { correctSize -= 3; } } } //System.out.println( "TOOK: " + ( System.currentTimeMillis() - l ) + "ms - SAVED " + ( slength - correctSize ) + " ON SECONDBUFFER" ); buf = new byte[correctSize]; } else { buf = new byte[slength]; } int bufpos = 0; int thebyte; byte nextbyte; byte secondbyte; for (int i = 0; i < slength; i++) { nextbyte = s[i]; if (nextbyte == (byte)'\\') { secondbyte = s[++i]; if (secondbyte == (byte)'\\') { //escaped \ buf[bufpos++] = (byte)'\\'; } else { thebyte = (secondbyte - 48) * 64 + (s[++i] - 48) * 8 + (s[++i] - 48); if (thebyte > 127) thebyte -= 256; buf[bufpos++] = (byte)thebyte; } } else { buf[bufpos++] = nextbyte; } } if (bufpos == correctSize) { //System.out.println( "SKIPPED LAST BUFFER" ); return buf; } byte[] l_return = new byte[bufpos]; System.arraycopy(buf, 0, l_return, 0, bufpos); return l_return; } /* * Converts a java byte[] into a PG bytea string (i.e. the text * representation of the bytea data type) */ public static String toPGString(byte[] p_buf) throws SQLException { if (p_buf == null) return null; StringBuffer l_strbuf = new StringBuffer(2 * p_buf.length); for (int i = 0; i < p_buf.length; i++) { int l_int = (int)p_buf[i]; if (l_int < 0) { l_int = 256 + l_int; } //we escape the same non-printable characters as the backend //we must escape all 8bit characters otherwise when convering //from java unicode to the db character set we may end up with //question marks if the character set is SQL_ASCII if (l_int < 040 || l_int > 0176) { //escape charcter with the form \000, but need two \\ because of //the parser l_strbuf.append("\\"); l_strbuf.append((char)(((l_int >> 6) & 0x3) + 48)); l_strbuf.append((char)(((l_int >> 3) & 0x7) + 48)); l_strbuf.append((char)((l_int & 0x07) + 48)); } else if (p_buf[i] == (byte)'\\') { //escape the backslash character as \\, but need four \\\\ because //of the parser l_strbuf.append("\\\\"); } else { //other characters are left alone l_strbuf.append((char)p_buf[i]); } } return l_strbuf.toString(); } }
pgsql-jdbc by date: