JDBC: LONGVARBINARY upload patch - Mailing list pgsql-jdbc
From:    Fernando Nasser
Subject: JDBC: LONGVARBINARY upload patch
Msg-id:  3ED4CCD4.70209@redhat.com
List:    pgsql-jdbc
(Resending as it did not show up in either list.)

This patch drastically reduces the memory requirements and the time taken to upload large binary values into bytea fields, which is how JDBC LONGVARBINARY has been implemented. The idea is to use this patch to fix setBinaryStream(), and the patch by Chris Smith <cdsmith@twu.net> to fix setAsciiStream() and setUnicodeStream(). For those cases we do not have a backend function to move a LO into a text field, and at least the values will not be all binary.

I have included in the diff my previous patch "JDBC: Wrong type", as it is a prerequisite for both my patch and Chris', and it does not seem to have been checked in yet.

Here is how we got to this solution:

I used a test program (written by Holger Haag <hhaag@gmx.de>) to create a bytea field and read it back, with a 54M text file as the test data (a binary file would perform worse, as more of it would expand into the octal representation). With the current driver I needed a max heap (mx) of 660M to upload the file to a bytea, and it would take forever. By fixing the code (without changing the method of doing it) I was able to lower this requirement to mx=440M; this is the patch I submitted as "JDBC: Better initial capacity for StringBuffers reduces memory usage". But remember this was an ASCII file; things would be much worse with a binary one. And it still takes too much time.

In any case, sending encoded binary at five characters per byte from the front-end to the back-end, and having the back-end parse all of that, is not a good option, so I changed the method to avoid it. To do so, I used the interface provided for PostgreSQL Large Objects (LOs) to get the data into a LO and from there into the bytea field itself. Tom Lane kindly pointed me to a backend function that can be used for loading the bytea from the LO. The new method creates a LO that is used as a staging area, moves the data to the bytea destination, and then removes the LO.
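The five-characters-per-byte figure comes from the text-mode bytea path: a non-printable byte becomes an octal escape \nnn in the literal, and the backslash is doubled again inside the query string. A small standalone sketch illustrates the expansion (this is illustrative code with a hypothetical method name, not the driver's actual escaping routine):

```java
// Sketch of worst-case expansion when sending bytea as escaped text.
// escapeBytea is a hypothetical helper, not a driver method.
public class ByteaEscapeDemo {
    static String escapeBytea(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 5);
        for (byte b : data) {
            int v = b & 0xff;
            if (v < 32 || v > 126 || v == '\\' || v == '\'') {
                // Non-printable byte: \nnn in the literal, with the
                // backslash doubled for the query string -> 5 chars/byte.
                sb.append("\\\\");
                sb.append((char) ('0' + ((v >> 6) & 3)));
                sb.append((char) ('0' + ((v >> 3) & 7)));
                sb.append((char) ('0' + (v & 7)));
            } else {
                sb.append((char) v);  // printable bytes pass through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] binary = {0, 1, (byte) 0xff};
        String escaped = escapeBytea(binary);
        System.out.println(escaped);           // \\000\\001\\377
        System.out.println(escaped.length());  // 15 = 3 bytes * 5 chars
    }
}
```

An ASCII file mostly passes through unexpanded, which is why the 54M text file was a best case; arbitrary binary data approaches the full 5x blow-up.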
If something goes wrong, this LO goes away as well, because all of this is done inside a transaction (LOs can only be manipulated inside a transaction, but I made that transparent to the driver's user). It is also secure: the LO will not be seen by others until the transaction is committed, and it will be removed before that happens.

With this new method, an mx of only 70M is required! And it takes much less time to do it (an order of magnitude less!).

But before you get overly excited, be warned that there is a price to pay. As I mentioned, I create and delete a LO in the process (to use as the staging area). This temporarily uses up space in the pg_largeobject table. Unfortunately, PostgreSQL still does not reuse empty space in tables, so this table will grow by the size of every file uploaded (plus some index overhead). The only way to recoup this space is by periodically running the command:

VACUUM [ANALYZE] [VERBOSE] pg_largeobject;

After this gets checked in, I will adjust Chris' patch to work with non-binary data, and probably also send a patch for getBytes() as well.

Regards to all,

Fernando

--
Fernando Nasser
Red Hat Canada Ltd.
E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9

Index: src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java,v
retrieving revision 1.21
diff -c -p -r1.21 QueryExecutor.java
*** src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java	7 May 2003 03:03:30 -0000	1.21
--- src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java	27 May 2003 18:22:05 -0000
*************** import java.util.Vector;
*** 16,21 ****
--- 16,22 ----
  import java.io.IOException;
  import java.sql.*;
  import org.postgresql.util.PSQLException;
+ import org.postgresql.util.PGLOInputStream;
  import org.postgresql.jdbc1.AbstractJdbc1Connection;
  import org.postgresql.jdbc1.AbstractJdbc1ResultSet;
  import org.postgresql.jdbc1.AbstractJdbc1Statement;
*************** public class QueryExecutor
*** 84,89 ****
--- 85,91 ----
  	private Field[] fields = null;
  	private Vector tuples = new Vector();
  	private boolean binaryCursor = false;
+ 	private boolean wasAutoCommit = false;
  	private String status = null;
  	private int update_count = 1;
  	private long insert_oid = 0;
*************** public class QueryExecutor
*** 165,170 ****
--- 167,186 ----
  		}
  	}
+
+ 	// Regardless of sucess or failure, get rid of any LO staging area created
+ 	for ( int i = 0; i < m_binds.length ; i++ )
+ 	{
+ 		if ( m_binds[i].getClass() == PGLOInputStream.class )
+ 		{
+ 			// Close and unlink (delete) LO
+ 			((PGLOInputStream)m_binds[i]).dematerialize();
+ 		}
+ 	}
+
+ 	// If we started the transaction end it
+ 	if (wasAutoCommit)
+ 		connection.setAutoCommit(true);
  	// did we get an error during this query?
  	if ( errorMessage != null )
*************** public class QueryExecutor
*** 194,199 ****
--- 210,232 ----
  	{
  		if ( m_binds[i] == null )
  			throw new PSQLException("postgresql.prep.param", new Integer(i + 1));
+ 	}
+ 	// Now loop again and materialize all Streams into LOs
+ 	wasAutoCommit = false;
+ 	for ( int i = 0; i < m_binds.length ; i++ )
+ 	{
+ 		// TODO FENN
+ 		if ( m_binds[i].getClass() == PGLOInputStream.class )
+ 		{
+ 			// Make sure we are in a transaction
+ 			if (connection.getAutoCommit())
+ 			{
+ 				connection.setAutoCommit(false);
+ 				wasAutoCommit = true;
+ 			}
+ 			// Create and open LO
+ 			((PGLOInputStream)m_binds[i]).materialize();
+ 		}
  	}
  	try
  	{
Index: src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java,v
retrieving revision 1.21
diff -c -p -r1.21 AbstractJdbc1Statement.java
*** src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java	3 May 2003 20:40:45 -0000	1.21
--- src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java	27 May 2003 18:22:06 -0000
*************** public abstract class AbstractJdbc1State
*** 132,138 ****
  	v.addElement(l_sql.substring (lastParmEnd, l_sql.length()));
  	m_sqlFragments = new String[v.size()];
! 	m_binds = new String[v.size() - 1];
  	m_bindTypes = new String[v.size() - 1];
  	for (i = 0 ; i < m_sqlFragments.length; ++i)
--- 132,138 ----
  	v.addElement(l_sql.substring (lastParmEnd, l_sql.length()));
  	m_sqlFragments = new String[v.size()];
! 	m_binds = new Object[v.size() - 1];
  	m_bindTypes = new String[v.size() - 1];
  	for (i = 0 ; i < m_sqlFragments.length; ++i)
*************** public abstract class AbstractJdbc1State
*** 1343,1370 ****
  	//As the spec/javadoc for this method indicate this is to be used for
  	//large binary values (i.e. LONGVARBINARY) PG doesn't have a separate
  	//long binary datatype, but with toast the bytea datatype is capable of
! 	//handling very large values.  Thus the implementation ends up calling
! 	//setBytes() since there is no current way to stream the value to the server
! 	byte[] l_bytes = new byte[length];
! 	int l_bytesRead;
! 	try
! 	{
! 		l_bytesRead = x.read(l_bytes, 0, length);
! 	}
! 	catch (IOException l_ioe)
  	{
! 		throw new PSQLException("postgresql.unusual", l_ioe);
! 	}
! 	if (l_bytesRead == length)
! 	{
! 		setBytes(parameterIndex, l_bytes);
  	}
  	else
  	{
! 		//the stream contained less data than they said
! 		byte[] l_bytes2 = new byte[l_bytesRead];
! 		System.arraycopy(l_bytes, 0, l_bytes2, 0, l_bytesRead);
! 		setBytes(parameterIndex, l_bytes2);
  	}
  }
  else
--- 1343,1356 ----
  	//As the spec/javadoc for this method indicate this is to be used for
  	//large binary values (i.e. LONGVARBINARY) PG doesn't have a separate
  	//long binary datatype, but with toast the bytea datatype is capable of
! 	//handling very large values.
! 	if (x == null)
  	{
! 		setNull(parameterIndex, Types.VARBINARY);
  	}
  	else
  	{
! 		bind(parameterIndex, new PGLOInputStream(connection, x, length), PG_BYTEA);
  	}
  }
  else
Index: src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java,v
retrieving revision 1.13
diff -c -p -r1.13 AbstractJdbc2Statement.java
*** src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java	14 Mar 2003 01:21:47 -0000	1.13
--- src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java	27 May 2003 18:22:06 -0000
*************** public abstract class AbstractJdbc2State
*** 175,181 ****
  		l_newSqlFragments = new String[m_sqlFragments.length];
  		System.arraycopy(m_sqlFragments,0,l_newSqlFragments,0,m_sqlFragments.length);
  	}
! 	Object[] l_newBinds = new String[m_binds.length];
  	System.arraycopy(m_binds,0,l_newBinds,0,m_binds.length);
  	String[] l_newBindTypes = new String[m_bindTypes.length];
  	System.arraycopy(m_bindTypes,0,l_newBindTypes,0,m_bindTypes.length);
--- 175,181 ----
  		l_newSqlFragments = new String[m_sqlFragments.length];
  		System.arraycopy(m_sqlFragments,0,l_newSqlFragments,0,m_sqlFragments.length);
  	}
! 	Object[] l_newBinds = new Object[m_binds.length];
  	System.arraycopy(m_binds,0,l_newBinds,0,m_binds.length);
  	String[] l_newBindTypes = new String[m_bindTypes.length];
  	System.arraycopy(m_bindTypes,0,l_newBindTypes,0,m_bindTypes.length);
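The transaction handling in the QueryExecutor hunks follows a save/restore pattern: if the connection is in autocommit mode, turn autocommit off (implicitly starting a transaction) before materializing the LO, then restore it afterwards regardless of success or failure. The pattern can be sketched in isolation (FakeConnection and runInTransaction are hypothetical stand-ins, not driver classes):

```java
// Minimal stand-in for the driver's connection, just for the sketch.
class FakeConnection {
    private boolean autoCommit = true;
    boolean getAutoCommit() { return autoCommit; }
    void setAutoCommit(boolean b) { autoCommit = b; }
}

public class AutoCommitWrapDemo {
    // Run 'work' inside a transaction: LOs can only be manipulated inside
    // one, so suspend autocommit if needed and restore it afterwards.
    static void runInTransaction(FakeConnection conn, Runnable work) {
        boolean wasAutoCommit = false;
        if (conn.getAutoCommit()) {
            conn.setAutoCommit(false);
            wasAutoCommit = true;
        }
        try {
            work.run();  // materialize LO, execute query, dematerialize LO
        } finally {
            // Restore the caller's autocommit setting regardless of outcome.
            if (wasAutoCommit) {
                conn.setAutoCommit(true);
            }
        }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection();
        runInTransaction(conn,
            () -> System.out.println("autocommit during work: " + conn.getAutoCommit()));
        System.out.println("autocommit after: " + conn.getAutoCommit());
    }
}
```

If the caller had already disabled autocommit (i.e. already opened a transaction), the sketch, like the patch, leaves the setting untouched so the caller's transaction boundaries are preserved.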