JDBC: LONGVARBINARY upload patch - Mailing list pgsql-jdbc

From Fernando Nasser
Subject JDBC: LONGVARBINARY upload patch
Date
Msg-id 3ED4CCD4.70209@redhat.com
List pgsql-jdbc
(Resending as it did not show up in either list)

This patch drastically reduces the memory requirements and the time taken
to upload large binary values to bytea fields, which is how JDBC
LONGVARBINARY has been implemented.

The idea is to use this one to fix setBinaryStream(), and the patch by
Chris Smith <cdsmith@twu.net> to fix setAsciiStream() and
setUnicodeStream().  For those we do not have a backend function to turn
a LO into a text field, and at least the values will not be all binary.

I have included in the diff my previous patch, "JDBC: Wrong type", as it
is a prerequisite for both my patch and Chris', and it does not seem to
have been checked in yet.

Here is how we get to this solution:

I used a test program (written by Holger Haag <hhaag@gmx.de>) to create
a bytea field and read it back.  I've used a 54M text file for testing
(a binary file would perform worse as there would be more expansion into
octal representation).

With the current driver I needed -mx=660M to upload the file to a bytea
column, and it would take forever.

By fixing the code (without changing the method used) I was able to
lower this requirement to -mx=440M.  This is the patch I submitted as
"JDBC: Better initial capacity for StringBuffers reduces memory usage".
But remember this was an ASCII file; things would be much worse with a
binary one.  And it still takes too much time.

Anyway, sending encoded binary at five characters per byte from the
front-end to the back-end and having that parsed is not a good option,
so I did change the method to avoid all that.
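To see why the text path is so costly: under the old bytea escape format an
awkward byte travels as a backslash-escaped octal sequence, and since the
backslash itself must be doubled inside the SQL string literal, that is five
characters per byte on the wire.  A tiny sketch of the expansion (the helper
name is mine for illustration, not the driver's actual escaping code):

```java
public class ByteaEscapeDemo {
    // Illustration only: how one byte expands under the old bytea
    // text-escape format.  Inside a SQL string literal the backslash
    // itself must be doubled, so a single byte such as 0x00 is sent
    // as \\000 -- five characters for one byte of data.
    static String escapeByte(int b) {
        return String.format("\\\\%03o", b & 0xFF);
    }

    public static void main(String[] args) {
        String escaped = escapeByte(0);
        System.out.println(escaped);          // prints \\000
        System.out.println(escaped.length()); // prints 5
    }
}
```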

To do so, I used the interface provided for PostgreSQL Large Objects
(LOs) to get the data into a LO and from there into the bytea field
itself.  Tom Lane has kindly pointed me to a backend function that
could be used for loading the bytea from the LO.

The method creates a LO that is used as a staging area, moves the data
to the bytea destination and then removes the LO.  If something goes
wrong the LO goes away as well, since all of this is done inside a
transaction (LOs can only be manipulated inside a transaction -- but I
made this transparent to the driver's user).  It is also secure, as the
LO will not be seen by others until the transaction is committed, and it
will be removed before that happens.
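In outline, streaming the value into the staging area is just a chunked
copy, so only one small buffer is resident at a time instead of the whole
value.  A minimal sketch of that loop (the class and method names are mine,
not the patch's; in-memory streams stand in for the LO here):

```java
import java.io.*;

public class StagingCopyDemo {
    // Sketch of the chunked copy at the heart of a "materialize" step:
    // the input stream is pumped into the staging area (here a
    // ByteArrayOutputStream standing in for the Large Object) one small
    // buffer at a time, so the whole value never sits in the JVM heap
    // at once.
    static long copy(InputStream in, OutputStream out, int bufSize) {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        try {
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100000];
        new java.util.Random(42).nextBytes(data);
        ByteArrayOutputStream staging = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), staging, 8192);
        System.out.println(copied); // prints 100000
    }
}
```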

With this new method, an mx of only 70M is required!  And it takes much
less time to do it (an order of magnitude less!).

But before you get overly excited, be warned that there is a price to pay
for this.  As I mentioned, I create and delete a LO in the process (to
use as the staging area).  This temporarily uses up space in the
pg_largeobject table.  Unfortunately, PostgreSQL still does not reuse
empty space in tables, so this table will grow by the size of every file
uploaded (plus some index overhead).  The only way to recoup this space
is by periodically running the command:

VACUUM [ANALYZE] [VERBOSE] pg_largeobject;


After this gets checked in, I will adjust Chris' patch to work with
non-binary data, and will probably send a patch for getBytes() as well.


Regards to all,
Fernando


--
Fernando Nasser
Red Hat Canada Ltd.                     E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9

Index: src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java,v
retrieving revision 1.21
diff -c -p -r1.21 QueryExecutor.java
*** src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java    7 May 2003 03:03:30 -0000    1.21
--- src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java    27 May 2003 18:22:05 -0000
*************** import java.util.Vector;
*** 16,21 ****
--- 16,22 ----
  import java.io.IOException;
  import java.sql.*;
  import org.postgresql.util.PSQLException;
+ import org.postgresql.util.PGLOInputStream;
  import org.postgresql.jdbc1.AbstractJdbc1Connection;
  import org.postgresql.jdbc1.AbstractJdbc1ResultSet;
  import org.postgresql.jdbc1.AbstractJdbc1Statement;
*************** public class QueryExecutor
*** 84,89 ****
--- 85,91 ----
      private Field[] fields = null;
      private Vector tuples = new Vector();
      private boolean binaryCursor = false;
+     private boolean wasAutoCommit = false;
      private String status = null;
      private int update_count = 1;
      private long insert_oid = 0;
*************** public class QueryExecutor
*** 165,170 ****
--- 167,186 ----
                  }

              }
+
+             // Regardless of success or failure, get rid of any LO staging area created
+             for ( int i = 0; i < m_binds.length ; i++ )
+             {
+                 if ( m_binds[i].getClass() == PGLOInputStream.class )
+                 {
+                     // Close and unlink (delete) LO
+                     ((PGLOInputStream)m_binds[i]).dematerialize();
+                 }
+             }
+
+             // If we started the transaction end it
+             if (wasAutoCommit)
+                 connection.setAutoCommit(true);

              // did we get an error during this query?
              if ( errorMessage != null )
*************** public class QueryExecutor
*** 194,199 ****
--- 210,232 ----
          {
              if ( m_binds[i] == null )
                  throw new PSQLException("postgresql.prep.param", new Integer(i + 1));
+         }
+         // Now loop again and materialize all Streams into LOs
+         wasAutoCommit = false;
+         for ( int i = 0; i < m_binds.length ; i++ )
+         {
+             // TODO FENN
+             if ( m_binds[i].getClass() == PGLOInputStream.class )
+             {
+                 // Make sure we are in a transaction
+                 if (connection.getAutoCommit())
+                 {
+                     connection.setAutoCommit(false);
+                     wasAutoCommit = true;
+                 }
+                 // Create and open LO
+                 ((PGLOInputStream)m_binds[i]).materialize();
+             }
          }
          try
          {
Index: src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java,v
retrieving revision 1.21
diff -c -p -r1.21 AbstractJdbc1Statement.java
*** src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java    3 May 2003 20:40:45 -0000    1.21
--- src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java    27 May 2003 18:22:06 -0000
*************** public abstract class AbstractJdbc1State
*** 132,138 ****
          v.addElement(l_sql.substring (lastParmEnd, l_sql.length()));

          m_sqlFragments = new String[v.size()];
!         m_binds = new String[v.size() - 1];
          m_bindTypes = new String[v.size() - 1];

          for (i = 0 ; i < m_sqlFragments.length; ++i)
--- 132,138 ----
          v.addElement(l_sql.substring (lastParmEnd, l_sql.length()));

          m_sqlFragments = new String[v.size()];
!         m_binds = new Object[v.size() - 1];
          m_bindTypes = new String[v.size() - 1];

          for (i = 0 ; i < m_sqlFragments.length; ++i)
*************** public abstract class AbstractJdbc1State
*** 1343,1370 ****
              //As the spec/javadoc for this method indicate this is to be used for
              //large binary values (i.e. LONGVARBINARY)    PG doesn't have a separate
              //long binary datatype, but with toast the bytea datatype is capable of
!             //handling very large values.  Thus the implementation ends up calling
!             //setBytes() since there is no current way to stream the value to the server
!             byte[] l_bytes = new byte[length];
!             int l_bytesRead;
!             try
!             {
!                 l_bytesRead = x.read(l_bytes, 0, length);
!             }
!             catch (IOException l_ioe)
              {
!                 throw new PSQLException("postgresql.unusual", l_ioe);
!             }
!             if (l_bytesRead == length)
!             {
!                 setBytes(parameterIndex, l_bytes);
              }
              else
              {
!                 //the stream contained less data than they said
!                 byte[] l_bytes2 = new byte[l_bytesRead];
!                 System.arraycopy(l_bytes, 0, l_bytes2, 0, l_bytesRead);
!                 setBytes(parameterIndex, l_bytes2);
              }
          }
          else
--- 1343,1356 ----
              //As the spec/javadoc for this method indicate this is to be used for
              //large binary values (i.e. LONGVARBINARY)    PG doesn't have a separate
              //long binary datatype, but with toast the bytea datatype is capable of
!             //handling very large values.
!             if (x == null)
              {
!                 setNull(parameterIndex, Types.VARBINARY);
              }
              else
              {
!                 bind(parameterIndex, new PGLOInputStream(connection, x, length), PG_BYTEA);
              }
          }
          else
Index: src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java,v
retrieving revision 1.13
diff -c -p -r1.13 AbstractJdbc2Statement.java
*** src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java    14 Mar 2003 01:21:47 -0000    1.13
--- src/interfaces/jdbc/org/postgresql/jdbc2/AbstractJdbc2Statement.java    27 May 2003 18:22:06 -0000
*************** public abstract class AbstractJdbc2State
*** 175,181 ****
              l_newSqlFragments = new String[m_sqlFragments.length];
              System.arraycopy(m_sqlFragments,0,l_newSqlFragments,0,m_sqlFragments.length);
          }
!         Object[] l_newBinds = new String[m_binds.length];
          System.arraycopy(m_binds,0,l_newBinds,0,m_binds.length);
          String[] l_newBindTypes = new String[m_bindTypes.length];
          System.arraycopy(m_bindTypes,0,l_newBindTypes,0,m_bindTypes.length);
--- 175,181 ----
              l_newSqlFragments = new String[m_sqlFragments.length];
              System.arraycopy(m_sqlFragments,0,l_newSqlFragments,0,m_sqlFragments.length);
          }
!         Object[] l_newBinds = new Object[m_binds.length];
          System.arraycopy(m_binds,0,l_newBinds,0,m_binds.length);
          String[] l_newBindTypes = new String[m_bindTypes.length];
          System.arraycopy(m_bindTypes,0,l_newBindTypes,0,m_bindTypes.length);

