Re: PreparedStatement parameters and mutable objects - Mailing list pgsql-jdbc

From Oliver Jowett
Subject Re: PreparedStatement parameters and mutable objects
Msg-id 400351C1.7020609@opencloud.com
In response to Re: PreparedStatement parameters and mutable objects  (Paul Thomas <paul@tmsl.demon.co.uk>)
Responses Re: PreparedStatement parameters and mutable objects  (Dave Cramer <pg@fastcrypt.com>)
List pgsql-jdbc
Paul Thomas wrote:
> On 12/01/2004 00:39 Oliver Jowett wrote:
>> The problem is that it's very specific to the application workload;
>> what case do we measure?
>
> I think you would need something which closely models your app's workload.
>
>> The reason I'm interested in doing this for is not the direct CPU
>> overhead of object creation (none of the JDBC code is on our main
>> execution path), but the effect that object creation has on GC
>> interval and pause. We're running a low-latency app where an extra
>> 50ms pause due to GC has a large impact on our latency figures .. so
>> the less garbage we generate in the first place, the better. We could
>> farm the JDBC work out to a separate VM, but that gets complex quite
>> fast.
>
>
> Reducing the amount of garbage created is never a bad thing. Do you have
> an estimate of the % reduction you think you could achieve?

See the attached testcase; it does batched inserts of byte arrays. I've
run it as:

   java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \
      -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \
      TestInsertGarbage \
      'jdbc:postgresql:rhino?user=oliver&password=oliver' \
      100000 100 100

which does 100k inserts, each with a 100-byte array, in batches of 100.
This is roughly the same as our application workload when it is doing a
DB resync of a reasonably-sized provisioning database.

In the resulting heap dump, the allocation hotspots are:

  QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes
   (100000 byte[], 143738400 bytes)
   (100000 byte[], 37144800 bytes)
   (100000 byte[], 20000000 bytes) // This is encoding the query text
  Statement.setBytes -> setString -> escapeString -> StringBuffer.append
   (63000 char[], 81608000 bytes)
   (19200 char[], 25018400 bytes)
   (17800 char[], 23091200 bytes)
  Statement.setBytes -> setString -> StringBuffer.ensureCapacity
   (100000 char[], 65445600 bytes)
  Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init>
   (100000 char[], 41600000 bytes)
  Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append
   (21800 char[], 17788800 bytes)
   (21400 char[], 17462400 bytes)
   (21300 char[], 17380800 bytes)
   (20800 char[], 16972800 bytes)
   (14400 char[], 11750400 bytes)
  QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet
   (100000 Jdbc3ResultSet, 13600000 bytes)

All other allocation points allocate <10MB each. There's quite a bit of
other object allocation that could be cleaned up too, e.g. repeated
allocation of QueryExecutor objects.
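
For anyone reading along without the driver source handy, here's a
minimal self-contained sketch of where that char[] garbage comes from.
The shape follows the stack traces above, but the method bodies are
illustrative approximations, not the driver's actual code:

public class EscapeGarbageSketch {
    // Pass 1: bytea escaping, roughly what PGbytea.toPGString does.
    static String bytesToPGString(byte[] data) {
        StringBuffer sb = new StringBuffer(data.length); // char[] #1, grown on demand
        for (int i = 0; i < data.length; ++i) {
            int b = data[i] & 0xff;
            if (b < 32 || b > 126 || b == '\\' || b == '\'') {
                sb.append('\\');
                String octal = Integer.toOctalString(b); // short-lived String per escaped byte
                for (int pad = octal.length(); pad < 3; ++pad)
                    sb.append('0');
                sb.append(octal);
            } else {
                sb.append((char) b);
            }
        }
        return sb.toString(); // char[] copy #2
    }

    // Pass 2: string-literal escaping, roughly what setString's escaping does.
    static String escapeString(String s) {
        StringBuffer sb = new StringBuffer(s.length()); // char[] #3, grown on demand
        for (int i = 0; i < s.length(); ++i) {
            char c = s.charAt(i);
            if (c == '\\' || c == '\'')
                sb.append('\\');
            sb.append(c);
        }
        return sb.toString(); // char[] copy #4
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        new java.util.Random().nextBytes(data);
        String literal = escapeString(bytesToPGString(data));
        // The whole query string then gets re-encoded for the wire, which is
        // the String.getBytes allocation at the top of the profile.
        byte[] wire = ("INSERT INTO t(data) VALUES ('" + literal + "')").getBytes();
        System.err.println("escaped " + data.length + " bytes into a "
                           + wire.length + "-byte query");
    }
}

That's four char[] allocations plus a final byte[] for a single
parameter, which lines up with the setBytes-dominated profile above.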

Total allocations were around 631800000 bytes. I estimate that streaming
the parameter data would generate about 499000000 bytes less garbage
(basically all of the above goes away except for encoding the query text
and creating the resultsets), which is a 79% reduction (this is actually
a lot more than I expected!).

There are improvements we can make here without streaming, though. Not
creating resultset objects for updates, and doing all necessary escaping
in a single pass rather than two (sketched below), both seem like
low-hanging fruit.
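
For the escaping half, here's a rough sketch of what that single pass
might look like; the method name is hypothetical, not a proposed driver
API. Both escape steps are folded into one loop, so each parameter costs
one StringBuffer and one String instead of two of each:

public class SinglePassEscapeSketch {
    // Hypothetical single-pass escape: folds the bytea escaping and the
    // string-literal escaping into one loop over the input bytes.
    static String escapeByteaLiteral(byte[] data) {
        StringBuffer sb = new StringBuffer(data.length * 5); // worst case: \\nnn per byte
        for (int i = 0; i < data.length; ++i) {
            int b = data[i] & 0xff;
            if (b < 32 || b > 126 || b == '\\') {
                // bytea wants \nnn; the string literal doubles the backslash.
                sb.append('\\').append('\\');
                sb.append((char) ('0' + ((b >> 6) & 7)));
                sb.append((char) ('0' + ((b >> 3) & 7)));
                sb.append((char) ('0' + (b & 7)));
            } else if (b == '\'') {
                sb.append('\\').append('\'');
            } else {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 0, 39, 92, 65 }; // NUL, quote, backslash, 'A'
        System.err.println(escapeByteaLiteral(data));
    }
}

The exact escape rules would of course need to match what the backend
expects; the point is just one buffer and one pass per parameter.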

-O
import java.sql.*;

public class TestInsertGarbage {
    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: java TestInsertGarbage <jdbc: URL> <# of inserts> <array size> <batch size>");
            return;
        }

        String driverURL = args[0];
        int numInserts = Integer.parseInt(args[1]);
        int arraySize = Integer.parseInt(args[2]);
        int batchSize = Integer.parseInt(args[3]);

        // We don't want to count the data to be inserted itself, since that's
        // mostly static and isn't garbage. So just create a single array
        // to represent all the data. We'll reuse that on each insert.

        // We also give it a random distribution of values, since that affects
        // the exact length needed to escape the value.
        byte[] dummyArray = new byte[arraySize];

        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(driverURL);

        // Schema setup.
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate("DROP TABLE test_insert_garbage");
        } catch (SQLException e) {}
        stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)");

        // Testcase itself.
        conn.setAutoCommit(false);
        PreparedStatement inserter =
            conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)");

        // Main loop.
        java.util.Random randomizer = new java.util.Random();
        for (int i = 0; i < numInserts; ) {
            randomizer.nextBytes(dummyArray); // Hee hee.
            for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) {
                inserter.setBytes(1, dummyArray);
                inserter.addBatch();
            }
            inserter.executeBatch();
            System.err.println("Inserted " + i + " rows");
        }

        // Done.
        conn.commit();
        System.err.println("Committed transaction.");
        conn.close();
    }
}
