Re: PreparedStatement parameters and mutable objects - Mailing list pgsql-jdbc

From Dave Cramer
Subject Re: PreparedStatement parameters and mutable objects
Date
Msg-id 1073997299.1149.26.camel@localhost.localdomain
In response to Re: PreparedStatement parameters and mutable objects  (Oliver Jowett <oliver@opencloud.com>)
List pgsql-jdbc
Oliver,


OK, now how do we test your estimates? If it truly shows this kind of
improvement, you would sway me!

Dave
On Mon, 2004-01-12 at 21:02, Oliver Jowett wrote:
> Paul Thomas wrote:
> > On 12/01/2004 00:39 Oliver Jowett wrote:
> >> The problem is that it's very specific to the application workload;
> >> what case do we measure?
> >
> > I think you would need something which closely models your app's workload.
> >
> >> The reason I'm interested in doing this is not the direct CPU
> >> overhead of object creation (none of the JDBC code is on our main
> >> execution path), but the effect that object creation has on GC
> >> interval and pause. We're running a low-latency app where an extra
> >> 50ms pause due to GC has a large impact on our latency figures... so
> >> the less garbage we generate in the first place, the better. We could
> >> farm the JDBC work out to a separate VM, but that gets complex quite
> >> fast.
> >
> >
> > Reducing the amount of garbage created is never a bad thing. Do you have
> > an estimate of the % reduction you think you could achieve?
>
> See the attached testcase; it does batched inserts of byte arrays. I've
> run this as:
>
>    java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \
>       -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \
>       TestInsertGarbage \
>       'jdbc:postgresql:rhino?user=oliver&password=oliver' \
>       100000 100 100
>
> which does 100k inserts, each with a 100-byte array, in batches of 100.
> This is roughly the same as our application workload when it is doing a
> DB resync of a reasonably-sized provisioning database.
>
> In the resulting heap dump, the allocation hotspots are:
>
>   QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes
>    (100000 byte[], 143738400 bytes)
>    (100000 byte[], 37144800 bytes)
>    (100000 byte[], 20000000 bytes) // This is encoding the query text
>   Statement.setBytes -> setString -> escapeString -> StringBuffer.append
>    (63000 char[], 81608000 bytes)
>    (19200 char[], 25018400 bytes)
>    (17800 char[], 23091200 bytes)
>   Statement.setBytes -> setString -> StringBuffer.ensureCapacity
>    (100000 char[], 65445600 bytes)
>   Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init>
>    (100000 char[], 41600000 bytes)
>   Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append
>    (21800 char[], 17788800 bytes)
>    (21400 char[], 17462400 bytes)
>    (21300 char[], 17380800 bytes)
>    (20800 char[], 16972800 bytes)
>    (14400 char[], 11750400 bytes)
>   QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet
>    (100000 Jdbc3ResultSet, 13600000 bytes)
>
> All other allocation points allocate <10MB. There's quite a bit of other
> object allocation that could be cleaned up, e.g. repeated allocation of
> QueryExecutor objects.
>
> Total allocations were around 631800000 bytes. I estimate that streaming
> the parameter data would generate about 499000000 bytes less garbage
> (basically all of the above goes away except for encoding the query text
> and creating the resultsets), which is a 79% reduction (this is actually
> a lot more than I expected!).
>
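> To make the streaming idea concrete, here is a rough sketch (the
> method and stream names are invented for illustration, not the
> driver's actual internals) of escaping a bytea parameter on the fly
> and writing it straight to the connection's output stream, so none of
> the intermediate String/StringBuffer/byte[] copies above get created:
>
>     // Hypothetical: stream one escaped bytea parameter to the wire.
>     void writeByteaParam(java.io.OutputStream pgStream, byte[] param)
>             throws java.io.IOException {
>         for (int i = 0; i < param.length; ++i) {
>             int b = param[i] & 0xff;
>             if (b < 32 || b > 126 || b == '\\' || b == '\'') {
>                 // Octal-escape bytes that need it (assumes the
>                 // current escape format), written out directly
>                 // instead of buffered in a String first.
>                 pgStream.write('\\');
>                 pgStream.write('\\');
>                 pgStream.write('0' + ((b >> 6) & 3));
>                 pgStream.write('0' + ((b >> 3) & 7));
>                 pgStream.write('0' + (b & 7));
>             } else {
>                 pgStream.write(b);
>             }
>         }
>     }
>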
> There are improvements we can make here without streaming, though. Not
> creating resultset objects for updates, and doing all necessary escaping
> in a single pass rather than two, both seem like low-hanging fruit.
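>
> For instance, the two escape passes for bytea (PGbytea.toPGString
> followed by escapeString) could be folded into a single pass over a
> single StringBuffer. A rough sketch, assuming the current octal
> escape format (the method name is made up):
>
>     static String escapeByteaSinglePass(byte[] data) {
>         // Worst case each byte becomes "\\" plus three octal digits.
>         StringBuffer sb = new StringBuffer(data.length * 5);
>         for (int i = 0; i < data.length; ++i) {
>             int b = data[i] & 0xff;
>             if (b < 32 || b > 126 || b == '\\' || b == '\'') {
>                 sb.append("\\\\");
>                 sb.append((char) ('0' + ((b >> 6) & 3)));
>                 sb.append((char) ('0' + ((b >> 3) & 7)));
>                 sb.append((char) ('0' + (b & 7)));
>             } else {
>                 sb.append((char) b);
>             }
>         }
>         return sb.toString();
>     }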
>
> -O
>
> ______________________________________________________________________
>
> import java.sql.*;
>
> public class TestInsertGarbage {
>     public static void main(String args[]) throws Exception {
>         if (args.length < 4) {
>             System.err.println("usage: java TestInsertGarbage <JDBC URL> <# of inserts> <array size> <batch size>");
>             return;
>         }
>
>         String driverURL = args[0];
>         int numInserts = Integer.parseInt(args[1]);
>         int arraySize = Integer.parseInt(args[2]);
>         int batchSize = Integer.parseInt(args[3]);
>
>         // We don't want to count the data to be inserted itself, since that's
>         // mostly static and isn't garbage. So just create a single array
>         // to represent all the data. We'll reuse that on each insert.
>
>         // we also give it a random distribution of values since that affects
>         // the exact length needed to escape the value.
>         byte[] dummyArray = new byte[arraySize];
>
>         Class.forName("org.postgresql.Driver");
>         Connection conn = DriverManager.getConnection(driverURL);
>
>         // Schema setup.
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("DROP TABLE test_insert_garbage");
>         } catch (SQLException e) {}
>         stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)");
>
>         // Testcase itself.
>         conn.setAutoCommit(false);
>         PreparedStatement inserter =
>             conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)");
>
>         // Main loop.
>         java.util.Random randomizer = new java.util.Random();
>         for (int i = 0; i < numInserts; ) {
>             randomizer.nextBytes(dummyArray); // Fresh random data each batch; the value spread affects escaped length.
>             for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) {
>                 inserter.setBytes(1, dummyArray);
>                 inserter.addBatch();
>             }
>             inserter.executeBatch();
>             System.err.println("Inserted " + i + " rows");
>         }
>
>         // Done.
>         conn.commit();
>         System.err.println("Committed transaction.");
>         conn.close();
>     }
> }
>
> ______________________________________________________________________
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
--
Dave Cramer
519 939 0336
ICQ # 1467551

