Re: PreparedStatement parameters and mutable objects - Mailing list pgsql-jdbc
From | Dave Cramer
---|---
Subject | Re: PreparedStatement parameters and mutable objects
Date |
Msg-id | 1073997299.1149.26.camel@localhost.localdomain
In response to | Re: PreparedStatement parameters and mutable objects (Oliver Jowett <oliver@opencloud.com>)
List | pgsql-jdbc
Oliver,

OK, now how do we test your estimates? If it truly shows this kind of
improvement you would sway me!

Dave

On Mon, 2004-01-12 at 21:02, Oliver Jowett wrote:
> Paul Thomas wrote:
> > On 12/01/2004 00:39 Oliver Jowett wrote:
> >> The problem is that it's very specific to the application workload;
> >> what case do we measure?
> >
> > I think you would need something which closely models your app's workload.
> >
> >> The reason I'm interested in doing this is not the direct CPU
> >> overhead of object creation (none of the JDBC code is on our main
> >> execution path), but the effect that object creation has on GC
> >> interval and pause. We're running a low-latency app where an extra
> >> 50ms pause due to GC has a large impact on our latency figures .. so
> >> the less garbage we generate in the first place, the better. We could
> >> farm the JDBC work out to a separate VM, but that gets complex quite
> >> fast.
> >
> > Reducing the amount of garbage created is never a bad thing. Do you have
> > an estimate of the % reduction you think you could achieve?
>
> See the attached testcase; it does batched inserts of byte arrays. I've
> run it as:
>
>     java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \
>          -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \
>          TestInsertGarbage \
>          'jdbc:postgresql:rhino?user=oliver&password=oliver' \
>          100000 100 100
>
> which does 100k inserts, each with a 100-byte array, in batches of 100.
> This is roughly the same as our application workload when it is doing a
> DB resync of a reasonably-sized provisioning database.
>
> In the resulting heap dump, the allocation hotspots are:
>
>     QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes
>         (100000 byte[], 143738400 bytes)
>         (100000 byte[], 37144800 bytes)
>         (100000 byte[], 20000000 bytes)   // This is encoding the query text
>     Statement.setBytes -> setString -> escapeString -> StringBuffer.append
>         (63000 char[], 81608000 bytes)
>         (19200 char[], 25018400 bytes)
>         (17800 char[], 23091200 bytes)
>     Statement.setBytes -> setString -> StringBuffer.ensureCapacity
>         (100000 char[], 65445600 bytes)
>     Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init>
>         (100000 char[], 41600000 bytes)
>     Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append
>         (21800 char[], 17788800 bytes)
>         (21400 char[], 17462400 bytes)
>         (21300 char[], 17380800 bytes)
>         (20800 char[], 16972800 bytes)
>         (14400 char[], 11750400 bytes)
>     QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet
>         (100000 Jdbc3ResultSet, 13600000 bytes)
>
> All other allocation points allocate <10MB. There's quite a bit of other
> object allocation that could be cleaned up, e.g. repeated allocation of
> QueryExecutor objects.
>
> Total allocations were around 631800000 bytes. I estimate that streaming
> the parameter data would generate about 499000000 bytes less garbage
> (basically all of the above goes away except for encoding the query text
> and creating the resultsets), which is a 79% reduction (this is actually
> a lot more than I expected!).
>
> There are improvements we can make here without streaming, though. Not
> creating resultset objects for updates, and doing all necessary escaping
> in a single pass rather than two, both seem like low-hanging fruit.
> -O
>
> ______________________________________________________________________
>
> import java.sql.*;
>
> public class TestInsertGarbage {
>     public static void main(String args[]) throws Exception {
>         if (args.length < 4) {
>             System.err.println("usage: java TestInsertGarbage <jdbc: URL> <# of inserts> <array size> <batch size>");
>             return;
>         }
>
>         String driverURL = args[0];
>         int numInserts = Integer.parseInt(args[1]);
>         int arraySize = Integer.parseInt(args[2]);
>         int batchSize = Integer.parseInt(args[3]);
>
>         // We don't want to count the data to be inserted itself, since that's
>         // mostly static and isn't garbage. So just create a single array
>         // to represent all the data. We'll reuse that on each insert.
>         //
>         // We also give it a random distribution of values, since that affects
>         // the exact length needed to escape the value.
>         byte[] dummyArray = new byte[arraySize];
>
>         Class.forName("org.postgresql.Driver");
>         Connection conn = DriverManager.getConnection(driverURL);
>
>         // Schema setup.
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("DROP TABLE test_insert_garbage");
>         } catch (SQLException e) {}
>         stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)");
>
>         // Testcase itself.
>         conn.setAutoCommit(false);
>         PreparedStatement inserter =
>             conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)");
>
>         // Main loop.
>         java.util.Random randomizer = new java.util.Random();
>         for (int i = 0; i < numInserts; ) {
>             randomizer.nextBytes(dummyArray); // Hee hee.
>             for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) {
>                 inserter.setBytes(1, dummyArray);
>                 inserter.addBatch();
>             }
>             inserter.executeBatch();
>             System.err.println("Inserted " + i + " rows");
>         }
>
>         // Done.
>         conn.commit();
>         System.err.println("Committed transaction.");
>     }
> }
>
> ______________________________________________________________________

--
Dave Cramer
519 939 0336
ICQ # 1467551
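A minimal sketch of the "single pass" escaping Oliver calls low-hanging fruit: fold the work of PGbytea.toPGString and escapeString into one loop over the bytes, writing into a single pre-sized buffer. The class and method names here are hypothetical, not the driver's actual API, and it assumes the backslash-interpreting string-literal rules of that era:

    public class SinglePassByteaEscape {
        // Hypothetical single-pass replacement for the two-pass
        // PGbytea.toPGString + escapeString combination: each input byte is
        // escaped for both the bytea input syntax and the SQL string
        // literal in the same loop.
        public static String escape(byte[] data) {
            // Worst case is 5 output chars per byte (a non-printable byte
            // becomes "\\ooo"), so one up-front allocation avoids the
            // ensureCapacity churn visible in the heap dump above.
            StringBuffer sb = new StringBuffer(data.length * 5);
            for (int i = 0; i < data.length; ++i) {
                int b = data[i] & 0xff;
                if (b == '\'') {
                    sb.append("''");       // standard SQL quote doubling
                } else if (b == '\\') {
                    sb.append("\\\\\\\\"); // bytea "\\", doubled again for the literal
                } else if (b < 0x20 || b > 0x7e) {
                    sb.append("\\\\");     // literal-escaped backslash, then 3-digit octal
                    sb.append((char) ('0' + ((b >> 6) & 7)));
                    sb.append((char) ('0' + ((b >> 3) & 7)));
                    sb.append((char) ('0' + (b & 7)));
                } else {
                    sb.append((char) b);   // printable ASCII passes through unchanged
                }
            }
            return sb.toString();
        }
    }

The caller would still wrap the result in single quotes when splicing it into the query text; the point is that only one StringBuffer is created and the bytes are scanned once instead of twice.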
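The larger 79% estimate rests on streaming: rather than materializing an escaped String per setBytes() call, the driver would hold a reference to the caller's array and escape straight onto the connection's output stream at execute time. A rough sketch of that shape, again with hypothetical names rather than the driver's real internals:

    import java.io.IOException;
    import java.io.OutputStream;

    // Hypothetical holder for a streamed bytea parameter: setBytes() would
    // store one of these instead of an escaped String, and the query
    // executor would call writeTo() while sending the statement.
    class StreamedByteaParameter {
        private final byte[] data; // kept by reference, not copied

        StreamedByteaParameter(byte[] data) {
            this.data = data;
        }

        // Escape and write in a single pass. No per-parameter String,
        // StringBuffer, or byte[] copies are created, which is where the
        // estimated ~499000000 bytes of garbage would disappear.
        void writeTo(OutputStream out) throws IOException {
            for (int i = 0; i < data.length; ++i) {
                int b = data[i] & 0xff;
                if (b == '\'') {
                    out.write('\'');
                    out.write('\'');
                } else if (b == '\\') {
                    for (int n = 0; n < 4; ++n) out.write('\\');
                } else if (b < 0x20 || b > 0x7e) {
                    out.write('\\');
                    out.write('\\');
                    out.write('0' + ((b >> 6) & 7));
                    out.write('0' + ((b >> 3) & 7));
                    out.write('0' + (b & 7));
                } else {
                    out.write(b);
                }
            }
        }
    }

The trade-off is the thread's subject line: once the driver keeps a reference instead of copying, the caller must not mutate the array between setBytes() and statement execution.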