Thread: odd jdbc driver synchronization issue
Hi, I'm trying to move my database platform from MSSQL Server to PostgreSQL. I have a load simulation program I wrote which I'm running against a test PostgreSQL instance. The application creates a number of client threads, each with their own database connection, and turns them loose on a workload. Currently, it's lagging the SQLServer test. I'm still tuning the postgresql.conf file, but I also noticed something very odd wrt the postmaster process. I see something like the following: 3 threads started, 0 done, pid 1 99% pid 2 0% pid 3 0% 2 threads started, 1 done, pid 1 0% pid 2 99% pid 3 0% 1 threads started, 2 done, pid 1 0% pid 2 0% pid 3 99% Even though all the treads are runnable, with their own connection, and their own postmaster process on the DB, they are still serializing their writes. If I look in the table where the data is being generated, I see rows inserted in order similar to what top is showing (Client3's data, then Client6's data, then Client1's data, etc). As this application is written in java and using the PostgreSQL jdbc driver I'm starting here, if you feel I should post elsewhere, let me know. Thanks, George --Details-- Client machine: Windows XP, Java 1.4.2 (build 1.4.2-b28) PostgreSQL JDBC driver 7.4, 7.4.1 (no difference) Database machine: Debian Linux 2.6.0-test11 #4 SMP i686 GNU/Linux 2x 2.4GHz Xeon w/HT, 1GB RAM PostgreSQL 7.4
One thing that would help is the code? Dave On Wed, 2003-12-31 at 15:06, George Lessmann wrote: > Hi, > > I'm trying to move my database platform from MSSQL Server to PostgreSQL. > I have a load simulation program I wrote which I'm running against a > test PostgreSQL instance. The application creates a number of client > threads, each with their own database connection, and turns them loose > on a workload. Currently, it's lagging the SQLServer test. I'm still > tuning the postgresql.conf file, but I also noticed something very odd > wrt the postmaster process. I see something like the following: > > 3 threads started, 0 done, > pid 1 99% > pid 2 0% > pid 3 0% > > 2 threads started, 1 done, > pid 1 0% > pid 2 99% > pid 3 0% > > 1 threads started, 2 done, > pid 1 0% > pid 2 0% > pid 3 99% > > Even though all the treads are runnable, with their own connection, and > their own postmaster process on the DB, they are still serializing their > writes. If I look in the table where the data is being generated, I see > rows inserted in order similar to what top is showing (Client3's data, > then Client6's data, then Client1's data, etc). > > As this application is written in java and using the PostgreSQL jdbc > driver I'm starting here, if you feel I should post elsewhere, let me > know. > > Thanks, > > George > > --Details-- > > Client machine: > Windows XP, Java 1.4.2 (build 1.4.2-b28) > PostgreSQL JDBC driver 7.4, 7.4.1 (no difference) > > Database machine: > Debian Linux 2.6.0-test11 #4 SMP i686 GNU/Linux > 2x 2.4GHz Xeon w/HT, 1GB RAM > PostgreSQL 7.4 > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Dave Cramer 519 939 0336 ICQ # 1467551
Dave, all, Sorry, I thought this was such an obvious thing to do; it would be some setting or something I missed (does the jdbc driver have any debug options?). Some of the code is proprietary. Hopefully, this will give you an idea of what is happening. Basically, create a number of clients, load them with parents, and then run() them. While running, a client inserts one parent and a number of children. package jdbhammer; /** * @author George Lessmann * copyright 2003, George Lessmann * code may not be reused without permission */ import java.util.ArrayList; import org.apache.commons.dbcp.DriverManagerConnectionFactory; import org.apache.commons.dbcp.PoolableConnectionFactory; import org.apache.commons.dbcp.PoolingDriver; import org.apache.commons.pool.impl.GenericObjectPool; public final class Main { private static String resultsLocation = null; private static String username = "postgres"; private static String password = ""; public static void main(String[] args) { if (args == null | "?".equals(args[0])) { System.out.println("Usage: java -jar jDBHammer.jar [#clients] [#dbconnections] [#parents] [#children per parent] [server] [username] [password]"); return; }// if if (args.length >= 5) { server = args[4]; }// if if (args.length >= 6) { username = args[5]; }// if if (args.length >= 7) { password = args[6]; }// if resultsLocation = "c:\\jdbhammer"+"-"+args[0]+"-"+args[1]+"-"+args[2]+"-"+args[3]+".txt"; System.out.println("Creating DB Pool."); try { Class.forName("org.postgresql.Driver"); } catch (Exception e) { System.out.println("Error! Unable to load JDBC Driver! Error was:" + e.getMessage()); return; }// try // create the pool GenericObjectPool connectionPool = new GenericObjectPool(null, Integer.parseInt(args[1]), GenericObjectPool.WHEN_EXHAUSTED_GROW, 0); DriverManagerConnectionFactory connectionFactory = new DriverManagerConnectionFactory("jdbc:postgresql://theserver/thedb", username, password); try { PoolableConnectionFactory poolableConnectionFactory = new PoolableConnectionFactory(connectionFactory,connectionPool,null,null,fal se,true); } catch (Exception e) { System.out.println("Error! Unable to create PoolableConnectionFactory! Error was:" + e.getMessage()); return; }// try PoolingDriver driver = new PoolingDriver(); driver.registerPool("jdbhammer",connectionPool); ArrayList clients = new ArrayList(); DbClient client = null; System.out.println("Creating Clients."); // load them for (int i=0, j=Integer.parseInt(args[0]); i < j; i++) { client = new DbClient("DbClient" + Integer.toString(i), resultsLocation); client.loadParents(Integer.parseInt(args[2]), Integer.parseInt(args[3])); System.out.println("Creating Clients: " +"DbClient" + Integer.toString(i)); clients.add(client); }// for System.out.println("Starting Threads."); Thread[] t = new Thread[clients.size()]; for (int i=0, j=clients.size(); i < j; i++) { client = (DbClient) clients.get(i); t[i] = new Thread(client, client.getClientId()); t[i].start(); }// for }// main }// Main // DBClient private ArrayList workload = new ArrayList(10000); private String clientId = null; private long startTime = 0L; private long endTime = 0L; private boolean isComplete = Boolean.FALSE.booleanValue(); private static final String procInsertParent = "{ call proc_insert_parent( ?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,? ) }"; private static final String procInsertChild = "{ call proc_insert_child( ?,?,?,?,?,?,?,?,?,?,?,?,? ) }"; private String resultsLocation = null; private Connection conn = null; private CallableStatement parentCall = null; private CallableStatement childCall = null; public DbClient(String clientId, String resultsLocation) { this.clientId = clientId; this.resultsLocation = resultsLocation; try { System.out.println("grabbing connection"); conn = DriverManager.getConnection("jdbc:apache:commons:dbcp:jdbhammer"); conn.setAutoCommit(false); parentCall = conn.prepareCall(procInsertParent); childCall = conn.prepareCall(procInsertChild); } catch (SQLException sql) { System.out.println("Error! There was an error creating the connection. Error was: " + sql.getMessage()); } // try } // DbClient public final void run() { Parent p = null; Child c = null; ArrayList children = null; startTime = System.currentTimeMillis(); try { for (int i = 0, j = workload.size(); i < j; i++) { p = (Parent) workload.get(i); try { p.setStartTime(System.currentTimeMillis()); p.insert(parentCall); p.setEndTime(System.currentTimeMillis()); } catch (SQLException sql) { System.out.println("Error! " + sql.getMessage()); continue; // skip the children } // try children = p.getChildren(); for (int k = 0, m = children.size(); k < m; k++) { child = (Child) children.get(k); try { child.setStartTime(System.currentTimeMillis()); child.insert(childCall); child.setEndTime(System.currentTimeMillis()); } catch (SQLException sql) { System.out.println("Error! " + sql.getMessage()); // do nothing, keep on running } // try } // for if ((i + 1 == j)) { conn.commit(); } // if } // for } catch (SQLException sql) { System.out.println("Error! " + sql.getMessage()); try { conn.rollback(); } catch (SQLException sqle) { } } finally { try { parentCall.close(); childCall.close(); conn.close(); } catch (SQLException sql) { } } endTime = System.currentTimeMillis(); isComplete = Boolean.TRUE.booleanValue(); // done System.out.println(Thread.currentThread().getName() + " finished!"); writeOutput(); } // run The parent and child insert() method sets a number of parameters one each callable statement, and then execute() a function on the PostgreSQL server, resulting in a row being inserted into a table. Let me know what other information I can provide, George -----Original Message----- From: Dave Cramer [mailto:pg@fastcrypt.com] Sent: Wednesday, December 31, 2003 2:28 PM To: George Lessmann Cc: pgsql-jdbc@postgresql.org Subject: Re: [JDBC] odd jdbc driver synchronization issue One thing that would help is the code? Dave On Wed, 2003-12-31 at 15:06, George Lessmann wrote: > Hi, > > I'm trying to move my database platform from MSSQL Server to PostgreSQL. > I have a load simulation program I wrote which I'm running against a > test PostgreSQL instance. The application creates a number of client > threads, each with their own database connection, and turns them loose > on a workload. Currently, it's lagging the SQLServer test. I'm still > tuning the postgresql.conf file, but I also noticed something very odd > wrt the postmaster process. I see something like the following: > > 3 threads started, 0 done, > pid 1 99% > pid 2 0% > pid 3 0% > > 2 threads started, 1 done, > pid 1 0% > pid 2 99% > pid 3 0% > > 1 threads started, 2 done, > pid 1 0% > pid 2 0% > pid 3 99% > > Even though all the treads are runnable, with their own connection, and > their own postmaster process on the DB, they are still serializing their > writes. If I look in the table where the data is being generated, I see > rows inserted in order similar to what top is showing (Client3's data, > then Client6's data, then Client1's data, etc). > > As this application is written in java and using the PostgreSQL jdbc > driver I'm starting here, if you feel I should post elsewhere, let me > know. > > Thanks, > > George > > --Details-- > > Client machine: > Windows XP, Java 1.4.2 (build 1.4.2-b28) > PostgreSQL JDBC driver 7.4, 7.4.1 (no difference) > > Database machine: > Debian Linux 2.6.0-test11 #4 SMP i686 GNU/Linux > 2x 2.4GHz Xeon w/HT, 1GB RAM > PostgreSQL 7.4 > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Dave Cramer 519 939 0336 ICQ # 1467551
George Lessmann wrote: > Dave, all, > > Sorry, I thought this was such an obvious thing to do; it would be some > setting or something I missed (does the jdbc driver have any debug > options?). I have multiple connections from a single JVM accessing the same DB concurrently with no problems; it's likely to be a quirk of the exact query you run or how you're using the JDBC API. > Some of the code is proprietary. Hopefully, this will give > you an idea of what is happening. Basically, create a number of clients, > load them with parents, and then run() them. While running, a client > inserts one parent and a number of children. Ideally we need a selfcontained, compilable, testcase that demonstrates the problem. Otherwise it's just guesswork. Some things you could try to narrow down the problem: - Run a couple of instances of your test in separate JVMs, see if you get any concurrency between them. - Get thread dumps from the JVM while your test is running (under solaris & linux sending SIGQUIT to the JVM does this) and see where your client threads are blocked. - Turn on statement logging (in postgresql.conf) and see exactly when the queries are being submitted. - Run the same queries by hand via psql and see if you get the same behaviour. -O
Oliver, all, The original design of my test application allocated one Connection and two CallableStatements per client (one for the parent insert and one for the child insert). As I reported earlier, each client would then be serialized, even though they each existed in their own thread/connection/callablestatement. I then refactored the application to allocate a connection per client, and prepareCall() before every execute(). This change resulted in each client being able to concurrently access the database, even though, in my opinion, it violates the whole reason behind a prepareCall() which is to reuse it often without recompilation. So, now I have a test application which I can run against my test Postgres server and my dev MSSQL Server box. With 5 client connections, each inserting 1100 rows from 1100 function calls, the Postgres/Java test is generating around 42 executions/second. MSSQL Server/Java is generating around 368 executions/second. I then created a sql file with the 1100 function calls, and it ran at about 580 executions/second in a pgAdmin III query window. That's where I would like the Postgres/JDBC test to be at, but obviously something is amiss with the Postgres JDBC driver. I will work on getting this test application shareable, but does anyone else have a cross-db loadtest app using JDBC? Any help would be appreciated; I'm off to look at the ODBC driver... George ps-- The Postgres test DB server is a 2x2.4GHz Xeon, 1 GB ram, 10k SCSI mirror, debian 2.6.0-test11 The MSSQL DB server is a 1x2.8Ghz Celeron, 1 GB ram, 7200RPM IDE mirror, ms2k adv srvr -----Original Message----- From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Oliver Jowett Sent: Wednesday, December 31, 2003 6:58 PM To: George Lessmann Cc: pg@fastcrypt.com; pgsql-jdbc@postgresql.org Subject: Re: [JDBC] odd jdbc driver synchronization issue George Lessmann wrote: > Dave, all, > > Sorry, I thought this was such an obvious thing to do; it would be some > setting or something I missed (does the jdbc driver have any debug > options?). I have multiple connections from a single JVM accessing the same DB concurrently with no problems; it's likely to be a quirk of the exact query you run or how you're using the JDBC API. > Some of the code is proprietary. Hopefully, this will give > you an idea of what is happening. Basically, create a number of clients, > load them with parents, and then run() them. While running, a client > inserts one parent and a number of children. Ideally we need a selfcontained, compilable, testcase that demonstrates the problem. Otherwise it's just guesswork. Some things you could try to narrow down the problem: - Run a couple of instances of your test in separate JVMs, see if you get any concurrency between them. - Get thread dumps from the JVM while your test is running (under solaris & linux sending SIGQUIT to the JVM does this) and see where your client threads are blocked. - Turn on statement logging (in postgresql.conf) and see exactly when the queries are being submitted. - Run the same queries by hand via psql and see if you get the same behaviour. -O ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org
On Mon, 5 Jan 2004, George Lessmann wrote: > > The original design of my test application allocated one Connection and > two CallableStatements per client (one for the parent insert and one for > the child insert). As I reported earlier, each client would then be > serialized, even though they each existed in their own > thread/connection/callablestatement. > > I then refactored the application to allocate a connection per client, > and prepareCall() before every execute(). This change resulted in each > client being able to concurrently access the database, even though, in > my opinion, it violates the whole reason behind a prepareCall() which is > to reuse it often without recompilation. I've attached a program I wrote to try and verify your claims. I see no serialization despite a single prepareCall and many executes. This program runs with java CallTest <num clients> <num runs per client> each client does a loop over numRuns, calling a SELECT stored procedure and then inserts that value into a table via another stored procedure. Let me know if you see a problem with this test or how your application differs. Kris Jurka
Attachment
On Mon, Jan 05, 2004 at 03:01:16PM -0600, George Lessmann wrote: > Oliver, all, > > The original design of my test application allocated one Connection and > two CallableStatements per client (one for the parent insert and one for > the child insert). As I reported earlier, each client would then be > serialized, even though they each existed in their own > thread/connection/callablestatement. This shouldn't happen. We really need a standalone testcase to locate the problem down, though. > I then refactored the application to allocate a connection per client, > and prepareCall() before every execute(). This change resulted in each > client being able to concurrently access the database, even though, in > my opinion, it violates the whole reason behind a prepareCall() which is > to reuse it often without recompilation. As you say, this shouldn't be necessary. One possibility is that your original code was (accidentally) using CallableStatements that originated from the same Connection for different clients. In this case, execution of those statements will serialize (as they contend for the same underlying physical connection). If the CallableStatements are indeed from different Connections, I can't see how they'd serialize on the Java side .. the two Connection objects are entirely independent and don't hold references to each other at all. IIRC you were using a connection pool -- is it possible the contention is happening in the pool implementation? Some pool implementations provide their own proxying Connection and *Statement implementations; perhaps the pool's CallableStatement is causing the problem. Did you try any of my suggestions below? If you can produce a thread dump while some of the clients are blocked we can probably narrow the problem down without a testcase. > Ideally we need a selfcontained, compilable, testcase that demonstrates > the problem. Otherwise it's just guesswork. > > Some things you could try to narrow down the problem: > > - Run a couple of instances of your test in separate JVMs, see if you > get any concurrency between them. > > - Get thread dumps from the JVM while your test is running (under > solaris & linux sending SIGQUIT to the JVM does this) and see where your > > client threads are blocked. > > - Turn on statement logging (in postgresql.conf) and see exactly when > the queries are being submitted. > > - Run the same queries by hand via psql and see if you get the same > behaviour. > > -O -O
Kris, Thanks for the test program. It helped a lot, and I believe I've found my problem. The main issue seems to be the transaction isolation and locking behavior of PostgreSQL as compared to MSSQL Server, and the way I was writing my code (specifically, commits) between all my different tests. My workload uses two stored procedures; each one issues a few selects, possibly updates, and then an insert. The update branch is what caused the issues, as a table could be concurrently updated by any number of clients (basically, something like balance = balance + amount). Here's my work section, modeled after your example: public void execute(int num) throws SQLException { for (int i=0; i<num; i++) { parentCall.execute(); // #1 for (int j=0; j<9; j++) { childCall.execute(); // #1 } // #2 } // #3 conn.commit(); } #1 a conn.commit() placed here allowed multiple clients/postmasters to work. No postmaster process went above 8% cpu. #2 a conn.commit() placed here allowed multiple clients/postmasters to work. Faster than #1, naturally. No postmaster process went above 25% cpu. #3 a conn.commit() placed here will serialize all clients/postmasters (if you have updates in your functions). 1 postmaster process always at 99% cpu. Although it finished with the highest throughput, it's not the correct isolation level for the workload. Here's some comparison data for those who are interested, Postgres #1 39.38 tps Postgres #2 108.50 tps Postgres #3 563.76 tps (near pgAdmin levels?) MSSQL #1 389.81 tps MSSQL #2 459.18 tps MSSQL #3 767.23 tps Thanks to Dave, Oliver, and Kris for the help. George -----Original Message----- From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Kris Jurka Sent: Monday, January 05, 2004 4:42 PM To: George Lessmann Cc: pgsql-jdbc@postgresql.org Subject: Re: [JDBC] odd jdbc driver synchronization issue On Mon, 5 Jan 2004, George Lessmann wrote: > > The original design of my test application allocated one Connection and > two CallableStatements per client (one for the parent insert and one for > the child insert). As I reported earlier, each client would then be > serialized, even though they each existed in their own > thread/connection/callablestatement. > > I then refactored the application to allocate a connection per client, > and prepareCall() before every execute(). This change resulted in each > client being able to concurrently access the database, even though, in > my opinion, it violates the whole reason behind a prepareCall() which is > to reuse it often without recompilation. I've attached a program I wrote to try and verify your claims. I see no serialization despite a single prepareCall and many executes. This program runs with java CallTest <num clients> <num runs per client> each client does a loop over numRuns, calling a SELECT stored procedure and then inserts that value into a table via another stored procedure. Let me know if you see a problem with this test or how your application differs. Kris Jurka
> Here's my work section, modeled after your example: > > public void execute(int num) throws SQLException { > for (int i=0; i<num; i++) { > > parentCall.execute(); > // #1 > for (int j=0; j<9; j++) { > childCall.execute(); > // #1 > } > // #2 > } > // #3 > conn.commit(); > } In this loop are you using i or j to indicate what rows to operate on in the stored procedure calls? If so there will be huge contention as each thread will try to do the exact same things. Using clientNumber*num+i or something like that would give a more realistic test if you're supposed to be dividing up work between these threads. Kris Jurka