Performance analysis of server-parsed PreparedStatements ... - Mailing list pgsql-jdbc
From | James Robinson |
---|---|
Subject | Performance analysis of server-parsed PreparedStatements ... |
Date | |
Msg-id | 9F1FF8C6-8FD3-11D8-9377-000A9566A412@socialserve.com Whole thread Raw |
Responses |
Re: Performance analysis of server-parsed PreparedStatements
|
List | pgsql-jdbc |
I began doing various benchmarks with JBoss 3.2.3 maintaining a pool of prepared statements on postgres 7.4.2, with a few hacks to the JDBC driver to ultimately get performance higher than without any pooled prepared statements or server-side preparation. Things learned from this endeavor should be of users of JBoss + PG, the JDBC driver implementors, and backend coders. The benchmarking was done with a subset of our EJB app's regression test suite. All JDBC hacks were done in AbstractJdbc1Statement. Hacks: 1) Our primary keys are almost exclusively int8s. Tom's cross-integral index operator fix for 7.5 would reduce the count of local hacks in the JDBC driver to pull this off. Those hacks involve explicitly tacking on '::INT8' to the value string in setLong(int, long) and setObject(int, Object, int, int)'s "CASE Types.BIGINT" clause. Without this you get the horse-beaten-to-death issue of sequential scan instead of index scan for non-server prepared queries problem. 2) JBoss pays attention to the # rows updated count from update / insert statements. Server prepared queries currently don't return the updated row count. The asked-for protocol update would allow for this value to be returned through JDBC regardless of prepared or not. So, only server prepare if it is a select statement. 3) JBoss can (and will, depending on configuration settings + contents of various runtime caches) issue rather stupidly large queries along the lines of "select * from table where (id = 1) or (id = 2) or ... (id = 750). Yep -- 752 node OR grouping. I chose to skip auto-server preparation if the query had more than N parameters, with N currently set to 30, since those would most likely never get reused again. 4) When server preparing a query, initialize a counter to count how many times this prepared statement was reused. So, executing it twice would yield a reuse count of 1. Print out this count + the SQL query string when the prepared statement is closed. Hacks #2 and #3 amounted to: /** threshold for using a server-side prepared query -- never prepare if more than this. */ private final static int MAX_ARGS_TO_PREPARE = 30; public AbstractJdbc1Statement (BaseConnection connection, String p_sql) throws SQLException { this.connection = connection; parseSqlStmt(p_sql); // this allows Callable stmt to override if(isSingleSelect() && m_bindTypes.length < MAX_ARGS_TO_PREPARE) setUseServerPrepare(true); } I performed 3 sets of tests: 1) No server-preparation at all, no JBoss statement pooling (vanilla setup). 2) Server preparation for single-select statements, but no JBoss statement pooling. 3) Server preparation for single-select statements, JBoss pool of prepared statements set to 500. Each test run started with the same initial database state, including all vacuumed and analyzed. JBoss restarted every time to have it start with an empty EJB cache. Here are the results (minutes:seconds), with high and low run removed: Suite #1 (vanilla): 4:39.57 4:41.21 4:47.09 Suite #2 (server prepared but not jboss cached): 4:38.63 5:00.09 5:30.89 Suite #3 (server prepared, jboss cached): 3:53.79 3:56.74 3:55.50 Analysis: Suite #2 performed the worst. I expect it to be due to the overhead of closing the prepared statement right after using it (most likely exactly once), requiring the close operation to issue an additional jboss -> postgres round trip cost for the DEALLOCATE command. The standard deviation between runs is also much higher -- perhaps additional java GC overhead or the additional context switch jitters. It is extremely unlikely for the jboss 3.2 series CMP layer to reuse a preparedstatement within a single method. This can be easily seen from the prepared statement names, each statement being named via an auto-incrementing counter in the JDBC driver. A suite #2 run ends up preparing + closing 11,848 single select statements, and of those, absolutely none of them were reused within the lifetime of the prepared statement. Suite #3 performed the best, ~30 seconds better than 'vanilla'. Attached are the reuse counters (along with the query that was reused), sorted numerically by the reuse count. Many of the statements get reused quite a bit, but they are relatively simple -- we don't do many joins with more than 4 tables anywhere in our application, and looks like none more than 2 or 3 tables in this test suite. So, combined with pooled prepared statements, prepared statements *might* be a winner for an EJB application. JBoss issues many individual queries, but reducing them to a distinct set even across method calls brought them down by a factor of 20, indicating a high hit rate in the jboss prepared statement cache for this particular workload (for select statements, anyway). Given these results, I would recommend *not* turning on auto-server preparation in the JDBC driver for any sort of query, be they single select statements or updates, when the client -> backend protocol can support returning the proper row update count, since it can definitely impact performance of applications today. The 'prepare on passing threshold' patch would be great here, since it would have effectively made batch #2 equivalent to batch #1 (vanilla) -- no additional cost when PreparedStatements are not cached + reused. It would also enable performance just slightly worse than case #3 if the threshold were set to something like 3 or 4 when also using the JBoss statement cache, since the first 3 or so invocations would be non-prepared, then auto-switching to server-prepared when it really looked like this was a non-unique query (such as the 734-node OR queries). James ---- James Robinson Socialserve.com
Attachment
pgsql-jdbc by date: