Thread: performance issue
hi, having about 10^6 records in a table indexed on names (http://www.upcdatabase.com) and doing a query from psql I get response time in millisec order on the other hand when executing the same query through JDBC it hangs about 80 seconds. This is about 10^5 decrease in speed. After profiling the JDBC driver simply waits for data PGStream.ReceiveChar() spending enough time to do a sequencial scan on all 10^6 records. Can anyone give a suggestion why is this lag? PSQL: test=# EXPLAIN ANALYZE SELECT count(*) from upc where name like 'Aba%'; QUERY PLAN Aggregate (cost=14897.01..14897.02 rows=1 width=0) (actual time=0.380..0.381 rows=1 loops=1) -> Bitmap Heap Scan on upc (cost=57.67..14885.47 rows=4612 width=0) (actual time=0.216..0.326 rows=13 loops=1) Filter: ((name)::text ~~ 'Aba%'::text) -> Bitmap Index Scan on upc_name (cost=0.00..57.67 rows=4612 width=0) (actual time=0.155..0.155 rows=13 loops=1) Index Cond: (((name)::text >= 'Aba'::character varying) AND ((name)::text < 'Abb'::character varying)) Total runtime: 0.670 ms (6 rows) JDBC: PreparedStatement count = connection.prepareStatement("SELECT count(*) FROM upc WHERE name like upper(?)||'%' "); count.setString(1,"Aba"); ResultSet rs = count.executeQuery(); //<== this takes about 82 seconds to execute psql (PostgreSQL) 8.1.5 postgresql-jdbc-8.3-603
Steven varga wrote: > > having about 10^6 records in a table indexed on names and doing a > query from psql I get response time in millisec order on the other > hand when executing the same query through JDBC it hangs about 80 > seconds. > > PreparedStatement count = > connection.prepareStatement("SELECT count(*) FROM > upc WHERE name like upper(?)||'%' "); > When using a PreparedStatement the server must come up with a plan that works for all parameter values. Since the parameter is unknown, the generated plan doesn't use an index. Your options are to interpolate the parameter yourself or connect using the protocolVersion=2 URL option which will make the driver do the interpolation prior to passing the query on to the server. Kris Jurka
Kris Jurka <books 'at' ejurka.com> writes: > When using a PreparedStatement the server must come up with a plan > that works for all parameter values. Since the parameter is unknown, > the generated plan doesn't use an index. Your options are to > interpolate the parameter yourself or connect using the > protocolVersion=2 URL option which will make the driver do the > interpolation prior to passing the query on to the server. Kris, actually, is this behaviour considered a bug or a feature? It seems that moving from protocol v2 to v3 means a disastrous decrease of performance in this kind of situation - is it considered an unavoidable tradeoff for other increases of performances? (I guess that when executing the same prepared statement multiple times, it would be a win to save SQL parsing and parameters interpolation) It seems that it's a common case to hit this problem, and again it can mean orders of magnitude decrease of performance, hence maybe the programmer could benefit from a programmatic way to branch between the two behaviours at the prepared statement level, or at least at the connection level (shouldn't it be adviced to avoid manual parameter interpolation to prevent from exposing to SQL injections because of not sanitized enough approach, or have a driver-provided parameter interpolation facility that we could trust)? Unless there's something in the JDBC specifications, or at the implementation level, which make my question stupid.. -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
Guillaume Cottenceau wrote: > Kris Jurka <books 'at' ejurka.com> writes: > >> When using a PreparedStatement the server must come up with a plan >> that works for all parameter values. Since the parameter is unknown, >> the generated plan doesn't use an index. Your options are to >> interpolate the parameter yourself or connect using the >> protocolVersion=2 URL option which will make the driver do the >> interpolation prior to passing the query on to the server. > > Kris, actually, is this behaviour considered a bug or a feature? It was discussed at some length recently, check the list archives. IIRC the consensus was to fix the server - essentially, add a mode where the driver can tell the server "please replan this statement every time you get new parameters", which would allow all the normal planner optimizations (constant folding, etc) to be made since it does not need to produce a general-purpose plan for all possible parameter values. -O
On Wed, 2008-03-26 at 23:46 -0500, Steven varga wrote: > hi, > > having about 10^6 records in a table indexed on names (http://www.upcdatabase.com) > and doing a query from psql I get response time in millisec order on the > other hand when executing the same query through JDBC it hangs about 80 > seconds. > This is about 10^5 decrease in speed. > > After profiling the JDBC driver simply waits for data > PGStream.ReceiveChar() spending enough time to do a sequencial scan on all > 10^6 records. > > Can anyone give a suggestion why is this lag? > > > > PSQL: > > test=# EXPLAIN ANALYZE SELECT count(*) from upc where name like 'Aba%'; > QUERY PLAN > Aggregate (cost=14897.01..14897.02 rows=1 width=0) (actual > time=0.380..0.381 rows=1 loops=1) > -> Bitmap Heap Scan on upc (cost=57.67..14885.47 rows=4612 > width=0) (actual time=0.216..0.326 rows=13 loops=1) > Filter: ((name)::text ~~ 'Aba%'::text) > -> Bitmap Index Scan on upc_name (cost=0.00..57.67 > rows=4612 width=0) (actual time=0.155..0.155 rows=13 loops=1) > Index Cond: (((name)::text >= 'Aba'::character varying) AND > ((name)::text < 'Abb'::character varying)) > Total runtime: 0.670 ms > (6 rows) > > > JDBC: > > PreparedStatement count = > connection.prepareStatement("SELECT count(*) FROM > upc WHERE name like upper(?)||'%' "); > > count.setString(1,"Aba"); > ResultSet rs = count.executeQuery(); //<== this takes about 82 seconds > to execute Wait a minute, the statements you're running in psql and jdbc aren't even the same. In psql you are supplying a literal string, but in the jdbc version you are using the upper function and the concatenate function to assemble the like clause. The use of those functions would preclude most optimizations anyway-- I suspect that running "select count(*) from upc where name like upper('Abc')||'%'" from psql would be slow too. That said, to get fast behavior you would probably need to do both: change the query and also either set protocolVersion=2 or insert the string yourself. -- Mark Lewis