Thread: performance issue

performance issue

From
Steven varga
Date:
hi,

having about 10^6 records in a table indexed on names (http://www.upcdatabase.com)
and doing a query from psql I get response time in millisec order on the
other hand when executing the same query through JDBC it hangs about 80
seconds.
This is about 10^5 decrease in speed.

After profiling the JDBC driver simply waits for data
PGStream.ReceiveChar() spending enough time to do a sequencial scan on all
10^6 records.

  Can anyone give a suggestion why is this lag?



PSQL:

test=# EXPLAIN ANALYZE SELECT count(*) from upc  where name like 'Aba%';
QUERY PLAN
  Aggregate  (cost=14897.01..14897.02 rows=1 width=0) (actual
time=0.380..0.381 rows=1 loops=1)
    ->  Bitmap Heap Scan on upc (cost=57.67..14885.47 rows=4612
width=0) (actual time=0.216..0.326 rows=13 loops=1)
          Filter: ((name)::text ~~ 'Aba%'::text)
          ->  Bitmap Index Scan on upc_name  (cost=0.00..57.67
rows=4612 width=0) (actual time=0.155..0.155 rows=13 loops=1)
                Index Cond: (((name)::text >= 'Aba'::character varying) AND
((name)::text < 'Abb'::character varying))
  Total runtime: 0.670 ms
(6 rows)


JDBC:

  PreparedStatement count =
         connection.prepareStatement("SELECT count(*) FROM
     upc WHERE name like upper(?)||'%' ");

  count.setString(1,"Aba");
  ResultSet  rs = count.executeQuery(); //<== this takes about 82 seconds
to execute


psql (PostgreSQL) 8.1.5
postgresql-jdbc-8.3-603



Re: performance issue

From
Kris Jurka
Date:
Steven varga wrote:
>
> having about 10^6 records in a table indexed on names and doing a
> query from psql I get response time in millisec order on the other
> hand when executing the same query through JDBC it hangs about 80
> seconds.
>
>  PreparedStatement count =
>         connection.prepareStatement("SELECT count(*) FROM
>     upc WHERE name like upper(?)||'%' ");
>

When using a PreparedStatement the server must come up with a plan that
works for all parameter values.  Since the parameter is unknown, the
generated plan doesn't use an index.  Your options are to interpolate
the parameter yourself or connect using the protocolVersion=2 URL option
which will make the driver do the interpolation prior to passing the
query on to the server.

Kris Jurka

Re: performance issue

From
Guillaume Cottenceau
Date:
Kris Jurka <books 'at' ejurka.com> writes:

> When using a PreparedStatement the server must come up with a plan
> that works for all parameter values.  Since the parameter is unknown,
> the generated plan doesn't use an index.  Your options are to
> interpolate the parameter yourself or connect using the
> protocolVersion=2 URL option which will make the driver do the
> interpolation prior to passing the query on to the server.

Kris, actually, is this behaviour considered a bug or a feature?
It seems that moving from protocol v2 to v3 means a disastrous
decrease of performance in this kind of situation - is it
considered an unavoidable tradeoff for other increases of
performances? (I guess that when executing the same prepared
statement multiple times, it would be a win to save SQL parsing
and parameters interpolation) It seems that it's a common case to
hit this problem, and again it can mean orders of magnitude
decrease of performance, hence maybe the programmer could benefit
from a programmatic way to branch between the two behaviours at
the prepared statement level, or at least at the connection level
(shouldn't it be adviced to avoid manual parameter interpolation
to prevent from exposing to SQL injections because of not
sanitized enough approach, or have a driver-provided parameter
interpolation facility that we could trust)? Unless there's
something in the JDBC specifications, or at the implementation
level, which make my question stupid..

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: performance issue

From
Oliver Jowett
Date:
Guillaume Cottenceau wrote:
> Kris Jurka <books 'at' ejurka.com> writes:
>
>> When using a PreparedStatement the server must come up with a plan
>> that works for all parameter values.  Since the parameter is unknown,
>> the generated plan doesn't use an index.  Your options are to
>> interpolate the parameter yourself or connect using the
>> protocolVersion=2 URL option which will make the driver do the
>> interpolation prior to passing the query on to the server.
>
> Kris, actually, is this behaviour considered a bug or a feature?

It was discussed at some length recently, check the list archives. IIRC
the consensus was to fix the server - essentially, add a mode where the
driver can tell the server "please replan this statement every time you
get new parameters", which would allow all the normal planner
optimizations (constant folding, etc) to be made since it does not need
to produce a general-purpose plan for all possible parameter values.

-O

Re: performance issue

From
Mark Lewis
Date:
On Wed, 2008-03-26 at 23:46 -0500, Steven varga wrote:
> hi,
>
> having about 10^6 records in a table indexed on names (http://www.upcdatabase.com)
> and doing a query from psql I get response time in millisec order on the
> other hand when executing the same query through JDBC it hangs about 80
> seconds.
> This is about 10^5 decrease in speed.
>
> After profiling the JDBC driver simply waits for data
> PGStream.ReceiveChar() spending enough time to do a sequencial scan on all
> 10^6 records.
>
>   Can anyone give a suggestion why is this lag?
>
>
>
> PSQL:
>
> test=# EXPLAIN ANALYZE SELECT count(*) from upc  where name like 'Aba%';
> QUERY PLAN
>   Aggregate  (cost=14897.01..14897.02 rows=1 width=0) (actual
> time=0.380..0.381 rows=1 loops=1)
>     ->  Bitmap Heap Scan on upc (cost=57.67..14885.47 rows=4612
> width=0) (actual time=0.216..0.326 rows=13 loops=1)
>           Filter: ((name)::text ~~ 'Aba%'::text)
>           ->  Bitmap Index Scan on upc_name  (cost=0.00..57.67
> rows=4612 width=0) (actual time=0.155..0.155 rows=13 loops=1)
>                 Index Cond: (((name)::text >= 'Aba'::character varying) AND
> ((name)::text < 'Abb'::character varying))
>   Total runtime: 0.670 ms
> (6 rows)
>
>
> JDBC:
>
>   PreparedStatement count =
>          connection.prepareStatement("SELECT count(*) FROM
>      upc WHERE name like upper(?)||'%' ");
>
>   count.setString(1,"Aba");
>   ResultSet  rs = count.executeQuery(); //<== this takes about 82 seconds
> to execute

Wait a minute, the statements you're running in psql and jdbc aren't
even the same.  In psql you are supplying a literal string, but in the
jdbc version you are using the upper function and the concatenate
function to assemble the like clause.  The use of those functions would
preclude most optimizations anyway-- I suspect that running "select
count(*) from upc where name like upper('Abc')||'%'" from psql would be
slow too.

That said, to get fast behavior you would probably need to do both:
change the query and also either set protocolVersion=2 or insert the
string yourself.

-- Mark Lewis