Thread: Strange discrepancy in query performance...

Strange discrepancy in query performance...

From
"Jason L. Buberel"
Date:
I'm hoping that someone on the list can help me understand an apparent
discrepancy in the performance information that I'm collecting on a
particularly troublesome query.

The configuration: pg-8.2.4 on RHEL4. log_min_duration_statement = 1m.

In my syslog output, I see entries indicating that the
JDBC-driver-originated query on a table named 'city_summary' are taking
upwards of 300 seconds:

Oct  1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG:  duration: 307077.037 ms  execute S_42: select * from city_summary
      where  state = $1 and city_master_id = $2 and res_type =
'single_family' and date = $3
      and range = 90 and zip = $4 and quartile  = '__ALL'
DETAIL:  parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

However, if I run the same query on the same host at the same time that
the Java application is running, but from the psql  command line, it
takes only 0.37 seconds:

 > time /opt/postgres-8.2.4/bin/psql --port 54824  -U postgres -d
altos_research  -c 'select fact_id from city_summary where state =
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile =
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date =
\'2007-09-28\';'

 fact_id
----------
 46624925
(1 row)

0.00user 0.00system 0:00.37elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+285minor)pagefaults 0swaps

The output of 'explain' seems to indicate that the right index is being
used:

QUERY PLAN
 Index Scan using city_summary_pkey on city_summary  (cost=0.00..12.27
rows=1 width=2721)
   Index Cond: ((date = '2007-09-28'::text) AND
                (state = 'CA'::text) AND
                (city_master_id = 334::bigint) AND
                (quartile = '__ALL'::text) AND
                (range = '90'::text))
   Filter: ((zip = '__ALL'::text) AND ((res_type)::text =
'single_family'::text))
(3 rows)

The index looks like this:

# \d city_summary_pkey
Index "public.city_summary_pkey"
       Column       |  Type
--------------------+---------
 date               | text
 state              | text
 city_master_id     | bigint
 zip_master_id      | integer
 res_type_master_id | bigint
 quartile           | text
 range              | text
primary key, btree, for table "public.city_summary"

Any ideas on why I am seeing such a big difference between the two
measurements (JDBC/syslog vs. command line)?

Thanks,
Jason






Re: Strange discrepancy in query performance...

From
Tom Lane
Date:
"Jason L. Buberel" <jason@buberel.org> writes:
> In my syslog output, I see entries indicating that the
> JDBC-driver-originated query on a table named 'city_summary' are taking
> upwards of 300 seconds:

> Oct  1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
> LOG:  duration: 307077.037 ms  execute S_42: select * from city_summary
>       where  state = $1 and city_master_id = $2 and res_type =
> 'single_family' and date = $3
>       and range = 90 and zip = $4 and quartile  = '__ALL'
> DETAIL:  parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

> However, if I run the same query on the same host at the same time that
> the Java application is running, but from the psql  command line, it
> takes only 0.37 seconds:

>>> time /opt/postgres-8.2.4/bin/psql --port 54824  -U postgres -d
> altos_research  -c 'select fact_id from city_summary where state =
> \'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile =
> \'__ALL\' and res_type = \'single_family\' and range = \'90\' and date =
> \'2007-09-28\';'

This is not, in fact, the same query --- the JDBC-originated one is
parameterized, which means it very possibly has a different plan
(since the planner doesn't know the particular values to plan for).

Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is
being produced for the parameterized query.

            regards, tom lane

Re: Strange discrepancy in query performance...

From
"Jason L. Buberel"
Date:
Tom-right-as-usual:

Yep - you were right about the query plan for the prepared statement (a sequential scan of the table) differed a bit from the directly-executed version :)

For reference, when using JasperReports .jrxml files as the basis for the query, I only had to do to the following to 'force' postgres to treat the jasper report parameter as a number and not text, thereby allowing the correct index to be used:

select * from city summary where city_master_id = $P{city_master_id}::bigint ...

Query times went from 300+ seconds back down to ~100ms.

-jason

Tom Lane wrote:
"Jason L. Buberel" <jason@buberel.org> writes: 
In my syslog output, I see entries indicating that the 
JDBC-driver-originated query on a table named 'city_summary' are taking 
upwards of 300 seconds:   
 
Oct  1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG:  duration: 307077.037 ms  execute S_42: select * from city_summary     where  state = $1 and city_master_id = $2 and res_type = 
'single_family' and date = $3     and range = 90 and zip = $4 and quartile  = '__ALL'
DETAIL:  parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'   
 
However, if I run the same query on the same host at the same time that 
the Java application is running, but from the psql  command line, it 
takes only 0.37 seconds:   
 
time /opt/postgres-8.2.4/bin/psql --port 54824  -U postgres -d        
altos_research  -c 'select fact_id from city_summary where state = 
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile = 
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date = 
\'2007-09-28\';'   
This is not, in fact, the same query --- the JDBC-originated one is
parameterized, which means it very possibly has a different plan
(since the planner doesn't know the particular values to plan for).

Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is
being produced for the parameterized query.
		regards, tom lane 

Re: Strange discrepancy in query performance...

From
Tom Lane
Date:
"Jason L. Buberel" <jason@buberel.org> writes:
> For reference, when using JasperReports .jrxml files as the basis for
> the query, I only had to do to the following to 'force' postgres to
> treat the jasper report parameter as a number and not text, thereby
> allowing the correct index to be used:

> select * from city summary where city_master_id =
> $P{city_master_id}::bigint ...

> Query times went from 300+ seconds back down to ~100ms.

Hmm ... if Postgres were just given the parameter symbol with no type
information, I believe it would have assumed it was bigint (or in
general, the same type as what it's being compared to).  So your
problem suggests that Jasper is deliberately telling the backend that
that parameter is of type text.  If that's coming from something you
did in your code, you probably ought to change the code.  If not,
it seems like a bug/omission in Jasper.

            regards, tom lane

Re: Strange discrepancy in query performance...

From
"Jason L. Buberel"
Date:
I agree that this is a bug in JasperReports. I've been stepping throgh their code to determine where the paramter type is set to 'java.lang.String', but have not yet figured out how their Java API will allow me to override that with 'java.lang.Integer' or something more appropriate.

If I figure something out, I'll post to the list.

Regards,
jason

Tom Lane wrote:
Hmm ... if Postgres were just given the parameter symbol with no type
information, I believe it would have assumed it was bigint (or in
general, the same type as what it's being compared to).  So your
problem suggests that Jasper is deliberately telling the backend that
that parameter is of type text.  If that's coming from something you
did in your code, you probably ought to change the code.  If not,
it seems like a bug/omission in Jasper.
		regards, tom lane