Thread: 15x slower PreparedStatement vs raw query

15x slower PreparedStatement vs raw query

From
Alex
Date:

PreparedStatement: 15s
Raw query with embedded params: 1s

See issue on github with query and explain analyze:
https://github.com/pgjdbc/pgjdbc/issues/2145



Re: 15x slower PreparedStatement vs raw query

From
Justin Pryzby
Date:
On Sun, May 02, 2021 at 07:45:26PM +0000, Alex wrote:
> PreparedStatement: 15s
> Raw query with embedded params: 1s
> See issue on github with query and explain analyze:
> https://github.com/pgjdbc/pgjdbc/issues/2145

| ..PostgreSQL Version?  12
|Prepared statement
|...
|Planning Time: 11.596 ms
|Execution Time: 14799.266 ms
|
|Raw statement
|Planning Time: 22.685 ms
|Execution Time: 1012.992 ms

The prepared statement has 2x faster planning time, which is what it's meant to
improve.

The execution time is slower, and I think you can improve it with this.
https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE
|plan_cache_mode (enum)
|    Prepared statements (either explicitly prepared or implicitly generated, for example by PL/pgSQL) can be executed
|    using custom or generic plans. Custom plans are made afresh for each execution using its specific set of parameter
|    values, while generic plans do not rely on the parameter values and can be re-used across executions. Thus, use of a
|    generic plan saves planning time, but if the ideal plan depends strongly on the parameter values then a generic plan
|    may be inefficient. The choice between these options is normally made automatically, but it can be overridden with
|    plan_cache_mode. The allowed values are auto (the default), force_custom_plan and force_generic_plan. This setting is
|    considered when a cached plan is to be executed, not when it is prepared. For more information see PREPARE.
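For example, to pin your session (or the role your connection pool logs in as) to custom plans — a sketch; "mydb" and "myapp" are placeholder names, not from your setup:

```sql
-- Per session (e.g. run once per JDBC connection):
SET plan_cache_mode = force_custom_plan;

-- Or persistently, for the database or role the application connects as:
ALTER DATABASE mydb SET plan_cache_mode = force_custom_plan;
ALTER ROLE myapp SET plan_cache_mode = force_custom_plan;
```

The session-level SET is the least invasive way to confirm the generic plan is the culprit before changing anything globally.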
 

-- 
Justin



Re: 15x slower PreparedStatement vs raw query

From
Alex
Date:
Shouldn't this process be automatic, based on some heuristics?

Saving 10ms of planning but costing 14s of execution is catastrophic.

For example, the planner could use statistics to limit planning time to some percentage of previous execution times.
This way, if a query is fast, planning is fast, but if a query is slow, more planning can save huge execution time.
This would be a better general-purpose default, and users who want fast planning could set the variable to use the generic plan.



Justin Pryzby wrote:
> [...]

Re: 15x slower PreparedStatement vs raw query

From
Rick Otten
Date:


On Tue, May 4, 2021 at 6:05 AM Alex <cdalxndr@yahoo.com> wrote:
Shouldn't this process be automatic based on some heuristics?

Saving 10ms planning but costing 14s execution is catastrophic.

For example, using some statistics to limit planner time to some percent of previous executions.
This way, if a query is fast, planning is fast, but if a query is slow, more planning can save huge execution time.
This would be a better general usage option and should be enabled by default, and users who want fast planning could set the variable to use the generic plan.



"fast" and "slow" are relative things.  There are many queries that I would be overjoyed with if they completed in 5 _minutes_.  And others where they have to complete within 100ms or something is really wrong.  We don't really know what the execution time is until the query actually executes.  Planning is a guess for the best approach.

Another factor is whether the data is in cache or out on disk.  Sometimes you don't really know until you try to go get it.  That can significantly change query performance and plans - especially if some of the tables in a query with a lot of joins are in cache and some aren't and maybe some have to be swapped out to pick up others.

If you are running the same dozen queries with different but similarly scoped parameters over and over, one would hope that the system would slowly tune itself to be highly optimized for those dozen queries.  That is a pretty narrow use case for a powerful general purpose relational database though.

Re: 15x slower PreparedStatement vs raw query

From
Alex
Date:
"Powerful general purpose relational database" but not smart...

I propose a feature that uses information from previously executed queries to adjust planning time accordingly.
Reusing the same generic plan can, and will, lead to very long execution times.



Rick Otten wrote:
> [...]

Re: 15x slower PreparedStatement vs raw query

From
Laurenz Albe
Date:
On Tue, 2021-05-04 at 13:59 +0000, Alex wrote:
> "Powerful general purpose relational database" but not smart... 

Too smart can easily become slow...

> I propose a feature to use information from previously executed queries to adjust the query plan time accordingly.
> Reusing the same generic plan may and will lead to very long execution times.

AI can go wrong too, and I personally would be worried that such cases
are very hard to debug...

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: 15x slower PreparedStatement vs raw query

From
Vijaykumar Jain
Date:
I am not an expert on this, but I would like to take a shot :)

Is it possible to share your prepared statement and parameter types?
I mean, something like this:

PREPARE usrrptplan (int, date) AS
    SELECT * FROM users u, logs l
    WHERE u.usrid = $1 AND u.usrid = l.usrid AND l.date = $2;
EXECUTE usrrptplan(1, current_date);

It's just that sometimes the datatypes of the prepared statement params are not the same as the datatype of the field in the join, and as a result it may add some overhead.
There was one more thread where a person had similar issues, which was sorted out by using the relevant field type in the prepared field.
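As an illustration of that kind of mismatch (a sketch; the table, column, and statement names are made up, not from the github issue):

```sql
-- Suppose items.price is numeric. Binding a float8 parameter makes the
-- comparison resolve in float8, so the column itself is cast and a btree
-- index on price cannot be used:
PREPARE p_bad (float8) AS SELECT * FROM items WHERE price = $1;

-- Declaring the parameter with the column's own type keeps the index usable:
PREPARE p_good (numeric) AS SELECT * FROM items WHERE price = $1;
EXECUTE p_good(19.99);
```

Comparing EXPLAIN output for the two usually makes the extra cast visible in the filter condition.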


bPMA | explain.depesz.com -> slow (prepared), row 8
TsNn | explain.depesz.com -> fast (direct), row 8

It seems the join filters in the prepared version are doing a lot of work massaging the fields, which may add the cost overhead.

Also, if the above does not work, can you try the below plan GUC to check if you see any improvements.


Thanks,
Vijay

On Tue, 4 May 2021 at 20:52, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> [...]

--
Thanks,
Vijay
Mumbai, India

Re: 15x slower PreparedStatement vs raw query

From
David Rowley
Date:
On Tue, 4 May 2021 at 22:05, Alex <cdalxndr@yahoo.com> wrote:
> Shouldn't this process be automatic based on some heuristics?

When plan_cache_mode is set to "auto", then the decision to use a
generic or custom plan is cost-based. See [1]. There's a fairly crude
method there for estimating the effort required to replan the query.
The remainder is based on the average cost of the previous custom
plans + estimated planning effort vs cost of the generic plan.  The
cheaper one wins.

Certainly, what's there is far from perfect.  There are various
problems with it.  The estimated planning cost is pretty crude and
could do with an overhaul.   There are also issues with the plan costs
not being true to the cost of the query.  One problem there is that
run-time partition pruning is not costed into the plan.  This might
cause choose_custom_plan() to pick a custom plan when a generic one
with run-time pruning might have been better.

In order to get a better idea of where things are going wrong for you,
we'd need to see the EXPLAIN ANALYZE output for both the custom and
the generic plan.
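For completeness, a sketch of how to capture both plans from psql using the plan_cache_mode GUC mentioned upthread (the toy statement here is a placeholder, not the query from the github issue):

```sql
-- Toy prepared statement; substitute the real query and parameters.
PREPARE stmt (int) AS SELECT count(*) FROM pg_class WHERE relpages > $1;

SET plan_cache_mode = force_custom_plan;
EXPLAIN ANALYZE EXECUTE stmt(10);   -- custom plan, parameter value folded in

SET plan_cache_mode = force_generic_plan;
EXPLAIN ANALYZE EXECUTE stmt(10);   -- generic plan, shows $1 in the conditions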

David

[1] https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1019



Re: 15x slower PreparedStatement vs raw query

From
Justin Pryzby
Date:
On Mon, May 03, 2021 at 03:18:11PM -0500, Justin Pryzby wrote:
> On Sun, May 02, 2021 at 07:45:26PM +0000, Alex wrote:
> > PreparedStatement: 15s
> > Raw query with embedded params: 1s
> > See issue on github with query and explain analyze:
> > https://github.com/pgjdbc/pgjdbc/issues/2145
> 
> | ..PostgreSQL Version?  12
> |Prepared statement
> |...
> |Planning Time: 11.596 ms
> |Execution Time: 14799.266 ms
> |
> |Raw statement
> |Planning Time: 22.685 ms
> |Execution Time: 1012.992 ms
> 
> The prepared statement has 2x faster planning time, which is what it's meant to
> improve.
> 
> The execution time is slower, and I think you can improve it with this.
> https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE

Also, the rowcount estimates are way off starting with the scan nodes.

  ->  Bitmap Heap Scan on category_property_name cpn_limits  (cost=32.13..53.55 rows=14 width=29) (actual time=0.665..8.822 rows=2650 loops=1)
        Recheck Cond: ((lexeme = ANY ('{''rata'',""''polling'' ''rata'' ''ratez''"",""''polling'' ''rata''"",""''rata'' ''ratez'' ''semnal'' ''usb-ul''""}'::tsvector[])) OR (lexeme = '''frecventa'' ''frecventez'''::tsvector) OR (lexeme = '''raportare'' ''rata'' ''ratez'''::tsvector) OR (lexeme = ANY ('{''latime'',""''latime'' ''placi''"",""''compatibila'' ''latime'' ''telefon''""}'::tsvector[])) OR (lexeme = '''lungime'''::tsvector) OR (lexeme = '''cablu'' ''lungime'''::tsvector) OR (lexeme = '''inaltime'''::tsvector) OR (lexeme = '''rezolutie'''::tsvector) OR (lexeme = '''greutate'''::tsvector))
        Heap Blocks: exact=85
        ->  BitmapOr  (cost=32.13..32.13 rows=14 width=0) (actual time=0.574..0.577 rows=0 loops=1)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..9.17 rows=4 width=0) (actual time=0.088..0.089 rows=10 loops=1)
                    Index Cond: (lexeme = ANY ('{''rata'',""''polling'' ''rata'' ''ratez''"",""''polling'' ''rata''"",""''rata'' ''ratez'' ''semnal'' ''usb-ul''""}'::tsvector[]))
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.047..0.047 rows=171 loops=1)
                    Index Cond: (lexeme = '''frecventa'' ''frecventez'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
                    Index Cond: (lexeme = '''raportare'' ''rata'' ''ratez'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..6.88 rows=3 width=0) (actual time=0.097..0.097 rows=547 loops=1)
                    Index Cond: (lexeme = ANY ('{''latime'',""''latime'' ''placi''"",""''compatibila'' ''latime'' ''telefon''""}'::tsvector[]))
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.107..0.107 rows=604 loops=1)
                    Index Cond: (lexeme = '''lungime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.030..0.030 rows=137 loops=1)
                    Index Cond: (lexeme = '''cablu'' ''lungime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.079..0.079 rows=479 loops=1)
                    Index Cond: (lexeme = '''inaltime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.020..0.020 rows=40 loops=1)
                    Index Cond: (lexeme = '''rezolutie'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.088..0.088 rows=661 loops=1)
                    Index Cond: (lexeme = '''greutate'''::tsvector)
 



Re: 15x slower PreparedStatement vs raw query

From
Alex
Date:
This is exactly my issue.

Using the raw query, planning takes 22ms (custom plan), but using PreparedStatement planning takes 11ms (generic plan).
It chose the generic plan because planning is 11ms faster than the custom plan, but lost 14 seconds(!) at execution...

The auto-choose algorithm should be changed to include execution time in the decision.

On Wednesday, May 5, 2021, 9:57:20 AM GMT+3, David Rowley <dgrowleyml@gmail.com> wrote:
> [...]