Thread: Missing query plan for auto_explain.

Missing query plan for auto_explain.

From
Matheus Martin
Date:
Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms.

With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though).

Does anyone have ideas of why the explain plan is not being printed?

Sample log entry for trouble query executed from application:
```
Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG:  duration: 1423.481 ms  bind <unnamed>: /*@ org.sixty6.as66.QueryID:report.framework.userSearch */select users.user_id, users.unique_identifier, user_profile.name, user_profile.email, organisation.organisation_id, organisation.description, user_realm.user_realm_id, user_realm.description as col_8, user_realm.type, user_realm_category.description as col_10, vip_schema.description as col_11, vip_level.description as col_12, affiliate.description as col_13, users.status, users.creation_date, user_statistics.last_user_session_creation_date, users.real_base_currency, users.fun_base_currency, local_balance_account.balance
Aug 26 09:11:33 db-931 postgres[8106]: [66-2] #011from users users
Aug 26 09:11:33 db-931 postgres[8106]: [66-3] #011join user_profile user_profile
Aug 26 09:11:33 db-931 postgres[8106]: [66-4] #011on user_profile.user_id = users.user_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-5] #011join user_realm user_realm
Aug 26 09:11:33 db-931 postgres[8106]: [66-6] #011on user_realm.user_realm_id = users.user_realm_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-7] #011join organisation organisation
Aug 26 09:11:33 db-931 postgres[8106]: [66-8] #011on organisation.organisation_id = user_realm.organisation_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-9] #011join acl_allowed_organisation acl_allowed_organisation
Aug 26 09:11:33 db-931 postgres[8106]: [66-10] #011on acl_allowed_organisation.organisation_id = organisation.organisation_id and acl_allowed_organisation.permission = $1 and acl_allowed_organisation.user_id = $2
Aug 26 09:11:33 db-931 postgres[8106]: [66-11] #011join affiliate affiliate
Aug 26 09:11:33 db-931 postgres[8106]: [66-12] #011on affiliate.affiliate_id = users.affiliate_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-13] #011join acl_allowed_affiliate acl_allowed_affiliate
Aug 26 09:11:33 db-931 postgres[8106]: [66-14] #011on acl_allowed_affiliate.affiliate_id = affiliate.affiliate_id and acl_allowed_affiliate.permission = $3 and acl_allowed_affiliate.user_id = $4
Aug 26 09:11:33 db-931 postgres[8106]: [66-15] #011join acl_allowed_user_realm_category acl_allowed_user_realm_category
Aug 26 09:11:33 db-931 postgres[8106]: [66-16] #011on acl_allowed_user_realm_category.user_realm_category_id = user_realm.user_realm_category_id and acl_allowed_user_realm_category.permission = $5 and acl_allowed_user_realm_category.user_id = $6
Aug 26 09:11:33 db-931 postgres[8106]: [66-17] #011join user_statistics user_statistics
Aug 26 09:11:33 db-931 postgres[8106]: [66-18] #011on user_statistics.user_id = users.user_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-19] #011join vip_level vip_level
Aug 26 09:11:33 db-931 postgres[8106]: [66-20] #011on vip_level.vip_level_id = users.vip_level_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-21] #011join vip_schema vip_schema
Aug 26 09:11:33 db-931 postgres[8106]: [66-22] #011on vip_schema.vip_schema_id = vip_level.vip_schema_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-23] #011join user_realm_category user_realm_category
Aug 26 09:11:33 db-931 postgres[8106]: [66-24] #011on user_realm_category.user_realm_category_id = user_realm.user_realm_category_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-25] #011left join local_balance_account local_balance_account
Aug 26 09:11:33 db-931 postgres[8106]: [66-26] #011on local_balance_account.user_id = users.user_id and local_balance_account.currency = users.real_base_currency and local_balance_account.balance_category = $7 and local_balance_account.primary_balance = $8
Aug 26 09:11:33 db-931 postgres[8106]: [66-27] #011where users.type in ($9, $10) and user_realm.type in ($11, $12, $13) and users.unique_identifier like $14 escape '\'
Aug 26 09:11:33 db-931 postgres[8106]: [66-28] #011order by users.unique_identifier asc, user_realm.user_realm_id asc
Aug 26 09:11:33 db-931 postgres[8106]: [66-29] #011limit $15
Aug 26 09:11:33 db-931 postgres[8106]: [66-30] 2022-08-26 09:11:33 GMT [8106]: [6-1] db=betwave,user=betwave_app_readonly_user DETAIL:  parameters: $1 = 'READ', $2 = '1468137', $3 = 'READ', $4 = '1468137', $5 = 'READ', $6 = '1468137', $7 = 'CASH', $8 = 't', $9 = 'LOCAL', $10 = 'EXTERNAL', $11 = 'ADMIN', $12 = 'HUB', $13 = 'PLAYER', $14 = 'robinson06%', $15 = '200'
```

Sample log entry for another query executed from application:
```
Aug 26 15:10:58 db-931 postgres[40727]: [71-1] 2022-08-26 15:10:58 GMT [40727]: [15-1] db=betwave,user=betwave_app_readonly_user LOG:  duration: 1069.964 ms  plan:
Aug 26 15:10:58 db-931 postgres[40727]: [71-2] #011Query Text: /*@ org.sixty6.as66.QueryID:health.betwave.game.external.InProgressGamePlays:com.betconstruct */select x.hour, sum(x.in_progress_count) as col_2, sum(x.closed_count) as col_3
Aug 26 15:10:58 db-931 postgres[40727]: [71-3] #011from ((select date_trunc('hour',game_play.start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds' as hour, count(*) as in_progress_count, $1 as closed_count
Aug 26 15:10:58 db-931 postgres[40727]: [71-4] #011from game_play game_play
Aug 26 15:10:58 db-931 postgres[40727]: [71-5] #011join game_profile game_profile
Aug 26 15:10:58 db-931 postgres[40727]: [71-6] #011on game_profile.game_profile_id = game_play.game_profile_id
Aug 26 15:10:58 db-931 postgres[40727]: [71-7] #011join game game
Aug 26 15:10:58 db-931 postgres[40727]: [71-8] #011on game.game_id = game_profile.game_id
Aug 26 15:10:58 db-931 postgres[40727]: [71-9] #011where game.game_engine_id = $2 and game_play.status = $3 and game_play.start_date >= $4 and game_play.start_date < $5
Aug 26 15:10:58 db-931 postgres[40727]: [71-10] #011group by date_trunc('hour',game_play.start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds')
Aug 26 15:10:58 db-931 postgres[40727]: [71-11] #011union all
Aug 26 15:10:58 db-931 postgres[40727]: [71-12] #011(select date_trunc('hour',game_activity.interval_start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds' as hour, $6 as in_progress_count, sum(game_activity.game_play_count) as closed_count
Aug 26 15:10:58 db-931 postgres[40727]: [71-13] #011from game_activity game_activity
Aug 26 15:10:58 db-931 postgres[40727]: [71-14] #011join game game
Aug 26 15:10:58 db-931 postgres[40727]: [71-15] #011on game.game_id = game_activity.game_id
Aug 26 15:10:58 db-931 postgres[40727]: [71-16] #011where game.game_engine_id = $7 and tstzrange($8, $9, $10) @> tstzrange(game_activity.interval_start_date, game_activity.interval_end_date, $11)
Aug 26 15:10:58 db-931 postgres[40727]: [71-17] #011group by date_trunc('hour',game_activity.interval_start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds')) as x
Aug 26 15:10:58 db-931 postgres[40727]: [71-18] #011group by x.hour
Aug 26 15:10:58 db-931 postgres[40727]: [71-19] #011having sum(x.in_progress_count) > $12 and sum(x.in_progress_count) + sum(x.closed_count) >= $13 and sum(x.in_progress_count) >= (sum(x.in_progress_count) + sum(x.closed_count)) * $14
Aug 26 15:10:58 db-931 postgres[40727]: [71-20] #011GroupAggregate  (cost=10069.43..10072.16 rows=1 width=72)
Aug 26 15:10:58 db-931 postgres[40727]: [71-21] #011  Group Key: "*SELECT* 1".hour
Aug 26 15:10:58 db-931 postgres[40727]: [71-22] #011  Filter: ((sum("*SELECT* 1".in_progress_count) > '0'::numeric) AND ((sum("*SELECT* 1".in_progress_count) + sum((('0'::bigint)::numeric))) >= '0'::numeric) AND (sum("*SELECT* 1".in_progress_count) >= ((sum("*SELECT* 1".in_progress_count) + sum((('0'::bigint)::numeric))) * '0'::numeric)))
Aug 26 15:10:58 db-931 postgres[40727]: [71-23] #011  ->  Sort  (cost=10069.43..10069.53 rows=39 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-24] #011        Sort Key: "*SELECT* 1".hour
Aug 26 15:10:58 db-931 postgres[40727]: [71-25] #011        ->  Append  (cost=10048.65..10068.01 rows=39 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-26] #011              ->  Subquery Scan on "*SELECT* 1"  (cost=10048.65..10050.08 rows=38 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-27] #011                    ->  GroupAggregate  (cost=10048.65..10049.60 rows=38 width=24)
Aug 26 15:10:58 db-931 postgres[40727]: [71-28] #011                          Group Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_play.start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-29] #011                          ->  Sort  (cost=10048.65..10048.75 rows=38 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-30] #011                                Sort Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_play.start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-31] #011                                ->  Nested Loop  (cost=116.74..10047.66 rows=38 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-32] #011                                      ->  Hash Join  (cost=116.33..209.84 rows=18 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-33] #011                                            Hash Cond: (game_profile.game_id = game.game_id)
Aug 26 15:10:58 db-931 postgres[40727]: [71-34] #011                                            ->  Seq Scan on game_profile  (cost=0.00..86.54 rows=2654 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-35] #011                                            ->  Hash  (cost=116.12..116.12 rows=16 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-36] #011                                                  ->  Seq Scan on game  (cost=0.00..116.12 rows=16 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-37] #011                                                        Filter: ((game_engine_id)::text = 'com.betconstruct'::text)
Aug 26 15:10:58 db-931 postgres[40727]: [71-38] #011                                      ->  Index Scan using game_play_operable_unq01 on game_play_operable game_play  (cost=0.42..546.45 rows=8 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-39] #011                                            Index Cond: (game_profile_id = game_profile.game_profile_id)
Aug 26 15:10:58 db-931 postgres[40727]: [71-40] #011                                            Filter: ((start_date >= '2022-08-25 16:00:00+00'::timestamp with time zone) AND (start_date < '2022-08-26 15:00:57.028601+00'::timestamp with time zone) AND ((status)::text = 'IN_PROGRESS'::text))
Aug 26 15:10:58 db-931 postgres[40727]: [71-41] #011              ->  GroupAggregate  (cost=17.70..17.73 rows=1 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-42] #011                    Group Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_activity.interval_start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-43] #011                    ->  Sort  (cost=17.70..17.70 rows=1 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-44] #011                          Sort Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_activity.interval_start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-45] #011                          ->  Nested Loop  (cost=0.70..17.69 rows=1 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-46] #011                                ->  Index Scan using game_activity_ix01 on game_activity  (cost=0.42..8.44 rows=1 width=24)
Aug 26 15:10:58 db-931 postgres[40727]: [71-47] #011                                      Index Cond: (tstzrange(interval_start_date, interval_end_date, '[]'::text) <@ '["2022-08-25 16:00:00+00","2022-08-26 16:00:00+00"]'::tstzrange)
Aug 26 15:10:58 db-931 postgres[40727]: [71-48] #011                                ->  Index Scan using pk_game on game game_1  (cost=0.28..8.30 rows=1 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-49] #011                                      Index Cond: (game_id = game_activity.game_id)
Aug 26 15:10:58 db-931 postgres[40727]: [71-50] #011                                      Filter: ((game_engine_id)::text = 'com.betconstruct'::text)
```

Our environment consists of PostgreSQL 13.6 on Linux.

Re: Missing query plan for auto_explain.

From
Alvaro Herrera
Date:
On 2022-Aug-30, Matheus Martin wrote:

> Our Postgres recently started reporting considerably different
> execution times for the same query. When executed from our JDBC
> application the Postgres logs report an average execution time of 1500
> ms but when the query is manually executed through `psql` it doesn't
> take longer than 50 ms.

I don't know why the plan is not saved by auto_explain (maybe we're
missing ExecutorEnd calls somewhere?  that would be strange), but one
frequent reason for queries to show different plan in JDBC than psql is
the use of prepared statements.  Did you try using "PREPARE
yourquery(...)" and then EXPLAIN EXECUTE(...)?  Sometimes that helps to
recreate the original problem.


(Apparently, ExecutorEnd is called from PortalCleanup; what happens with
the portal for an extended-protocol query?)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Missing query plan for auto_explain.

From
Julien Rouhaud
Date:
Hi,

On Tue, Aug 30, 2022 at 01:16:43PM +0200, Alvaro Herrera wrote:
> On 2022-Aug-30, Matheus Martin wrote:
> 
> > Our Postgres recently started reporting considerably different
> > execution times for the same query. When executed from our JDBC
> > application the Postgres logs report an average execution time of 1500
> > ms but when the query is manually executed through `psql` it doesn't
> > take longer than 50 ms.
> 
> I don't know why the plan is not saved by auto_explain (maybe we're
> missing ExecutorEnd calls somewhere?  that would be strange), but one
> frequent reason for queries to show different plan in JDBC than psql is
> the use of prepared statements.  Did you try using "PREPARE
> yourquery(...)" and then EXPLAIN EXECUTE(...)?  Sometimes that helps to
> recreate the original problem.
> 
> (Apparently, ExecutorEnd is called from PortalCleanup; what happens with
> the portal for an extended-protocol query?)

AFAICS log_min_duration_statements threshold is based on the full query
processing time while auto_explain is only based on the executor runtime, so
one more likely explanation is that out of the 1423ms, more than 423ms were
spent in the planner?



Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
Good idea on using an actual prepared statement but unfortunately it didn't produce any different result.

Could you please elaborate a bit on your advice concerning ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me.

On Tue, 30 Aug 2022 at 12:16, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Matheus Martin wrote:

> Our Postgres recently started reporting considerably different
> execution times for the same query. When executed from our JDBC
> application the Postgres logs report an average execution time of 1500
> ms but when the query is manually executed through `psql` it doesn't
> take longer than 50 ms.

I don't know why the plan is not saved by auto_explain (maybe we're
missing ExecutorEnd calls somewhere?  that would be strange), but one
frequent reason for queries to show different plan in JDBC than psql is
the use of prepared statements.  Did you try using "PREPARE
yourquery(...)" and then EXPLAIN EXECUTE(...)?  Sometimes that helps to
recreate the original problem.


(Apparently, ExecutorEnd is called from PortalCleanup; what happens with
the portal for an extended-protocol query?)

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
The threshold for `auto_explain` was changed to 500 ms and explain plans are still not being logged.

On Tue, 30 Aug 2022 at 13:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Tue, Aug 30, 2022 at 01:16:43PM +0200, Alvaro Herrera wrote:
> On 2022-Aug-30, Matheus Martin wrote:
>
> > Our Postgres recently started reporting considerably different
> > execution times for the same query. When executed from our JDBC
> > application the Postgres logs report an average execution time of 1500
> > ms but when the query is manually executed through `psql` it doesn't
> > take longer than 50 ms.
>
> I don't know why the plan is not saved by auto_explain (maybe we're
> missing ExecutorEnd calls somewhere?  that would be strange), but one
> frequent reason for queries to show different plan in JDBC than psql is
> the use of prepared statements.  Did you try using "PREPARE
> yourquery(...)" and then EXPLAIN EXECUTE(...)?  Sometimes that helps to
> recreate the original problem.
>
> (Apparently, ExecutorEnd is called from PortalCleanup; what happens with
> the portal for an extended-protocol query?)

AFAICS log_min_duration_statements threshold is based on the full query
processing time while auto_explain is only based on the executor runtime, so
one more likely explanation is that out of the 1423ms, more than 423ms were
spent in the planner?

Re: Missing query plan for auto_explain.

From
Alvaro Herrera
Date:
On 2022-Aug-30, Matheus Martin wrote:

> Good idea on using an actual prepared statement but unfortunately it didn't
> produce any different result.

I should have also mentioned to try the EXPLAIN EXECUTE six times and
see if the last one produces a different plan.  That's when it switches
from planning every time to planning with generic arguments, as I
recall.

> Could you please elaborate a bit on your advice concerning
> ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me.

That wasn't advice actually, just a note that the code might be doing
that thing wrong, causing auto_explain to miss it.  This is an unproven
hypothesis that is likely to be all wrong.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php



Re: Missing query plan for auto_explain.

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2022-Aug-30, Matheus Martin wrote:
>> Good idea on using an actual prepared statement but unfortunately it didn't
>> produce any different result.

> I should have also mentioned to try the EXPLAIN EXECUTE six times and
> see if the last one produces a different plan.  That's when it switches
> from planning every time to planning with generic arguments, as I
> recall.

In recent versions you can bypass that with

SET plan_cache_mode = force_generic_plan;

            regards, tom lane



Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
We tried running the prepared statement six times as suggested but wasn't still able to recreate the original problem.

Perhaps more concerning/relevant is that we have not found any explanation to why the explain plan is not being logged by `auto_explain`. Could this be a bug? Shall we report it?

On Tue, 30 Aug 2022 at 17:45, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Matheus Martin wrote:

> Good idea on using an actual prepared statement but unfortunately it didn't
> produce any different result.

I should have also mentioned to try the EXPLAIN EXECUTE six times and
see if the last one produces a different plan.  That's when it switches
from planning every time to planning with generic arguments, as I
recall.

> Could you please elaborate a bit on your advice concerning
> ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me.

That wasn't advice actually, just a note that the code might be doing
that thing wrong, causing auto_explain to miss it.  This is an unproven
hypothesis that is likely to be all wrong.

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

Re: Missing query plan for auto_explain.

From
Julien Rouhaud
Date:
Hi,

On Thu, Sep 01, 2022 at 08:20:13PM +0100, Matheus Martin wrote:
> We tried running the prepared statement six times as suggested but wasn't
> still able to recreate the original problem.
> 
> Perhaps more concerning/relevant is that we have not found any explanation
> to why the explain plan is not being logged by `auto_explain`. Could this
> be a bug? Shall we report it?

Just to be sure, do you get at least some plans logged by auto_explain when
queries are executed by the JDBC application?

Can you try to temporarily lower auto_explain.log_min_duration_statements to
less than 50ms and see what auto_explain sees for the execution time (and
planning time).

Another possibility would be some conflicting locks held.  If the conflict
happens during the planning auto_explain still won't be triggered as it's
outside the executor.  Also, have you enabled log_lock_waits?



Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
Yes, we do see some plans logged by the auto_explain. We couldn't find a `auto_explain.log_min_duration_statements` setting but `log_min_duration_statement` as in https://www.postgresql.org/docs/current/runtime-config-logging.html is set to 100 ms.

Unfortunately, due to the amount of traffic we have, we cannot lower the threshold for the auto_explain.

`log_lock_waits` is turned on and the logs do not indicate any locks related to the tables in the query.  

On Fri, 2 Sept 2022 at 05:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Thu, Sep 01, 2022 at 08:20:13PM +0100, Matheus Martin wrote:
> We tried running the prepared statement six times as suggested but wasn't
> still able to recreate the original problem.
>
> Perhaps more concerning/relevant is that we have not found any explanation
> to why the explain plan is not being logged by `auto_explain`. Could this
> be a bug? Shall we report it?

Just to be sure, do you get at least some plans logged by auto_explain when
queries are executed by the JDBC application?

Can you try to temporarily lower auto_explain.log_min_duration_statements to
less than 50ms and see what auto_explain sees for the execution time (and
planning time).

Another possibility would be some conflicting locks held.  If the conflict
happens during the planning auto_explain still won't be triggered as it's
outside the executor.  Also, have you enabled log_lock_waits?

Re: Missing query plan for auto_explain.

From
"Peter J. Holzer"
Date:
On 2022-09-02 10:58:58 +0100, Matheus Martin wrote:
> Yes, we do see some plans logged by the auto_explain. We couldn't find a
> `auto_explain.log_min_duration_statements` setting

This is weird as the documentation says:

| Note that the default behavior is to do nothing, so you must set at
| least auto_explain.log_min_duration if you want any results.

What does
    show auto_explain.log_min_duration;
return?

> but `log_min_duration_statement` as in
> https://www.postgresql.org/docs/current/ runtime-config-logging.html
> is set to 100 ms.

I don't think this affects auto_explain.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
`auto_explain.log_min_duration` is set to 500 ms.

On Mon, 5 Sept 2022 at 12:35, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2022-09-02 10:58:58 +0100, Matheus Martin wrote:
> Yes, we do see some plans logged by the auto_explain. We couldn't find a
> `auto_explain.log_min_duration_statements` setting

This is weird as the documentation says:

| Note that the default behavior is to do nothing, so you must set at
| least auto_explain.log_min_duration if you want any results.

What does
    show auto_explain.log_min_duration;
return?

> but `log_min_duration_statement` as in
> https://www.postgresql.org/docs/current/ runtime-config-logging.html
> is set to 100 ms.

I don't think this affects auto_explain.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Missing query plan for auto_explain.

From
Maxim Boguk
Date:


On Tue, Aug 30, 2022 at 1:38 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:
Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms.

With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though).

Does anyone have ideas of why the explain plan is not being printed?

Sample log entry for trouble query executed from application:
```
Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG:  duration: 1423.481 ms  bind <unnamed>:

My understanding of how to auto_explain work - it deals only for execution calls, but in your case duration: 1423.481 ms on BIND call, before query execution.
At least in my understanding - auto_explain cannot work and will not help in case of a slow BIND call (because it's a time when the query is planned but not executed).
According documentation:
"Query planning typically occurs when the Bind message is processed. If the prepared statement has no parameters, or is executed repeatedly, the server might save the created plan and re-use it during subsequent Bind messages for the same prepared statement."
Hard to say what the reason for slow planning, but one (there could be others) likely reason is JIT work. Do you have JIT enabled?

 
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
We do have JIT enabled `jit=on` with `jit_above_cost=100000`.

I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?

On Tue, 6 Sept 2022 at 00:29, Maxim Boguk <maxim.boguk@gmail.com> wrote:


On Tue, Aug 30, 2022 at 1:38 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:
Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms.

With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though).

Does anyone have ideas of why the explain plan is not being printed?

Sample log entry for trouble query executed from application:
```
Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG:  duration: 1423.481 ms  bind <unnamed>:

My understanding of how to auto_explain work - it deals only for execution calls, but in your case duration: 1423.481 ms on BIND call, before query execution.
At least in my understanding - auto_explain cannot work and will not help in case of a slow BIND call (because it's a time when the query is planned but not executed).
According documentation:
"Query planning typically occurs when the Bind message is processed. If the prepared statement has no parameters, or is executed repeatedly, the server might save the created plan and re-use it during subsequent Bind messages for the same prepared statement."
Hard to say what the reason for slow planning, but one (there could be others) likely reason is JIT work. Do you have JIT enabled?

 
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

Re: Missing query plan for auto_explain.

From
Maxim Boguk
Date:


On Thu, Sep 8, 2022 at 1:18 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:
We do have JIT enabled `jit=on` with `jit_above_cost=100000`.

I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?


In your log - time spent during the execution stage (where auto-exlain can help) but during the bind/planning stage (e.g. generation of plan).
So you have a problem not with slow query execution, but with slow query planning, so autoexplain cannot help in that case.
JIT is one possible explanation of planning stage slowdown.

Can you run explain (analyze, costs, buffers, timing) of your query?
And check how long the planning stage took and check if JIT was used or not (and how much time spent during JIT if it had been used).



--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

Re: Missing query plan for auto_explain.

From
Matheus Martin
Date:
Understood. I have run a prepared statement with the query in question through `psql` and JIT was not used (see plan below), however please note that the long response times were never reproducible from `psql`, they only happen from our JDBC application.

                                                                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=31.41..31.41 rows=1 width=707) (actual time=4.421..4.442 rows=23 loops=1)
   Buffers: shared hit=827 dirtied=1
   ->  Sort  (cost=31.41..31.41 rows=1 width=707) (actual time=4.416..4.434 rows=23 loops=1)
         Sort Key: users.unique_identifier, user_realm.user_realm_id
         Sort Method: quicksort  Memory: 31kB
         Buffers: shared hit=827 dirtied=1
         ->  Nested Loop Left Join  (cost=7.84..31.40 rows=1 width=707) (actual time=1.154..4.232 rows=23 loops=1)
               Buffers: shared hit=824 dirtied=1
               ->  Nested Loop  (cost=7.55..24.94 rows=1 width=704) (actual time=1.090..3.725 rows=23 loops=1)
                     Join Filter: ((acl_allowed_user_realm_category.user_realm_category_id)::text = (user_realm_category.user_realm_category_id)::text)
                     Buffers: shared hit=757 dirtied=1
                     ->  Nested Loop  (cost=7.42..24.75 rows=1 width=209) (actual time=1.015..3.564 rows=23 loops=1)
                           Buffers: shared hit=711 dirtied=1
                           ->  Nested Loop  (cost=7.29..23.95 rows=1 width=204) (actual time=0.950..3.399 rows=23 loops=1)
                                 Buffers: shared hit=665 dirtied=1
                                 ->  Nested Loop  (cost=7.15..23.73 rows=1 width=200) (actual time=0.911..3.228 rows=23 loops=1)
                                       Buffers: shared hit=619 dirtied=1
                                       ->  Nested Loop  (cost=6.72..23.22 rows=1 width=200) (actual time=0.803..2.481 rows=23 loops=1)
                                             Buffers: shared hit=527
                                             ->  Nested Loop  (cost=6.44..22.42 rows=1 width=191) (actual time=0.669..2.249 rows=23 loops=1)
                                                   Buffers: shared hit=480
                                                   ->  Nested Loop  (cost=6.17..21.55 rows=1 width=194) (actual time=0.616..2.091 rows=23 loops=1)
                                                         Join Filter: ((organisation.organisation_id)::text = (acl_allowed_organisation.organisation_id)::text)
                                                         Buffers: shared hit=411
                                                         ->  Nested Loop  (cost=5.88..20.16 rows=1 width=203) (actual time=0.514..1.863 rows=23 loops=1)
                                                               Buffers: shared hit=364
                                                               ->  Nested Loop  (cost=5.61..19.73 rows=1 width=187) (actual time=0.474..1.682 rows=23 loops=1)
                                                                     Buffers: shared hit=295
                                                                     ->  Nested Loop  (cost=5.33..18.82 rows=1 width=142) (actual time=0.424..1.446 rows=23 loops=1)
                                                                           Buffers: shared hit=226
                                                                           ->  Hash Join  (cost=4.91..10.44 rows=1 width=80) (actual time=0.340..1.148 rows=23 loops=1)
                                                                                 Hash Cond: (users.affiliate_id = acl_allowed_affiliate.affiliate_id)
                                                                                 Buffers: shared hit=134
                                                                                 ->  Index Scan using users_ix01 on users  (cost=0.43..5.45 rows=189 width=72) (actual time=0.090..0.822 rows=166 loops=1)
                                                                                       Index Cond: (((unique_identifier)::text ~>=~ 'robinson06'::text) AND ((unique_identifier)::text ~<~ 'robinson07'::text))
                                                                                       Filter: (((type)::text = ANY ('{LOCAL,EXTERNAL}'::text[])) AND ((unique_identifier)::text ~~ 'robinson06%'::text))
                                                                                       Buffers: shared hit=127
                                                                                 ->  Hash  (cost=4.45..4.45 rows=2 width=8) (actual time=0.135..0.136 rows=1 loops=1)
                                                                                       Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                                       Buffers: shared hit=4
                                                                                       ->  Index Only Scan using pk_acl_allowed_affiliate on acl_allowed_affiliate  (cost=0.41..4.45 rows=2 width=8) (actual time=0.115..0.117 rows=1 loops=1)
                                                                                             Index Cond: ((user_id = '1468137'::bigint) AND (permission = 'READ'::text))
                                                                                             Heap Fetches: 0
                                                                                             Buffers: shared hit=4
                                                                           ->  Index Scan using pk_user_profile on user_profile  (cost=0.43..8.38 rows=1 width=62) (actual time=0.011..0.011 rows=1 loops=23)
                                                                                 Index Cond: (user_id = users.user_id)
                                                                                 Buffers: shared hit=92
                                                                     ->  Index Scan using pk_user_realm on user_realm  (cost=0.27..0.91 rows=1 width=60) (actual time=0.008..0.008 rows=1 loops=23)
                                                                           Index Cond: ((user_realm_id)::text = (users.user_realm_id)::text)
                                                                           Filter: ((type)::text = ANY ('{ADMIN,HUB,PLAYER}'::text[]))
                                                                           Buffers: shared hit=69
                                                               ->  Index Scan using pk_organisation on organisation  (cost=0.27..0.43 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=23)
                                                                     Index Cond: ((organisation_id)::text = (user_realm.organisation_id)::text)
                                                                     Buffers: shared hit=69
                                                         ->  Index Only Scan using pk_acl_allowed_organisation on acl_allowed_organisation  (cost=0.29..1.38 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=23)
                                                               Index Cond: ((user_id = '1468137'::bigint) AND (organisation_id = (user_realm.organisation_id)::text) AND (permission = 'READ'::text))
                                                               Heap Fetches: 0
                                                               Buffers: shared hit=47
                                                   ->  Index Scan using pk_affiliate on affiliate  (cost=0.28..0.86 rows=1 width=21) (actual time=0.005..0.005 rows=1 loops=23)
                                                         Index Cond: (affiliate_id = users.affiliate_id)
                                                         Buffers: shared hit=69
                                             ->  Index Only Scan using pk_acl_allowed_user_realm_category on acl_allowed_user_realm_category  (cost=0.28..0.59 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=23)
                                                   Index Cond: ((user_id = '1468137'::bigint) AND (user_realm_category_id = (user_realm.user_realm_category_id)::text) AND (permission = 'READ'::text))
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=47
                                       ->  Index Scan using pk_user_statistics on user_statistics  (cost=0.43..0.51 rows=1 width=16) (actual time=0.030..0.030 rows=1 loops=23)
                                             Index Cond: (user_id = user_profile.user_id)
                                             Buffers: shared hit=92 dirtied=1
                                 ->  Index Scan using pk_vip_level on vip_level  (cost=0.14..0.22 rows=1 width=20) (actual time=0.005..0.005 rows=1 loops=23)
                                       Index Cond: (vip_level_id = users.vip_level_id)
                                       Buffers: shared hit=46
                           ->  Index Scan using pk_vip_schema on vip_schema  (cost=0.13..0.63 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=23)
                                 Index Cond: ((vip_schema_id)::text = (vip_level.vip_schema_id)::text)
                                 Buffers: shared hit=46
                     ->  Index Scan using pk_user_realm_category on user_realm_category  (cost=0.13..0.18 rows=1 width=1032) (actual time=0.005..0.005 rows=1 loops=23)
                           Index Cond: ((user_realm_category_id)::text = (user_realm.user_realm_category_id)::text)
                           Buffers: shared hit=46
               ->  Index Scan using local_balance_account_unq01 on local_balance_account  (cost=0.29..6.45 rows=1 width=15) (actual time=0.020..0.020 rows=1 loops=23)
                     Index Cond: ((user_id = users.user_id) AND ((balance_category)::text = 'CASH'::text))
                     Filter: ((currency)::text = (users.real_base_currency)::text)
                     Buffers: shared hit=67
 Planning:
   Buffers: shared hit=859
 Planning Time: 50.833 ms
 Execution Time: 5.480 ms
(85 rows) 

On Fri, 9 Sept 2022 at 22:06, Maxim Boguk <maxim.boguk@gmail.com> wrote:


On Thu, Sep 8, 2022 at 1:18 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:
We do have JIT enabled `jit=on` with `jit_above_cost=100000`.

I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?


In your log - time spent during the execution stage (where auto-exlain can help) but during the bind/planning stage (e.g. generation of plan).
So you have a problem not with slow query execution, but with slow query planning, so autoexplain cannot help in that case.
JIT is one possible explanation of planning stage slowdown.

Can you run explain (analyze, costs, buffers, timing) of your query?
And check how long the planning stage took and check if JIT was used or not (and how much time spent during JIT if it had been used).



--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

Re: Missing query plan for auto_explain.

From
Julien Rouhaud
Date:
On Mon, Sep 12, 2022 at 05:34:37PM +0100, Matheus Martin wrote:
> Understood. I have run a prepared statement with the query in question
> through `psql` and JIT was not used (see plan below), however please note
> that the long response times were never reproducible from `psql`, they only
> happen from our JDBC application.

Do you have pg_stat_statements enabled?  If yes you could check what are the
maximum execution and planning time for that query, which may give an answer.