Thread: Missing query plan for auto_explain.
Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms.
With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though).
Does anyone have ideas of why the explain plan is not being printed?
Sample log entry for trouble query executed from application:
```
Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG: duration: 1423.481 ms bind <unnamed>: /*@ org.sixty6.as66.QueryID:report.framework.userSearch */select users.user_id, users.unique_identifier, user_profile.name, user_profile.email, organisation.organisation_id, organisation.description, user_realm.user_realm_id, user_realm.description as col_8, user_realm.type, user_realm_category.description as col_10, vip_schema.description as col_11, vip_level.description as col_12, affiliate.description as col_13, users.status, users.creation_date, user_statistics.last_user_session_creation_date, users.real_base_currency, users.fun_base_currency, local_balance_account.balance
Aug 26 09:11:33 db-931 postgres[8106]: [66-2] #011from users users
Aug 26 09:11:33 db-931 postgres[8106]: [66-3] #011join user_profile user_profile
Aug 26 09:11:33 db-931 postgres[8106]: [66-4] #011on user_profile.user_id = users.user_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-5] #011join user_realm user_realm
Aug 26 09:11:33 db-931 postgres[8106]: [66-6] #011on user_realm.user_realm_id = users.user_realm_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-7] #011join organisation organisation
Aug 26 09:11:33 db-931 postgres[8106]: [66-8] #011on organisation.organisation_id = user_realm.organisation_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-9] #011join acl_allowed_organisation acl_allowed_organisation
Aug 26 09:11:33 db-931 postgres[8106]: [66-10] #011on acl_allowed_organisation.organisation_id = organisation.organisation_id and acl_allowed_organisation.permission = $1 and acl_allowed_organisation.user_id = $2
Aug 26 09:11:33 db-931 postgres[8106]: [66-11] #011join affiliate affiliate
Aug 26 09:11:33 db-931 postgres[8106]: [66-12] #011on affiliate.affiliate_id = users.affiliate_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-13] #011join acl_allowed_affiliate acl_allowed_affiliate
Aug 26 09:11:33 db-931 postgres[8106]: [66-14] #011on acl_allowed_affiliate.affiliate_id = affiliate.affiliate_id and acl_allowed_affiliate.permission = $3 and acl_allowed_affiliate.user_id = $4
Aug 26 09:11:33 db-931 postgres[8106]: [66-15] #011join acl_allowed_user_realm_category acl_allowed_user_realm_category
Aug 26 09:11:33 db-931 postgres[8106]: [66-16] #011on acl_allowed_user_realm_category.user_realm_category_id = user_realm.user_realm_category_id and acl_allowed_user_realm_category.permission = $5 and acl_allowed_user_realm_category.user_id = $6
Aug 26 09:11:33 db-931 postgres[8106]: [66-17] #011join user_statistics user_statistics
Aug 26 09:11:33 db-931 postgres[8106]: [66-18] #011on user_statistics.user_id = users.user_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-19] #011join vip_level vip_level
Aug 26 09:11:33 db-931 postgres[8106]: [66-20] #011on vip_level.vip_level_id = users.vip_level_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-21] #011join vip_schema vip_schema
Aug 26 09:11:33 db-931 postgres[8106]: [66-22] #011on vip_schema.vip_schema_id = vip_level.vip_schema_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-23] #011join user_realm_category user_realm_category
Aug 26 09:11:33 db-931 postgres[8106]: [66-24] #011on user_realm_category.user_realm_category_id = user_realm.user_realm_category_id
Aug 26 09:11:33 db-931 postgres[8106]: [66-25] #011left join local_balance_account local_balance_account
Aug 26 09:11:33 db-931 postgres[8106]: [66-26] #011on local_balance_account.user_id = users.user_id and local_balance_account.currency = users.real_base_currency and local_balance_account.balance_category = $7 and local_balance_account.primary_balance = $8
Aug 26 09:11:33 db-931 postgres[8106]: [66-27] #011where users.type in ($9, $10) and user_realm.type in ($11, $12, $13) and users.unique_identifier like $14 escape '\'
Aug 26 09:11:33 db-931 postgres[8106]: [66-28] #011order by users.unique_identifier asc, user_realm.user_realm_id asc
Aug 26 09:11:33 db-931 postgres[8106]: [66-29] #011limit $15
Aug 26 09:11:33 db-931 postgres[8106]: [66-30] 2022-08-26 09:11:33 GMT [8106]: [6-1] db=betwave,user=betwave_app_readonly_user DETAIL: parameters: $1 = 'READ', $2 = '1468137', $3 = 'READ', $4 = '1468137', $5 = 'READ', $6 = '1468137', $7 = 'CASH', $8 = 't', $9 = 'LOCAL', $10 = 'EXTERNAL', $11 = 'ADMIN', $12 = 'HUB', $13 = 'PLAYER', $14 = 'robinson06%', $15 = '200'
```
Sample log entry for another query executed from application:
```
Aug 26 15:10:58 db-931 postgres[40727]: [71-1] 2022-08-26 15:10:58 GMT [40727]: [15-1] db=betwave,user=betwave_app_readonly_user LOG: duration: 1069.964 ms plan:
Aug 26 15:10:58 db-931 postgres[40727]: [71-2] #011Query Text: /*@ org.sixty6.as66.QueryID:health.betwave.game.external.InProgressGamePlays:com.betconstruct */select x.hour, sum(x.in_progress_count) as col_2, sum(x.closed_count) as col_3
Aug 26 15:10:58 db-931 postgres[40727]: [71-3] #011from ((select date_trunc('hour',game_play.start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds' as hour, count(*) as in_progress_count, $1 as closed_count
Aug 26 15:10:58 db-931 postgres[40727]: [71-4] #011from game_play game_play
Aug 26 15:10:58 db-931 postgres[40727]: [71-5] #011join game_profile game_profile
Aug 26 15:10:58 db-931 postgres[40727]: [71-6] #011on game_profile.game_profile_id = game_play.game_profile_id
Aug 26 15:10:58 db-931 postgres[40727]: [71-7] #011join game game
Aug 26 15:10:58 db-931 postgres[40727]: [71-8] #011on game.game_id = game_profile.game_id
Aug 26 15:10:58 db-931 postgres[40727]: [71-9] #011where game.game_engine_id = $2 and game_play.status = $3 and game_play.start_date >= $4 and game_play.start_date < $5
Aug 26 15:10:58 db-931 postgres[40727]: [71-10] #011group by date_trunc('hour',game_play.start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds')
Aug 26 15:10:58 db-931 postgres[40727]: [71-11] #011union all
Aug 26 15:10:58 db-931 postgres[40727]: [71-12] #011(select date_trunc('hour',game_activity.interval_start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds' as hour, $6 as in_progress_count, sum(game_activity.game_play_count) as closed_count
Aug 26 15:10:58 db-931 postgres[40727]: [71-13] #011from game_activity game_activity
Aug 26 15:10:58 db-931 postgres[40727]: [71-14] #011join game game
Aug 26 15:10:58 db-931 postgres[40727]: [71-15] #011on game.game_id = game_activity.game_id
Aug 26 15:10:58 db-931 postgres[40727]: [71-16] #011where game.game_engine_id = $7 and tstzrange($8, $9, $10) @> tstzrange(game_activity.interval_start_date, game_activity.interval_end_date, $11)
Aug 26 15:10:58 db-931 postgres[40727]: [71-17] #011group by date_trunc('hour',game_activity.interval_start_date AT TIME ZONE INTERVAL '0 seconds') AT TIME ZONE INTERVAL '0 seconds')) as x
Aug 26 15:10:58 db-931 postgres[40727]: [71-18] #011group by x.hour
Aug 26 15:10:58 db-931 postgres[40727]: [71-19] #011having sum(x.in_progress_count) > $12 and sum(x.in_progress_count) + sum(x.closed_count) >= $13 and sum(x.in_progress_count) >= (sum(x.in_progress_count) + sum(x.closed_count)) * $14
Aug 26 15:10:58 db-931 postgres[40727]: [71-20] #011GroupAggregate (cost=10069.43..10072.16 rows=1 width=72)
Aug 26 15:10:58 db-931 postgres[40727]: [71-21] #011 Group Key: "*SELECT* 1".hour
Aug 26 15:10:58 db-931 postgres[40727]: [71-22] #011 Filter: ((sum("*SELECT* 1".in_progress_count) > '0'::numeric) AND ((sum("*SELECT* 1".in_progress_count) + sum((('0'::bigint)::numeric))) >= '0'::numeric) AND (sum("*SELECT* 1".in_progress_count) >= ((sum("*SELECT* 1".in_progress_count) + sum((('0'::bigint)::numeric))) * '0'::numeric)))
Aug 26 15:10:58 db-931 postgres[40727]: [71-23] #011 -> Sort (cost=10069.43..10069.53 rows=39 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-24] #011 Sort Key: "*SELECT* 1".hour
Aug 26 15:10:58 db-931 postgres[40727]: [71-25] #011 -> Append (cost=10048.65..10068.01 rows=39 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-26] #011 -> Subquery Scan on "*SELECT* 1" (cost=10048.65..10050.08 rows=38 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-27] #011 -> GroupAggregate (cost=10048.65..10049.60 rows=38 width=24)
Aug 26 15:10:58 db-931 postgres[40727]: [71-28] #011 Group Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_play.start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-29] #011 -> Sort (cost=10048.65..10048.75 rows=38 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-30] #011 Sort Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_play.start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-31] #011 -> Nested Loop (cost=116.74..10047.66 rows=38 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-32] #011 -> Hash Join (cost=116.33..209.84 rows=18 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-33] #011 Hash Cond: (game_profile.game_id = game.game_id)
Aug 26 15:10:58 db-931 postgres[40727]: [71-34] #011 -> Seq Scan on game_profile (cost=0.00..86.54 rows=2654 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-35] #011 -> Hash (cost=116.12..116.12 rows=16 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-36] #011 -> Seq Scan on game (cost=0.00..116.12 rows=16 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-37] #011 Filter: ((game_engine_id)::text = 'com.betconstruct'::text)
Aug 26 15:10:58 db-931 postgres[40727]: [71-38] #011 -> Index Scan using game_play_operable_unq01 on game_play_operable game_play (cost=0.42..546.45 rows=8 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-39] #011 Index Cond: (game_profile_id = game_profile.game_profile_id)
Aug 26 15:10:58 db-931 postgres[40727]: [71-40] #011 Filter: ((start_date >= '2022-08-25 16:00:00+00'::timestamp with time zone) AND (start_date < '2022-08-26 15:00:57.028601+00'::timestamp with time zone) AND ((status)::text = 'IN_PROGRESS'::text))
Aug 26 15:10:58 db-931 postgres[40727]: [71-41] #011 -> GroupAggregate (cost=17.70..17.73 rows=1 width=48)
Aug 26 15:10:58 db-931 postgres[40727]: [71-42] #011 Group Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_activity.interval_start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-43] #011 -> Sort (cost=17.70..17.70 rows=1 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-44] #011 Sort Key: (timezone('00:00:00'::interval, date_trunc('hour'::text, timezone('00:00:00'::interval, game_activity.interval_start_date))))
Aug 26 15:10:58 db-931 postgres[40727]: [71-45] #011 -> Nested Loop (cost=0.70..17.69 rows=1 width=16)
Aug 26 15:10:58 db-931 postgres[40727]: [71-46] #011 -> Index Scan using game_activity_ix01 on game_activity (cost=0.42..8.44 rows=1 width=24)
Aug 26 15:10:58 db-931 postgres[40727]: [71-47] #011 Index Cond: (tstzrange(interval_start_date, interval_end_date, '[]'::text) <@ '["2022-08-25 16:00:00+00","2022-08-26 16:00:00+00"]'::tstzrange)
Aug 26 15:10:58 db-931 postgres[40727]: [71-48] #011 -> Index Scan using pk_game on game game_1 (cost=0.28..8.30 rows=1 width=8)
Aug 26 15:10:58 db-931 postgres[40727]: [71-49] #011 Index Cond: (game_id = game_activity.game_id)
Aug 26 15:10:58 db-931 postgres[40727]: [71-50] #011 Filter: ((game_engine_id)::text = 'com.betconstruct'::text)
```
Our environment consists of PostgreSQL 13.6 on Linux.
On 2022-Aug-30, Matheus Martin wrote: > Our Postgres recently started reporting considerably different > execution times for the same query. When executed from our JDBC > application the Postgres logs report an average execution time of 1500 > ms but when the query is manually executed through `psql` it doesn't > take longer than 50 ms. I don't know why the plan is not saved by auto_explain (maybe we're missing ExecutorEnd calls somewhere? that would be strange), but one frequent reason for queries to show different plan in JDBC than psql is the use of prepared statements. Did you try using "PREPARE yourquery(...)" and then EXPLAIN EXECUTE(...)? Sometimes that helps to recreate the original problem. (Apparently, ExecutorEnd is called from PortalCleanup; what happens with the portal for an extended-protocol query?) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi, On Tue, Aug 30, 2022 at 01:16:43PM +0200, Alvaro Herrera wrote: > On 2022-Aug-30, Matheus Martin wrote: > > > Our Postgres recently started reporting considerably different > > execution times for the same query. When executed from our JDBC > > application the Postgres logs report an average execution time of 1500 > > ms but when the query is manually executed through `psql` it doesn't > > take longer than 50 ms. > > I don't know why the plan is not saved by auto_explain (maybe we're > missing ExecutorEnd calls somewhere? that would be strange), but one > frequent reason for queries to show different plan in JDBC than psql is > the use of prepared statements. Did you try using "PREPARE > yourquery(...)" and then EXPLAIN EXECUTE(...)? Sometimes that helps to > recreate the original problem. > > (Apparently, ExecutorEnd is called from PortalCleanup; what happens with > the portal for an extended-protocol query?) AFAICS log_min_duration_statements threshold is based on the full query processing time while auto_explain is only based on the executor runtime, so one more likely explanation is that out of the 1423ms, more than 423ms were spent in the planner?
Good idea on using an actual prepared statement but unfortunately it didn't produce any different result.
Could you please elaborate a bit on your advice concerning ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me.
Could you please elaborate a bit on your advice concerning ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me.
On Tue, 30 Aug 2022 at 12:16, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Matheus Martin wrote:
> Our Postgres recently started reporting considerably different
> execution times for the same query. When executed from our JDBC
> application the Postgres logs report an average execution time of 1500
> ms but when the query is manually executed through `psql` it doesn't
> take longer than 50 ms.
I don't know why the plan is not saved by auto_explain (maybe we're
missing ExecutorEnd calls somewhere? that would be strange), but one
frequent reason for queries to show different plan in JDBC than psql is
the use of prepared statements. Did you try using "PREPARE
yourquery(...)" and then EXPLAIN EXECUTE(...)? Sometimes that helps to
recreate the original problem.
(Apparently, ExecutorEnd is called from PortalCleanup; what happens with
the portal for an extended-protocol query?)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
The threshold for `auto_explain` was changed to 500 ms and explain plans are still not being logged.
On Tue, 30 Aug 2022 at 13:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, Aug 30, 2022 at 01:16:43PM +0200, Alvaro Herrera wrote:
> On 2022-Aug-30, Matheus Martin wrote:
>
> > Our Postgres recently started reporting considerably different
> > execution times for the same query. When executed from our JDBC
> > application the Postgres logs report an average execution time of 1500
> > ms but when the query is manually executed through `psql` it doesn't
> > take longer than 50 ms.
>
> I don't know why the plan is not saved by auto_explain (maybe we're
> missing ExecutorEnd calls somewhere? that would be strange), but one
> frequent reason for queries to show different plan in JDBC than psql is
> the use of prepared statements. Did you try using "PREPARE
> yourquery(...)" and then EXPLAIN EXECUTE(...)? Sometimes that helps to
> recreate the original problem.
>
> (Apparently, ExecutorEnd is called from PortalCleanup; what happens with
> the portal for an extended-protocol query?)
AFAICS log_min_duration_statements threshold is based on the full query
processing time while auto_explain is only based on the executor runtime, so
one more likely explanation is that out of the 1423ms, more than 423ms were
spent in the planner?
On 2022-Aug-30, Matheus Martin wrote: > Good idea on using an actual prepared statement but unfortunately it didn't > produce any different result. I should have also mentioned to try the EXPLAIN EXECUTE six times and see if the last one produces a different plan. That's when it switches from planning every time to planning with generic arguments, as I recall. > Could you please elaborate a bit on your advice concerning > ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me. That wasn't advice actually, just a note that the code might be doing that thing wrong, causing auto_explain to miss it. This is an unproven hypothesis that is likely to be all wrong. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I'm impressed how quickly you are fixing this obscure issue. I came from MS SQL and it would be hard for me to put into words how much of a better job you all are doing on [PostgreSQL]." Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2022-Aug-30, Matheus Martin wrote: >> Good idea on using an actual prepared statement but unfortunately it didn't >> produce any different result. > I should have also mentioned to try the EXPLAIN EXECUTE six times and > see if the last one produces a different plan. That's when it switches > from planning every time to planning with generic arguments, as I > recall. In recent versions you can bypass that with SET plan_cache_mode = force_generic_plan; regards, tom lane
We tried running the prepared statement six times as suggested but wasn't still able to recreate the original problem.
Perhaps more concerning/relevant is that we have not found any explanation to why the explain plan is not being logged by `auto_explain`. Could this be a bug? Shall we report it?
Perhaps more concerning/relevant is that we have not found any explanation to why the explain plan is not being logged by `auto_explain`. Could this be a bug? Shall we report it?
On Tue, 30 Aug 2022 at 17:45, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Matheus Martin wrote:
> Good idea on using an actual prepared statement but unfortunately it didn't
> produce any different result.
I should have also mentioned to try the EXPLAIN EXECUTE six times and
see if the last one produces a different plan. That's when it switches
from planning every time to planning with generic arguments, as I
recall.
> Could you please elaborate a bit on your advice concerning
> ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me.
That wasn't advice actually, just a note that the code might be doing
that thing wrong, causing auto_explain to miss it. This is an unproven
hypothesis that is likely to be all wrong.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php
Hi, On Thu, Sep 01, 2022 at 08:20:13PM +0100, Matheus Martin wrote: > We tried running the prepared statement six times as suggested but wasn't > still able to recreate the original problem. > > Perhaps more concerning/relevant is that we have not found any explanation > to why the explain plan is not being logged by `auto_explain`. Could this > be a bug? Shall we report it? Just to be sure, do you get at least some plans logged by auto_explain when queries are executed by the JDBC application? Can you try to temporarily lower auto_explain.log_min_duration_statements to less than 50ms and see what auto_explain sees for the execution time (and planning time). Another possibility would be some conflicting locks held. If the conflict happens during the planning auto_explain still won't be triggered as it's outside the executor. Also, have you enabled log_lock_waits?
Yes, we do see some plans logged by the auto_explain. We couldn't find a `auto_explain.log_min_duration_statements` setting but `log_min_duration_statement` as in https://www.postgresql.org/docs/current/runtime-config-logging.html is set to 100 ms.
Unfortunately, due to the amount of traffic we have, we cannot lower the threshold for the auto_explain.
`log_lock_waits` is turned on and the logs do not indicate any locks related to the tables in the query.
Unfortunately, due to the amount of traffic we have, we cannot lower the threshold for the auto_explain.
`log_lock_waits` is turned on and the logs do not indicate any locks related to the tables in the query.
On Fri, 2 Sept 2022 at 05:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Thu, Sep 01, 2022 at 08:20:13PM +0100, Matheus Martin wrote:
> We tried running the prepared statement six times as suggested but wasn't
> still able to recreate the original problem.
>
> Perhaps more concerning/relevant is that we have not found any explanation
> to why the explain plan is not being logged by `auto_explain`. Could this
> be a bug? Shall we report it?
Just to be sure, do you get at least some plans logged by auto_explain when
queries are executed by the JDBC application?
Can you try to temporarily lower auto_explain.log_min_duration_statements to
less than 50ms and see what auto_explain sees for the execution time (and
planning time).
Another possibility would be some conflicting locks held. If the conflict
happens during the planning auto_explain still won't be triggered as it's
outside the executor. Also, have you enabled log_lock_waits?
On 2022-09-02 10:58:58 +0100, Matheus Martin wrote: > Yes, we do see some plans logged by the auto_explain. We couldn't find a > `auto_explain.log_min_duration_statements` setting This is weird as the documentation says: | Note that the default behavior is to do nothing, so you must set at | least auto_explain.log_min_duration if you want any results. What does show auto_explain.log_min_duration; return? > but `log_min_duration_statement` as in > https://www.postgresql.org/docs/current/ runtime-config-logging.html > is set to 100 ms. I don't think this affects auto_explain. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
`auto_explain.log_min_duration` is set to 500 ms.
On Mon, 5 Sept 2022 at 12:35, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2022-09-02 10:58:58 +0100, Matheus Martin wrote:
> Yes, we do see some plans logged by the auto_explain. We couldn't find a
> `auto_explain.log_min_duration_statements` setting
This is weird as the documentation says:
| Note that the default behavior is to do nothing, so you must set at
| least auto_explain.log_min_duration if you want any results.
What does
show auto_explain.log_min_duration;
return?
> but `log_min_duration_statement` as in
> https://www.postgresql.org/docs/current/ runtime-config-logging.html
> is set to 100 ms.
I don't think this affects auto_explain.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On Tue, Aug 30, 2022 at 1:38 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:
Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms.
With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though).
Does anyone have ideas of why the explain plan is not being printed?
Sample log entry for trouble query executed from application:
```
Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG: duration: 1423.481 ms bind <unnamed>:
My understanding of how to auto_explain work - it deals only for execution calls, but in your case duration: 1423.481 ms on BIND call, before query execution.
At least in my understanding - auto_explain cannot work and will not help in case of a slow BIND call (because it's a time when the query is planned but not executed).
According documentation:
"Query planning typically occurs when the Bind message is processed. If the prepared statement has no parameters, or is executed repeatedly, the server might save the created plan and re-use it during subsequent Bind messages for the same prepared statement."
Hard to say what the reason for slow planning, but one (there could be others) likely reason is JIT work. Do you have JIT enabled?
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
We do have JIT enabled `jit=on` with `jit_above_cost=100000`.
I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?
I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?
On Tue, 6 Sept 2022 at 00:29, Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Tue, Aug 30, 2022 at 1:38 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms.
With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though).
Does anyone have ideas of why the explain plan is not being printed?
Sample log entry for trouble query executed from application:
```
Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG: duration: 1423.481 ms bind <unnamed>:My understanding of how to auto_explain work - it deals only for execution calls, but in your case duration: 1423.481 ms on BIND call, before query execution.At least in my understanding - auto_explain cannot work and will not help in case of a slow BIND call (because it's a time when the query is planned but not executed).According documentation:"Query planning typically occurs when the Bind message is processed. If the prepared statement has no parameters, or is executed repeatedly, the server might save the created plan and re-use it during subsequent Bind messages for the same prepared statement."Hard to say what the reason for slow planning, but one (there could be others) likely reason is JIT work. Do you have JIT enabled?--Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
On Thu, Sep 8, 2022 at 1:18 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:
We do have JIT enabled `jit=on` with `jit_above_cost=100000`.
I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?
In your log - time spent during the execution stage (where auto-exlain can help) but during the bind/planning stage (e.g. generation of plan).
So you have a problem not with slow query execution, but with slow query planning, so autoexplain cannot help in that case.
JIT is one possible explanation of planning stage slowdown.
Can you run explain (analyze, costs, buffers, timing) of your query?
And check how long the planning stage took and check if JIT was used or not (and how much time spent during JIT if it had been used).
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
Understood. I have run a prepared statement with the query in question through `psql` and JIT was not used (see plan below), however please note that the long response times were never reproducible from `psql`, they only happen from our JDBC application.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=31.41..31.41 rows=1 width=707) (actual time=4.421..4.442 rows=23 loops=1)
Buffers: shared hit=827 dirtied=1
-> Sort (cost=31.41..31.41 rows=1 width=707) (actual time=4.416..4.434 rows=23 loops=1)
Sort Key: users.unique_identifier, user_realm.user_realm_id
Sort Method: quicksort Memory: 31kB
Buffers: shared hit=827 dirtied=1
-> Nested Loop Left Join (cost=7.84..31.40 rows=1 width=707) (actual time=1.154..4.232 rows=23 loops=1)
Buffers: shared hit=824 dirtied=1
-> Nested Loop (cost=7.55..24.94 rows=1 width=704) (actual time=1.090..3.725 rows=23 loops=1)
Join Filter: ((acl_allowed_user_realm_category.user_realm_category_id)::text = (user_realm_category.user_realm_category_id)::text)
Buffers: shared hit=757 dirtied=1
-> Nested Loop (cost=7.42..24.75 rows=1 width=209) (actual time=1.015..3.564 rows=23 loops=1)
Buffers: shared hit=711 dirtied=1
-> Nested Loop (cost=7.29..23.95 rows=1 width=204) (actual time=0.950..3.399 rows=23 loops=1)
Buffers: shared hit=665 dirtied=1
-> Nested Loop (cost=7.15..23.73 rows=1 width=200) (actual time=0.911..3.228 rows=23 loops=1)
Buffers: shared hit=619 dirtied=1
-> Nested Loop (cost=6.72..23.22 rows=1 width=200) (actual time=0.803..2.481 rows=23 loops=1)
Buffers: shared hit=527
-> Nested Loop (cost=6.44..22.42 rows=1 width=191) (actual time=0.669..2.249 rows=23 loops=1)
Buffers: shared hit=480
-> Nested Loop (cost=6.17..21.55 rows=1 width=194) (actual time=0.616..2.091 rows=23 loops=1)
Join Filter: ((organisation.organisation_id)::text = (acl_allowed_organisation.organisation_id)::text)
Buffers: shared hit=411
-> Nested Loop (cost=5.88..20.16 rows=1 width=203) (actual time=0.514..1.863 rows=23 loops=1)
Buffers: shared hit=364
-> Nested Loop (cost=5.61..19.73 rows=1 width=187) (actual time=0.474..1.682 rows=23 loops=1)
Buffers: shared hit=295
-> Nested Loop (cost=5.33..18.82 rows=1 width=142) (actual time=0.424..1.446 rows=23 loops=1)
Buffers: shared hit=226
-> Hash Join (cost=4.91..10.44 rows=1 width=80) (actual time=0.340..1.148 rows=23 loops=1)
Hash Cond: (users.affiliate_id = acl_allowed_affiliate.affiliate_id)
Buffers: shared hit=134
-> Index Scan using users_ix01 on users (cost=0.43..5.45 rows=189 width=72) (actual time=0.090..0.822 rows=166 loops=1)
Index Cond: (((unique_identifier)::text ~>=~ 'robinson06'::text) AND ((unique_identifier)::text ~<~ 'robinson07'::text))
Filter: (((type)::text = ANY ('{LOCAL,EXTERNAL}'::text[])) AND ((unique_identifier)::text ~~ 'robinson06%'::text))
Buffers: shared hit=127
-> Hash (cost=4.45..4.45 rows=2 width=8) (actual time=0.135..0.136 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=4
-> Index Only Scan using pk_acl_allowed_affiliate on acl_allowed_affiliate (cost=0.41..4.45 rows=2 width=8) (actual time=0.115..0.117 rows=1 loops=1)
Index Cond: ((user_id = '1468137'::bigint) AND (permission = 'READ'::text))
Heap Fetches: 0
Buffers: shared hit=4
-> Index Scan using pk_user_profile on user_profile (cost=0.43..8.38 rows=1 width=62) (actual time=0.011..0.011 rows=1 loops=23)
Index Cond: (user_id = users.user_id)
Buffers: shared hit=92
-> Index Scan using pk_user_realm on user_realm (cost=0.27..0.91 rows=1 width=60) (actual time=0.008..0.008 rows=1 loops=23)
Index Cond: ((user_realm_id)::text = (users.user_realm_id)::text)
Filter: ((type)::text = ANY ('{ADMIN,HUB,PLAYER}'::text[]))
Buffers: shared hit=69
-> Index Scan using pk_organisation on organisation (cost=0.27..0.43 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=23)
Index Cond: ((organisation_id)::text = (user_realm.organisation_id)::text)
Buffers: shared hit=69
-> Index Only Scan using pk_acl_allowed_organisation on acl_allowed_organisation (cost=0.29..1.38 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=23)
Index Cond: ((user_id = '1468137'::bigint) AND (organisation_id = (user_realm.organisation_id)::text) AND (permission = 'READ'::text))
Heap Fetches: 0
Buffers: shared hit=47
-> Index Scan using pk_affiliate on affiliate (cost=0.28..0.86 rows=1 width=21) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: (affiliate_id = users.affiliate_id)
Buffers: shared hit=69
-> Index Only Scan using pk_acl_allowed_user_realm_category on acl_allowed_user_realm_category (cost=0.28..0.59 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=23)
Index Cond: ((user_id = '1468137'::bigint) AND (user_realm_category_id = (user_realm.user_realm_category_id)::text) AND (permission = 'READ'::text))
Heap Fetches: 0
Buffers: shared hit=47
-> Index Scan using pk_user_statistics on user_statistics (cost=0.43..0.51 rows=1 width=16) (actual time=0.030..0.030 rows=1 loops=23)
Index Cond: (user_id = user_profile.user_id)
Buffers: shared hit=92 dirtied=1
-> Index Scan using pk_vip_level on vip_level (cost=0.14..0.22 rows=1 width=20) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: (vip_level_id = users.vip_level_id)
Buffers: shared hit=46
-> Index Scan using pk_vip_schema on vip_schema (cost=0.13..0.63 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: ((vip_schema_id)::text = (vip_level.vip_schema_id)::text)
Buffers: shared hit=46
-> Index Scan using pk_user_realm_category on user_realm_category (cost=0.13..0.18 rows=1 width=1032) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: ((user_realm_category_id)::text = (user_realm.user_realm_category_id)::text)
Buffers: shared hit=46
-> Index Scan using local_balance_account_unq01 on local_balance_account (cost=0.29..6.45 rows=1 width=15) (actual time=0.020..0.020 rows=1 loops=23)
Index Cond: ((user_id = users.user_id) AND ((balance_category)::text = 'CASH'::text))
Filter: ((currency)::text = (users.real_base_currency)::text)
Buffers: shared hit=67
Planning:
Buffers: shared hit=859
Planning Time: 50.833 ms
Execution Time: 5.480 ms
(85 rows)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=31.41..31.41 rows=1 width=707) (actual time=4.421..4.442 rows=23 loops=1)
Buffers: shared hit=827 dirtied=1
-> Sort (cost=31.41..31.41 rows=1 width=707) (actual time=4.416..4.434 rows=23 loops=1)
Sort Key: users.unique_identifier, user_realm.user_realm_id
Sort Method: quicksort Memory: 31kB
Buffers: shared hit=827 dirtied=1
-> Nested Loop Left Join (cost=7.84..31.40 rows=1 width=707) (actual time=1.154..4.232 rows=23 loops=1)
Buffers: shared hit=824 dirtied=1
-> Nested Loop (cost=7.55..24.94 rows=1 width=704) (actual time=1.090..3.725 rows=23 loops=1)
Join Filter: ((acl_allowed_user_realm_category.user_realm_category_id)::text = (user_realm_category.user_realm_category_id)::text)
Buffers: shared hit=757 dirtied=1
-> Nested Loop (cost=7.42..24.75 rows=1 width=209) (actual time=1.015..3.564 rows=23 loops=1)
Buffers: shared hit=711 dirtied=1
-> Nested Loop (cost=7.29..23.95 rows=1 width=204) (actual time=0.950..3.399 rows=23 loops=1)
Buffers: shared hit=665 dirtied=1
-> Nested Loop (cost=7.15..23.73 rows=1 width=200) (actual time=0.911..3.228 rows=23 loops=1)
Buffers: shared hit=619 dirtied=1
-> Nested Loop (cost=6.72..23.22 rows=1 width=200) (actual time=0.803..2.481 rows=23 loops=1)
Buffers: shared hit=527
-> Nested Loop (cost=6.44..22.42 rows=1 width=191) (actual time=0.669..2.249 rows=23 loops=1)
Buffers: shared hit=480
-> Nested Loop (cost=6.17..21.55 rows=1 width=194) (actual time=0.616..2.091 rows=23 loops=1)
Join Filter: ((organisation.organisation_id)::text = (acl_allowed_organisation.organisation_id)::text)
Buffers: shared hit=411
-> Nested Loop (cost=5.88..20.16 rows=1 width=203) (actual time=0.514..1.863 rows=23 loops=1)
Buffers: shared hit=364
-> Nested Loop (cost=5.61..19.73 rows=1 width=187) (actual time=0.474..1.682 rows=23 loops=1)
Buffers: shared hit=295
-> Nested Loop (cost=5.33..18.82 rows=1 width=142) (actual time=0.424..1.446 rows=23 loops=1)
Buffers: shared hit=226
-> Hash Join (cost=4.91..10.44 rows=1 width=80) (actual time=0.340..1.148 rows=23 loops=1)
Hash Cond: (users.affiliate_id = acl_allowed_affiliate.affiliate_id)
Buffers: shared hit=134
-> Index Scan using users_ix01 on users (cost=0.43..5.45 rows=189 width=72) (actual time=0.090..0.822 rows=166 loops=1)
Index Cond: (((unique_identifier)::text ~>=~ 'robinson06'::text) AND ((unique_identifier)::text ~<~ 'robinson07'::text))
Filter: (((type)::text = ANY ('{LOCAL,EXTERNAL}'::text[])) AND ((unique_identifier)::text ~~ 'robinson06%'::text))
Buffers: shared hit=127
-> Hash (cost=4.45..4.45 rows=2 width=8) (actual time=0.135..0.136 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=4
-> Index Only Scan using pk_acl_allowed_affiliate on acl_allowed_affiliate (cost=0.41..4.45 rows=2 width=8) (actual time=0.115..0.117 rows=1 loops=1)
Index Cond: ((user_id = '1468137'::bigint) AND (permission = 'READ'::text))
Heap Fetches: 0
Buffers: shared hit=4
-> Index Scan using pk_user_profile on user_profile (cost=0.43..8.38 rows=1 width=62) (actual time=0.011..0.011 rows=1 loops=23)
Index Cond: (user_id = users.user_id)
Buffers: shared hit=92
-> Index Scan using pk_user_realm on user_realm (cost=0.27..0.91 rows=1 width=60) (actual time=0.008..0.008 rows=1 loops=23)
Index Cond: ((user_realm_id)::text = (users.user_realm_id)::text)
Filter: ((type)::text = ANY ('{ADMIN,HUB,PLAYER}'::text[]))
Buffers: shared hit=69
-> Index Scan using pk_organisation on organisation (cost=0.27..0.43 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=23)
Index Cond: ((organisation_id)::text = (user_realm.organisation_id)::text)
Buffers: shared hit=69
-> Index Only Scan using pk_acl_allowed_organisation on acl_allowed_organisation (cost=0.29..1.38 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=23)
Index Cond: ((user_id = '1468137'::bigint) AND (organisation_id = (user_realm.organisation_id)::text) AND (permission = 'READ'::text))
Heap Fetches: 0
Buffers: shared hit=47
-> Index Scan using pk_affiliate on affiliate (cost=0.28..0.86 rows=1 width=21) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: (affiliate_id = users.affiliate_id)
Buffers: shared hit=69
-> Index Only Scan using pk_acl_allowed_user_realm_category on acl_allowed_user_realm_category (cost=0.28..0.59 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=23)
Index Cond: ((user_id = '1468137'::bigint) AND (user_realm_category_id = (user_realm.user_realm_category_id)::text) AND (permission = 'READ'::text))
Heap Fetches: 0
Buffers: shared hit=47
-> Index Scan using pk_user_statistics on user_statistics (cost=0.43..0.51 rows=1 width=16) (actual time=0.030..0.030 rows=1 loops=23)
Index Cond: (user_id = user_profile.user_id)
Buffers: shared hit=92 dirtied=1
-> Index Scan using pk_vip_level on vip_level (cost=0.14..0.22 rows=1 width=20) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: (vip_level_id = users.vip_level_id)
Buffers: shared hit=46
-> Index Scan using pk_vip_schema on vip_schema (cost=0.13..0.63 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: ((vip_schema_id)::text = (vip_level.vip_schema_id)::text)
Buffers: shared hit=46
-> Index Scan using pk_user_realm_category on user_realm_category (cost=0.13..0.18 rows=1 width=1032) (actual time=0.005..0.005 rows=1 loops=23)
Index Cond: ((user_realm_category_id)::text = (user_realm.user_realm_category_id)::text)
Buffers: shared hit=46
-> Index Scan using local_balance_account_unq01 on local_balance_account (cost=0.29..6.45 rows=1 width=15) (actual time=0.020..0.020 rows=1 loops=23)
Index Cond: ((user_id = users.user_id) AND ((balance_category)::text = 'CASH'::text))
Filter: ((currency)::text = (users.real_base_currency)::text)
Buffers: shared hit=67
Planning:
Buffers: shared hit=859
Planning Time: 50.833 ms
Execution Time: 5.480 ms
(85 rows)
On Fri, 9 Sept 2022 at 22:06, Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Thu, Sep 8, 2022 at 1:18 PM Matheus Martin <matheus.martin@voidbridge.com> wrote:We do have JIT enabled `jit=on` with `jit_above_cost=100000`.
I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that?In your log - time spent during the execution stage (where auto-exlain can help) but during the bind/planning stage (e.g. generation of plan).So you have a problem not with slow query execution, but with slow query planning, so autoexplain cannot help in that case.JIT is one possible explanation of planning stage slowdown.Can you run explain (analyze, costs, buffers, timing) of your query?And check how long the planning stage took and check if JIT was used or not (and how much time spent during JIT if it had been used).--Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
On Mon, Sep 12, 2022 at 05:34:37PM +0100, Matheus Martin wrote: > Understood. I have run a prepared statement with the query in question > through `psql` and JIT was not used (see plan below), however please note > that the long response times were never reproducible from `psql`, they only > happen from our JDBC application. Do you have pg_stat_statements enabled? If yes you could check what are the maximum execution and planning time for that query, which may give an answer.