Thread: BUG #17198: Planning time too high when execute query on standby cluster
BUG #17198: Planning time too high when execute query on standby cluster
The following bug has been logged on the website:

Bug reference:      17198
Logged by:          Andriy Bartash
Email address:      abartash@xmatters.com
PostgreSQL version: 12.8
Operating system:   CentOS7
Description:

We have a physical standby and noticed that some queries have a very high planning time (2-10 sec). The same queries on the primary cluster have a planning time of 4-6 msec.
Tried to run ANALYZE <table>, but it didn't help.
Also noticed that the apparent root cause was bloated objects (table or index).

Query example:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT cf.name AS field_name,
       cv.value AS field_value,
       cv.position AS POSITION
FROM evs e
JOIN comm c ON c.comm_ev_id = e.ev_id
JOIN comm_flds cf ON cf.comm_id = c.comm_id
JOIN comm_values cv ON cv.comm_fld_id = cf.comm_fld_id
WHERE e.ev_id = 296115655
  AND e.org_uuid = '9179fd37-1322-4957-8a99-7e2056388b4f';

Generates this execution plan:

Nested Loop  (cost=1.70..186.69 rows=10 width=373) (actual time=0.046..0.047 rows=0 loops=1)
  Output: cf.name, cv.value, cv."position"
  Buffers: shared hit=5
  ->  Nested Loop  (cost=1.14..180.62 rows=8 width=23) (actual time=0.046..0.047 rows=0 loops=1)
        Output: cf.name, cf.comm_fld_id
        Buffers: shared hit=5
        ->  Nested Loop  (cost=0.57..6.62 rows=1 width=8) (actual time=0.046..0.046 rows=0 loops=1)
              Output: c.comm_id
              Buffers: shared hit=5
              ->  Index Scan using evs_ev_id_message_panel_id_idx on public.evs e  (cost=0.29..3.31 rows=1 width=8) (actual time=0.045..0.045 rows=0 loops=1)
                    Output: e.ev_id, e.node_id, e.incident_id, e.enhanced_message_id, e.sender, e.scr_vpkg_id, e.dmn_name, e.status, e.ev_dmn_id, e.company_id, e.company_name, e.creation_date, e.evs_type, e.voice_msg_id, e.last_action_user, e.when_created, e.conference_id, e.priority, e.message_panel_id, e.termination_date, e.message_panel_name, e.application_name, e.presentation_id, e.system_message_type, e.pass_response_count, e.application_id, e.uuid, e.request_id, e.integration_uuid, e.revision_id, e.suppressed_ntfns, e.org_uuid, e.updated_by_uuid, e.flow_trace, e.flow_block_id
                    Index Cond: (e.ev_id = 296115655)
                    Filter: (e.org_uuid = '9179fd37-1322-4957-8a99-7e2056388b4f'::uuid)
                    Buffers: shared hit=5
              ->  Index Scan using comm_comm_ev_id_uidx on public.comm c  (cost=0.29..3.31 rows=1 width=16) (never executed)
                    Output: c.comm_id, c.comm_type, c.comm_ev_id, c.comm_ev_tml_id, c.comm_ntfn_id, c.dvc_type, c.case_sensitive, c.when_created, c.org_uuid, c.updated_by_uuid
                    Index Cond: (c.comm_ev_id = 296115655)
        ->  Index Scan using idx_comm_flds1 on public.comm_flds cf  (cost=0.56..172.60 rows=140 width=31) (never executed)
              Output: cf.comm_fld_id, cf.name, cf.comm_id, cf.when_created, cf.org_uuid, cf.updated_by_uuid
              Index Cond: (cf.comm_id = c.comm_id)
  ->  Index Scan using idx_comm_values1 on public.comm_values cv  (cost=0.56..0.74 rows=2 width=366) (never executed)
        Output: cv.comm_fld_id, cv.value, cv."position", cv.when_created, cv.org_uuid, cv.updated_by_uuid
        Index Cond: (cv.comm_fld_id = cf.comm_fld_id)
Settings: effective_cache_size = '96GB', max_parallel_workers = '15', max_parallel_workers_per_gather = '4', random_page_cost = '1.5', temp_buffers = '16MB', work_mem = '128MB'
Planning Time: 2327.769 ms
Execution Time: 0.294 ms

Last analyzed:

SELECT now(), schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'comm_values';

         now         | schemaname |   relname   |    last_analyze     | last_autoanalyze
---------------------+------------+-------------+---------------------+------------------
 2021-09-20 19:29:58 | public     | comm_values | 2021-09-04 22:22:40 |
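For reference, a minimal sketch of how bloat on the suspect index could be quantified before rebuilding it, assuming the contrib extension pgstattuple is installed (the index name is taken from the plan above):

-- pgstattuple is a contrib extension; pgstatindex() reports btree index health
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- a low avg_leaf_density and high leaf_fragmentation point at a bloated index
SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_comm_values1');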
2021-09-19 11:22:03: Rebuilt the index (I knew it was used by the query and was bloated) and planning time went back to normal (2327.769 ms before vs 59.643 ms after). Yes, I had run the same query many times before rebuilding the index, just to make sure there were no cached plans.

Index size before:

SELECT pg_size_pretty(pg_table_size('idx_comm_values1'));
 pg_size_pretty
----------------
 1546 MB

Index size after:

SELECT pg_size_pretty(pg_table_size('idx_comm_values1'));
 pg_size_pretty
----------------
 1052 MB

Execution plan after I rebuilt the index:

Nested Loop  (cost=1.70..186.67 rows=10 width=373) (actual time=0.014..0.015 rows=0 loops=1)
  Output: cf.name, cv.value, cv."position"
  Buffers: shared hit=2
  ->  Nested Loop  (cost=1.14..180.62 rows=8 width=23) (actual time=0.014..0.014 rows=0 loops=1)
        Output: cf.name, cf.comm_fld_id
        Buffers: shared hit=2
        ->  Nested Loop  (cost=0.57..6.62 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
              Output: c.comm_id
              Buffers: shared hit=2
              ->  Index Scan using evs_ev_id_message_panel_id_idx on public.evs e  (cost=0.29..3.31 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
                    Output: e.ev_id, e.node_id, e.incident_id, e.enhanced_message_id, e.sender, e.scr_vpkg_id, e.dmn_name, e.status, e.ev_dmn_id, e.company_id, e.company_name, e.creation_date, e.evs_type, e.voice_msg_id, e.last_action_user, e.when_created, e.conference_id, e.priority, e.message_panel_id, e.termination_date, e.message_panel_name, e.application_name, e.presentation_id, e.system_message_type, e.pass_response_count, e.application_id, e.uuid, e.request_id, e.integration_uuid, e.revision_id, e.suppressed_ntfns, e.org_uuid, e.updated_by_uuid, e.flow_trace, e.flow_block_id
                    Index Cond: (e.ev_id = 296115655)
                    Filter: (e.org_uuid = '9179fd37-1322-4957-8a99-7e2056388b4f'::uuid)
                    Buffers: shared hit=2
              ->  Index Scan using comm_comm_ev_id_uidx on public.comm c  (cost=0.29..3.31 rows=1 width=16) (never executed)
                    Output: c.comm_id, c.comm_type, c.comm_ev_id, c.comm_ev_tml_id, c.comm_ntfn_id, c.dvc_type, c.case_sensitive, c.when_created, c.org_uuid, c.updated_by_uuid
                    Index Cond: (c.comm_ev_id = 296115655)
        ->  Index Scan using idx_comm_flds1 on public.comm_flds cf  (cost=0.56..172.60 rows=140 width=31) (never executed)
              Output: cf.comm_fld_id, cf.name, cf.comm_id, cf.when_created, cf.org_uuid, cf.updated_by_uuid
              Index Cond: (cf.comm_id = c.comm_id)
  ->  Index Scan using idx_comm_values1 on public.comm_values cv  (cost=0.56..0.74 rows=2 width=366) (never executed)
        Output: cv.comm_fld_id, cv.value, cv."position", cv.when_created, cv.org_uuid, cv.updated_by_uuid
        Index Cond: (cv.comm_fld_id = cf.comm_fld_id)
Settings: effective_cache_size = '96GB', max_parallel_workers = '15', max_parallel_workers_per_gather = '4', random_page_cost = '1.5', temp_buffers = '16MB', work_mem = '128MB'
Planning Time: 59.643 ms
Execution Time: 0.095 ms

I would appreciate any piece of advice.

PS: Also tested the same scenario on PG 12.7 - reproducible.
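As a side note, on PostgreSQL 12 the rebuild itself can be done without blocking writes; a sketch using the index from the plans above:

-- REINDEX ... CONCURRENTLY is available since PostgreSQL 12 and avoids
-- holding a lock that blocks writes on the table for the whole rebuild
REINDEX INDEX CONCURRENTLY idx_comm_values1;

-- compare the index size before and after
SELECT pg_size_pretty(pg_table_size('idx_comm_values1'));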
Re: BUG #17198: Planning time too high when execute query on standby cluster
On Tue, Sep 21, 2021 at 8:42 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      17198
> Logged by:          Andriy Bartash
> Email address:      abartash@xmatters.com
> PostgreSQL version: 12.8
> Operating system:   CentOS7
> Description:
>
> We have physical standby and noticed that some queries have Planning time
> very high (2-10 sec). Same queries on Primary cluster have Planning time 4-6
> msec
> Tried to run ANALYZE <table>, but it didn't help.
> Also noticed that RC (?) was bloated objects (table either index)

Do you have wal_log_hints or checksums enabled on your system? If not,
try running a VACUUM on the primary (not analyze), and see if that
helps.

Situations like this can happen due to hint bits not being updated on
the standby side, and not being included in the replication.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
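A minimal sketch of checking the settings Magnus mentions and running the suggested vacuum (the table name is taken from the plan above):

-- Both settings are read-only at runtime; if either is 'on', hint-bit-only
-- page changes are covered by full-page writes in WAL and reach the standby.
SHOW wal_log_hints;
SHOW data_checksums;

-- Plain VACUUM (not ANALYZE) on the primary; the page cleanup it performs
-- is WAL-logged and is therefore replayed on the standby.
VACUUM (VERBOSE) comm_values;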
Re: BUG #17198: Planning time too high when execute query on standby cluster
Thank you, Magnus.
Yes, we have checksums enabled (which effectively means wal_log_hints behavior is always on).
Currently we work around it by reindexing or running pg_repack.
I will try a more aggressive vacuum setup to prevent this issue. At least now we know the root cause: “hint bits not being updated on
the standby side, and not being included in the replication”.
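For example, the kind of per-table setup I have in mind for a more aggressive vacuum (the thresholds below are only illustrative starting points, not tested values):

-- make autovacuum trigger much earlier on the hot table and throttle it less
-- (values are illustrative, not tuned recommendations)
ALTER TABLE comm_values SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 1
);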
I am wondering if there is a Postgres config parameter that might help us here?
Thank you in advance.
From: Magnus Hagander <magnus@hagander.net>
Date: Monday, September 20, 2021 at 11:55 PM
To: abartash@xmatters.com <abartash@xmatters.com>, PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: BUG #17198: Planning time too high when execute query on standby cluster
On Tue, Sep 21, 2021 at 8:42 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 17198
> Logged by: Andriy Bartash
> Email address: abartash@xmatters.com
> PostgreSQL version: 12.8
> Operating system: CentOS7
> Description:
>
> We have physical standby and noticed that some queries have Planning time
> very high (2-10 sec). Same queries on Primary cluster have Planning time 4-6
> msec
> Tried to run ANALYZE <table>, but it didn't help.
> Also noticed that RC (?) was bloated objects (table either index)
Do you have wal_log_hints or checksums enabled on your system? If not,
try running a VACUUM on the primary (not analyze), and see if that
helps.
Situations like this can happen due to hint bits not being updated on
the standby side, and not being included in the replication.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/