Thread: BUG #17198: Planning time too high when execute query on standby cluster
BUG #17198: Planning time too high when execute query on standby cluster
The following bug has been logged on the website:

Bug reference:      17198
Logged by:          Andriy Bartash
Email address:      abartash@xmatters.com
PostgreSQL version: 12.8
Operating system:   CentOS7
Description:

We have a physical standby and noticed that some queries have a very high planning time (2-10 sec). The same queries on the primary cluster have a planning time of 4-6 msec.
Tried to run ANALYZE <table>, but it didn't help.
Also noticed that the apparent root cause was bloated objects (table or index).

Query example:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT cf.name AS field_name,
       cv.value AS field_value,
       cv.position AS POSITION
FROM evs e
JOIN comm c ON c.comm_ev_id = e.ev_id
JOIN comm_flds cf ON cf.comm_id = c.comm_id
JOIN comm_values cv ON cv.comm_fld_id = cf.comm_fld_id
WHERE e.ev_id = 296115655
  AND e.org_uuid = '9179fd37-1322-4957-8a99-7e2056388b4f';

Generates this execution plan:

Nested Loop  (cost=1.70..186.69 rows=10 width=373) (actual time=0.046..0.047 rows=0 loops=1)
  Output: cf.name, cv.value, cv."position"
  Buffers: shared hit=5
  ->  Nested Loop  (cost=1.14..180.62 rows=8 width=23) (actual time=0.046..0.047 rows=0 loops=1)
        Output: cf.name, cf.comm_fld_id
        Buffers: shared hit=5
        ->  Nested Loop  (cost=0.57..6.62 rows=1 width=8) (actual time=0.046..0.046 rows=0 loops=1)
              Output: c.comm_id
              Buffers: shared hit=5
              ->  Index Scan using evs_ev_id_message_panel_id_idx on public.evs e  (cost=0.29..3.31 rows=1 width=8) (actual time=0.045..0.045 rows=0 loops=1)
                    Output: e.ev_id, e.node_id, e.incident_id, e.enhanced_message_id, e.sender, e.scr_vpkg_id, e.dmn_name, e.status, e.ev_dmn_id, e.company_id, e.company_name, e.creation_date, e.evs_type, e.voice_msg_id, e.last_action_user, e.when_created, e.conference_id, e.priority, e.message_panel_id, e.termination_date, e.message_panel_name, e.application_name, e.presentation_id, e.system_message_type, e.pass_response_count, e.application_id, e.uuid, e.request_id, e.integration_uuid, e.revision_id, e.suppressed_ntfns, e.org_uuid, e.updated_by_uuid, e.flow_trace, e.flow_block_id
                    Index Cond: (e.ev_id = 296115655)
                    Filter: (e.org_uuid = '9179fd37-1322-4957-8a99-7e2056388b4f'::uuid)
                    Buffers: shared hit=5
              ->  Index Scan using comm_comm_ev_id_uidx on public.comm c  (cost=0.29..3.31 rows=1 width=16) (never executed)
                    Output: c.comm_id, c.comm_type, c.comm_ev_id, c.comm_ev_tml_id, c.comm_ntfn_id, c.dvc_type, c.case_sensitive, c.when_created, c.org_uuid, c.updated_by_uuid
                    Index Cond: (c.comm_ev_id = 296115655)
        ->  Index Scan using idx_comm_flds1 on public.comm_flds cf  (cost=0.56..172.60 rows=140 width=31) (never executed)
              Output: cf.comm_fld_id, cf.name, cf.comm_id, cf.when_created, cf.org_uuid, cf.updated_by_uuid
              Index Cond: (cf.comm_id = c.comm_id)
  ->  Index Scan using idx_comm_values1 on public.comm_values cv  (cost=0.56..0.74 rows=2 width=366) (never executed)
        Output: cv.comm_fld_id, cv.value, cv."position", cv.when_created, cv.org_uuid, cv.updated_by_uuid
        Index Cond: (cv.comm_fld_id = cf.comm_fld_id)
Settings: effective_cache_size = '96GB', max_parallel_workers = '15', max_parallel_workers_per_gather = '4', random_page_cost = '1.5', temp_buffers = '16MB', work_mem = '128MB'
Planning Time: 2327.769 ms
Execution Time: 0.294 ms

Last analyzed:

SELECT now(), schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'comm_values';

         now         | schemaname |   relname   |    last_analyze     | last_autoanalyze
---------------------+------------+-------------+---------------------+------------------
 2021-09-20 19:29:58 | public     | comm_values | 2021-09-04 22:22:40 |
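For reference, a minimal sketch of how bloat on the suspect index could be quantified before rebuilding it, assuming the contrib extension pgstattuple is installed (the index name is taken from the plan above):

-- pgstattuple is a contrib extension; pgstatindex() reports btree index health
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- a low avg_leaf_density and high leaf_fragmentation point at a bloated index
SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_comm_values1');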
2021-09-19 11:22:03: Rebuilt the index (I knew it was used by the query and was bloated) and planning time went back to normal (2327.769 ms before vs 59.643 ms after). Yes, I had run the same query many times before rebuilding the index, just to make sure there were no cached plans.

Index size before:

SELECT pg_size_pretty(pg_table_size('idx_comm_values1'));
 pg_size_pretty
----------------
 1546 MB

Index size after:

SELECT pg_size_pretty(pg_table_size('idx_comm_values1'));
 pg_size_pretty
----------------
 1052 MB

Execution plan after I rebuilt the index:

Nested Loop  (cost=1.70..186.67 rows=10 width=373) (actual time=0.014..0.015 rows=0 loops=1)
  Output: cf.name, cv.value, cv."position"
  Buffers: shared hit=2
  ->  Nested Loop  (cost=1.14..180.62 rows=8 width=23) (actual time=0.014..0.014 rows=0 loops=1)
        Output: cf.name, cf.comm_fld_id
        Buffers: shared hit=2
        ->  Nested Loop  (cost=0.57..6.62 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
              Output: c.comm_id
              Buffers: shared hit=2
              ->  Index Scan using evs_ev_id_message_panel_id_idx on public.evs e  (cost=0.29..3.31 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
                    Output: e.ev_id, e.node_id, e.incident_id, e.enhanced_message_id, e.sender, e.scr_vpkg_id, e.dmn_name, e.status, e.ev_dmn_id, e.company_id, e.company_name, e.creation_date, e.evs_type, e.voice_msg_id, e.last_action_user, e.when_created, e.conference_id, e.priority, e.message_panel_id, e.termination_date, e.message_panel_name, e.application_name, e.presentation_id, e.system_message_type, e.pass_response_count, e.application_id, e.uuid, e.request_id, e.integration_uuid, e.revision_id, e.suppressed_ntfns, e.org_uuid, e.updated_by_uuid, e.flow_trace, e.flow_block_id
                    Index Cond: (e.ev_id = 296115655)
                    Filter: (e.org_uuid = '9179fd37-1322-4957-8a99-7e2056388b4f'::uuid)
                    Buffers: shared hit=2
              ->  Index Scan using comm_comm_ev_id_uidx on public.comm c  (cost=0.29..3.31 rows=1 width=16) (never executed)
                    Output: c.comm_id, c.comm_type, c.comm_ev_id, c.comm_ev_tml_id, c.comm_ntfn_id, c.dvc_type, c.case_sensitive, c.when_created, c.org_uuid, c.updated_by_uuid
                    Index Cond: (c.comm_ev_id = 296115655)
        ->  Index Scan using idx_comm_flds1 on public.comm_flds cf  (cost=0.56..172.60 rows=140 width=31) (never executed)
              Output: cf.comm_fld_id, cf.name, cf.comm_id, cf.when_created, cf.org_uuid, cf.updated_by_uuid
              Index Cond: (cf.comm_id = c.comm_id)
  ->  Index Scan using idx_comm_values1 on public.comm_values cv  (cost=0.56..0.74 rows=2 width=366) (never executed)
        Output: cv.comm_fld_id, cv.value, cv."position", cv.when_created, cv.org_uuid, cv.updated_by_uuid
        Index Cond: (cv.comm_fld_id = cf.comm_fld_id)
Settings: effective_cache_size = '96GB', max_parallel_workers = '15', max_parallel_workers_per_gather = '4', random_page_cost = '1.5', temp_buffers = '16MB', work_mem = '128MB'
Planning Time: 59.643 ms
Execution Time: 0.095 ms

I would appreciate any piece of advice.

PS: Also tested the same scenario on PG 12.7 - reproducible.
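As a side note, on PostgreSQL 12 the rebuild itself can be done without blocking writes; a sketch using the index from the plans above:

-- REINDEX ... CONCURRENTLY is available since PostgreSQL 12 and avoids
-- holding a lock that blocks writes on the table for the whole rebuild
REINDEX INDEX CONCURRENTLY idx_comm_values1;

-- compare the index size before and after
SELECT pg_size_pretty(pg_table_size('idx_comm_values1'));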
Re: BUG #17198: Planning time too high when execute query on standby cluster
On Tue, Sep 21, 2021 at 8:42 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      17198
> Logged by:          Andriy Bartash
> Email address:      abartash@xmatters.com
> PostgreSQL version: 12.8
> Operating system:   CentOS7
> Description:
>
> We have physical standby and noticed that some queries have Planning time
> very high (2-10 sec). Same queries on Primary cluster have Planning time 4-6
> msec
> Tried to run ANALYZE <table>, but it didn't help.
> Also noticed that RC (?) was bloated objects (table either index)

Do you have wal_log_hints or checksums enabled on your system? If not,
try running a VACUUM on the primary (not analyze), and see if that
helps.

Situations like this can happen due to hint bits not being updated on
the standby side, and not being included in the replication.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
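A minimal sketch of checking the settings Magnus mentions and running the suggested vacuum (the table name is taken from the plan above):

-- Both settings are read-only at runtime; if either is 'on', hint-bit-only
-- page changes are covered by full-page writes in WAL and reach the standby.
SHOW wal_log_hints;
SHOW data_checksums;

-- Plain VACUUM (not ANALYZE) on the primary; the page cleanup it performs
-- is WAL-logged and is therefore replayed on the standby.
VACUUM (VERBOSE) comm_values;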
Re: BUG #17198: Planning time too high when execute query on standby cluster
Thank you, Magnus.
Yes, we have checksums enabled (which effectively means wal_log_hints behavior is always on).
Currently we work around it by reindexing or running pg_repack.
I will try a more aggressive vacuum setup to prevent this issue. At least now we know the root cause: “hint bits not being updated on
the standby side, and not being included in the replication”.
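For example, the kind of per-table setup I have in mind for a more aggressive vacuum (the thresholds below are only illustrative starting points, not tested values):

-- make autovacuum trigger much earlier on the hot table and throttle it less
-- (values are illustrative, not tuned recommendations)
ALTER TABLE comm_values SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 1
);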
I am wondering if there is a Postgres config parameter that might help us here?
Thank you in advance.
From: Magnus Hagander <magnus@hagander.net>
Date: Monday, September 20, 2021 at 11:55 PM
To: abartash@xmatters.com <abartash@xmatters.com>, PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: BUG #17198: Planning time too high when execute query on standby cluster
On Tue, Sep 21, 2021 at 8:42 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 17198
> Logged by: Andriy Bartash
> Email address: abartash@xmatters.com
> PostgreSQL version: 12.8
> Operating system: CentOS7
> Description:
>
> We have physical standby and noticed that some queries have Planning time
> very high (2-10 sec). Same queries on Primary cluster have Planning time 4-6
> msec
> Tried to run ANALYZE <table>, but it didn't help.
> Also noticed that RC (?) was bloated objects (table either index)
Do you have wal_log_hints or checksums enabled on your system? If not,
try running a VACUUM on the primary (not analyze), and see if that
helps.
Situations like this can happen due to hint bits not being updated on
the standby side, and not being included in the replication.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/