BUG #16280: dead tuples (probably) effect plan and query performance - Mailing list pgsql-bugs

From: PG Bug reporting form
Subject: BUG #16280: dead tuples (probably) effect plan and query performance
Msg-id: 16280-e393fbb744eae7aa@postgresql.org
Responses: Re: BUG #16280: dead tuples (probably) effect plan and query performance (Alexey Ermakov <alexey.ermakov@dataegret.com>)
           Re: BUG #16280: dead tuples (probably) effect plan and query performance (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16280
Logged by:          Ilya Serbin
Email address:      iserbin@bostonsd.ru
PostgreSQL version: 11.6
Operating system:   CentOS 7.4
Description:

Hello all,
I've run into some strange plan changes with a query (the query itself is
quite bad, but I can't understand why its plan changes).
Context:

db1=> \d+ table1
                                                      Table "db1.table1"
   Column   |            Type             | Collation | Nullable |              Default               | Storage  | Stats target | Description
------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
 id         | bigint                      |           | not null | nextval('table1_id_seq'::regclass) | plain    |              |
 table1_id  | character varying(255)      |           | not null |                                    | extended |              |
 content    | jsonb                       |           |          |                                    | extended |              |
 created_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 updated_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 queued_at  | timestamp with time zone    |           |          |                                    | plain    |              |
Indexes:
    "table1_pkey" PRIMARY KEY, btree (id)
    "uk_table1_id" UNIQUE CONSTRAINT, btree (table1_id)
    "content_idx" gin (content jsonb_path_ops)
Referenced by:
    TABLE "collection_table1s" CONSTRAINT "fk_collection_table1s_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(id)
    TABLE "db1_table1s" CONSTRAINT "fk_db1_table1s_table1_id" FOREIGN KEY (table1s_id) REFERENCES table1(id)
    TABLE "table1_sort" CONSTRAINT "fk_table1_sort_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(table1_id)
Planner-related settings:

              name              |  setting
--------------------------------+-----------
 constraint_exclusion           | partition
 cpu_index_tuple_cost           | 0.005
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 default_statistics_target      | 200
 effective_cache_size           | 1572864
 enable_bitmapscan              | on
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexonlyscan           | on
 enable_indexscan               | on
 enable_material                | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_parallel_append         | on
 enable_parallel_hash           | on
 enable_partition_pruning       | on
 enable_partitionwise_aggregate | off
 enable_partitionwise_join      | off
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
 random_page_cost               | 1.1
 seq_page_cost                  | 1
 shared_buffers                 | 524288
(25 rows)
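For reference, a snapshot like this can be pulled from pg_settings with a
query along these lines (the exact query I used may have differed):

db1=> SELECT name, setting FROM pg_settings
db1->  WHERE name LIKE 'enable%' OR name LIKE '%cost%'
db1->     OR name IN ('constraint_exclusion', 'default_statistics_target',
db1(>                 'effective_cache_size', 'shared_buffers')
db1->  ORDER BY name;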


First plan (the good plan, produced by the problematic query):
https://explain.tensor.ru/archive/explain/1cf3c0181a9574bf2fd06d6fd07dc201:0:2020-02-27#context
Second plan (the same query with the bad plan):
https://explain.tensor.ru/archive/explain/3770d39f786135e38bebeb1a8a4b1da9:0:2020-02-27#context

Issue description:
After some minimal activity in the database, the first plan changes to the
second. Running ANALYZE on table1 does not help (I tried various
default_statistics_target values).
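A minimal sketch of those ANALYZE attempts (the exact target values varied):

db1=> SET default_statistics_target = 1000;
db1=> ANALYZE table1;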
Recreating the content_idx index helps for some time, but several minutes
later the plan degrades back to the second one.
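The recreation was along these lines (a sketch; the exact statements may
have differed):

db1=> REINDEX INDEX content_idx;
-- or, equivalently:
-- DROP INDEX content_idx;
-- CREATE INDEX content_idx ON table1 USING gin (content jsonb_path_ops);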
The only thing that (surprisingly) helps is VACUUM. It too helps only for
some time: once the number of dead tuples reaches something like 300-500,
the plan changes back to the second one.
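Given that threshold, a possible (untested) workaround sketch would be to
make autovacuum fire on this table well before that many dead tuples
accumulate; the parameter values below are illustrative guesses:

db1=> VACUUM table1;  -- a manual vacuum restores the good plan for a while
db1=> ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor = 0.0,
db1(>                         autovacuum_vacuum_threshold = 100);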

Table details with bad plan:

db1=> select * from pg_stat_user_tables where relname='table1';
-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116536
idx_tup_fetch       | 42353778
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 379
n_mod_since_analyze | 1169
last_vacuum         | 2020-02-12 08:58:06.147247+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        |
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 2
autovacuum_count    | 702
analyze_count       | 0
autoanalyze_count   | 20

Table details with good plan after vacuum:
-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116662
idx_tup_fetch       | 42360694
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2020-02-26 08:10:49.884625+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        | 2020-02-26 08:13:07.253307+01
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 3
autovacuum_count    | 702
analyze_count       | 1
autoanalyze_count   | 20

I can confirm that the issue exists on 11.4 and 11.6 (I updated to 11.6
because I thought it might be a bug fixed by 11.5's "Fix possible failure
of planner's index endpoint probes (Tom Lane)"). It is always reproducible,
both on production and on all lower environments.
I can't understand why it happens. As far as I understand it has something
to do with the visibility map, but per the documentation that should affect
index-only scans, which is not my case.
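For completeness, a sketch of how the visibility map state could be
inspected (assuming the pg_visibility contrib module can be installed):

db1=> CREATE EXTENSION IF NOT EXISTS pg_visibility;
db1=> SELECT * FROM pg_visibility_map_summary('table1');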
My two main questions are:
1) Is this a bug?
2) If it is expected behaviour, can someone please explain why it happens,
and whether there is any way to keep the good plan (without installing
extensions to force/pin plans)?

Best regards, 
Ilya

