BUG #16280: dead tuples (probably) effect plan and query performance - Mailing list pgsql-bugs
From:    PG Bug reporting form
Subject: BUG #16280: dead tuples (probably) effect plan and query performance
Msg-id:  16280-e393fbb744eae7aa@postgresql.org
Responses:
  Re: BUG #16280: dead tuples (probably) effect plan and query performance (Alexey Ermakov <alexey.ermakov@dataegret.com>)
  Re: BUG #16280: dead tuples (probably) effect plan and query performance (Tom Lane <tgl@sss.pgh.pa.us>)
List:    pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16280
Logged by:          Ilya Serbin
Email address:      iserbin@bostonsd.ru
PostgreSQL version: 11.6
Operating system:   Centos 7.4

Description:

Hello all,

I am facing some strange plan changes with a query (the query itself is quite bad, but I can't understand why the plan changes).

Context:

db1=> \d+ table1
                                                         Table "db1.table1"
   Column   |            Type             | Collation | Nullable |              Default               | Storage  | Stats target | Description
------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
 id         | bigint                      |           | not null | nextval('table1_id_seq'::regclass) | plain    |              |
 table1_id  | character varying(255)      |           | not null |                                    | extended |              |
 content    | jsonb                       |           |          |                                    | extended |              |
 created_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 updated_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 queued_at  | timestamp with time zone    |           |          |                                    | plain    |              |
Indexes:
    "table1_pkey" PRIMARY KEY, btree (id)
    "uk_table1_id" UNIQUE CONSTRAINT, btree (table1_id)
    "content_idx" gin (content jsonb_path_ops)
Referenced by:
    TABLE "collection_table1s" CONSTRAINT "fk_collection_table1s_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(id)
    TABLE "db1_table1s" CONSTRAINT "fk_db1_table1s_table1_id" FOREIGN KEY (table1s_id) REFERENCES table1(id)
    TABLE "table1_sort" CONSTRAINT "fk_table1_sort_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(table1_id)

Relevant settings:

              name              |  setting
--------------------------------+-----------
 constraint_exclusion           | partition
 cpu_index_tuple_cost           | 0.005
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 default_statistics_target      | 200
 effective_cache_size           | 1572864
 enable_bitmapscan              | on
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexonlyscan           | on
 enable_indexscan               | on
 enable_material                | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_parallel_append         | on
 enable_parallel_hash           | on
 enable_partition_pruning       | on
 enable_partitionwise_aggregate | off
 enable_partitionwise_join      | off
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
 random_page_cost               | 1.1
 seq_page_cost                  | 1
 shared_buffers                 | 524288
(25 rows)

First plan (the good plan and the problematic query):
https://explain.tensor.ru/archive/explain/1cf3c0181a9574bf2fd06d6fd07dc201:0:2020-02-27#context

Second plan (the same query with the bad plan):
https://explain.tensor.ru/archive/explain/3770d39f786135e38bebeb1a8a4b1da9:0:2020-02-27#context

Issue description:

After some minimal activity in the database, the first plan changes to the second. ANALYZE on table1 does not help (tried with various default_statistics_target values). Recreating the content_idx index helps for some time, but several minutes later the plan degrades back to the second one. The only thing that helped (surprisingly) is VACUUM. It also helps only for a while: once the number of dead tuples reaches something like 300-500, the plan changes back to the second one.

Table details with the bad plan:

db1=> select * from pg_stat_user_tables where relname='table1';
-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116536
idx_tup_fetch       | 42353778
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 379
n_mod_since_analyze | 1169
last_vacuum         | 2020-02-12 08:58:06.147247+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        |
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 2
autovacuum_count    | 702
analyze_count       | 0
autoanalyze_count   | 20

Table details with the good plan, after VACUUM:

-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116662
idx_tup_fetch       | 42360694
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2020-02-26 08:10:49.884625+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        | 2020-02-26 08:13:07.253307+01
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 3
autovacuum_count    | 702
analyze_count       | 1
autoanalyze_count   | 20

I can confirm that the issue exists on 11.4 and 11.6 (I updated to 11.6 because I thought it might be a bug fixed in 11.5: "Fix possible failure of planner's index endpoint probes (Tom Lane)"). It is always reproducible, on production and on all lower environments. I can't understand why it happens. As far as I understand it has something to do with the visibility map, but per the documentation that should only affect index-only scans, which is not my case.

My two main questions are:
1) Is it a bug?
2) If it is expected behaviour, can someone please explain why it happens, and whether there is any way to keep the good plan (without installing extensions to force/pin plans)?

Best regards,
Ilya
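Since VACUUM reliably restores the good plan until roughly 300-500 dead tuples accumulate, one workaround to try is tightening autovacuum on this table so it fires well before that point. A sketch with illustrative thresholds (the values below are assumptions to tune against the observed flip point, not values from this report):

```sql
-- Trigger autovacuum/autoanalyze on table1 after a fixed number of row
-- changes instead of a fraction of the table (~80k rows at 20% default).
-- Thresholds here are illustrative; tune them empirically.
ALTER TABLE table1 SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_vacuum_threshold     = 100,
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 100
);

-- Confirm autovacuum is keeping dead tuples below the flip point:
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'table1';
```

Note that with ~313M updates recorded against ~409k live rows, very frequent vacuums on this table should be cheap, since each pass has little to reclaim.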