Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE - Mailing list pgsql-bugs

From: Jan Kort
Subject: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date:
Msg-id: AM0PR0502MB3620D231418BE0B7B8A9E3E09B2C9@AM0PR0502MB3620.eurprd05.prod.outlook.com
Responses: Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE (Pavel Stehule <pavel.stehule@gmail.com>)
  Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-bugs

Hi,

 

I tried this on both PostgreSQL 12.7 and 13.3, with both VACUUM ANALYZE and VACUUM FULL ANALYZE; either one changes a straightforward plan into a reversed, complex plan that makes the query slow. The only thing that helps is a pg_dump + pg_restore, but that is not feasible.

 

It seems to depend on the specific data: I tried generating other variations of around 1M records, but so far the attached DB (also generated) is the only one where I could reproduce it. I can't figure out what is so special about this data.

 

Given the following record counts:

 

GFO_ZAKEN: 1048587

GFO_ZAKEN_KOSTEN: 3

GFO_ZAKEN_TYPECODE: 4

GFO_ZAKEN.ZAAKTYPECODE_ID is non-NULL in 9 rows; the rest are NULL

 

When I run this query:

 

EXPLAIN ANALYZE
SELECT GFO_ZAKEN_KOSTEN.ID AS GFO_ZAKEN_KOSTEN_ID,
       GFO_ZAKEN.ID AS GFO_ZAKEN_ID,
       GFO_ZAKEN_TYPECODE.OMSCHRIJVING AS ZAAKTYPECODE_ID
  FROM TRIAL.GFO_ZAKEN_KOSTEN
  JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
  JOIN TRIAL.GFO_ZAKEN_TYPECODE ON GFO_ZAKEN.ZAAKTYPECODE_ID = GFO_ZAKEN_TYPECODE.ID

 

It takes 500 ms or so, which I associate with a full table scan, even though there are just simple referential constraints and corresponding indexes:

 

CREATE TABLE TRIAL.GFO_ZAKEN_TYPECODE (ID INTEGER PRIMARY KEY, OMSCHRIJVING CHARACTER VARYING(4000));
CREATE TABLE TRIAL.GFO_ZAKEN (ID INTEGER PRIMARY KEY, ZAAKTYPECODE_ID INTEGER,
  CONSTRAINT ZAAKTYPECODE_IDC1 FOREIGN KEY (ZAAKTYPECODE_ID) REFERENCES TRIAL.GFO_ZAKEN_TYPECODE (ID));
CREATE INDEX GFO_ZAKENO18 ON TRIAL.GFO_ZAKEN USING BTREE (ZAAKTYPECODE_ID);
CREATE TABLE TRIAL.GFO_ZAKEN_KOSTEN (ID INTEGER PRIMARY KEY, GFO_ZAKEN_ID INTEGER,
  CONSTRAINT GFO_ZAKEN_IDC1 FOREIGN KEY (GFO_ZAKEN_ID) REFERENCES TRIAL.GFO_ZAKEN (ID));
CREATE INDEX GFO_ZAKEN_KOSTENO14 ON TRIAL.GFO_ZAKEN_KOSTEN USING BTREE (GFO_ZAKEN_ID ASC NULLS LAST);
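For reference, data matching the reported counts can be approximated with generate_series. This is only a sketch; the attached DB was generated differently, and the specifics below (which 9 rows get a type code, which values they get, the kosten IDs) are assumptions:

```sql
-- Sketch: load data matching the reported counts:
-- 4 type codes, 1048587 zaken (9 with a non-NULL ZAAKTYPECODE_ID),
-- and 3 kosten rows (IDs 11..13, so "id = 13" exists).
INSERT INTO TRIAL.GFO_ZAKEN_TYPECODE (ID, OMSCHRIJVING)
SELECT g, 'type ' || g FROM generate_series(1, 4) AS g;

INSERT INTO TRIAL.GFO_ZAKEN (ID, ZAAKTYPECODE_ID)
SELECT g, CASE WHEN g <= 9 THEN 1 + (g % 4) ELSE NULL END
  FROM generate_series(1, 1048587) AS g;

INSERT INTO TRIAL.GFO_ZAKEN_KOSTEN (ID, GFO_ZAKEN_ID)
SELECT 10 + g, g FROM generate_series(1, 3) AS g;
```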

 

After pg_restore it gives a straightforward plan, starting with the gfo_zaken_kosten primary key and continuing with the join on gfo_zaken_kosten.gfo_zaken_id:

 

Nested Loop  (cost=0.56..3.64 rows=1 width=524) (actual time=0.036..0.037 rows=1 loops=1)
  ->  Nested Loop  (cost=0.43..3.48 rows=1 width=12) (actual time=0.030..0.030 rows=1 loops=1)
        ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=1)
              Filter: (id = 13)
              Rows Removed by Filter: 2
        ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..2.45 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
              Index Cond: (id = gfo_zaken_kosten.gfo_zaken_id)
  ->  Index Scan using gfo_zaken_typecodep on gfo_zaken_typecode  (cost=0.13..0.15 rows=1 width=520) (actual time=0.005..0.005 rows=1 loops=1)
        Index Cond: (id = gfo_zaken.zaaktypecode_id)
Planning Time: 1.538 ms
Execution Time: 0.095 ms

 

After VACUUM ANALYZE the plan becomes inefficient again: it no longer starts with the gfo_zaken_kosten primary key, but at the wrong end, with an index scan over the 1M rows:

 

Merge Join  (cost=1.48..1.59 rows=1 width=159) (actual time=619.374..619.376 rows=1 loops=1)
  Merge Cond: (gfo_zaken.id = gfo_zaken_kosten.gfo_zaken_id)
  ->  Nested Loop  (cost=0.43..96503.47 rows=1048587 width=155) (actual time=0.022..619.359 rows=9 loops=1)
        Join Filter: (gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id)
        Rows Removed by Join Filter: 4194316
        ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..33587.23 rows=1048587 width=8) (actual time=0.006..141.167 rows=1048587 loops=1)
        ->  Materialize  (cost=0.00..1.06 rows=4 width=155) (actual time=0.000..0.000 rows=4 loops=1048587)
              ->  Seq Scan on gfo_zaken_typecode  (cost=0.00..1.04 rows=4 width=155) (actual time=0.011..0.012 rows=4 loops=1)
  ->  Sort  (cost=1.05..1.05 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
        Sort Key: gfo_zaken_kosten.gfo_zaken_id
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
              Filter: (id = 13)
              Rows Removed by Filter: 2
Planning Time: 69.151 ms
Execution Time: 619.410 ms

 

So it is roughly 6520 times slower after VACUUM ANALYZE than before (619.410 ms vs. 0.095 ms).

 

Running VACUUM ANALYZE again doesn't make it better; only pg_dump + pg_restore brings back the original plan, and then it breaks again on the first VACUUM ANALYZE.
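As a session-level diagnostic (not a fix), one can check whether the planner picks a better plan with the merge join disabled, and inspect what ANALYZE recorded for the join column. Both use standard PostgreSQL facilities; the actual output for this DB is of course unknown:

```sql
-- Diagnostic sketch only: steer the planner away from the merge join
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT ...;  -- re-run the query from above

-- See the statistics ANALYZE gathered for the join column
SELECT null_frac, n_distinct, most_common_vals
  FROM pg_stats
 WHERE schemaname = 'trial'
   AND tablename = 'gfo_zaken'
   AND attname = 'zaaktypecode_id';
```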

 

I attached an 800K test DB with all sensitive data removed.

 

I tried both the default config without changes and the default config with our settings appended:

 

max_connections = 1000
shared_buffers = 512MB
effective_cache_size = 6GB
work_mem = 10485kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1
wal_sync_method = open_datasync
fsync = on
synchronous_commit = off

 

Doing a VACUUM ANALYZE shouldn’t change a straightforward plan.

 

Regards,

 

Jan Kort

 

 
