SQL performance issue (postgresql chooses a bad plan when a better one is available) - Mailing list pgsql-performance

From Chris Stephens
Subject SQL performance issue (postgresql chooses a bad plan when a better one is available)
Date
Msg-id CAEFL0sxgEFF8oCtTWEvz0_fyo75V4bqcnZqH=rh7qCYoj7JtSw@mail.gmail.com
Whole thread Raw
Responses Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-performance
AWS RDS v12

The following SQL takes ~25 seconds to run. I'm relatively new to postgres but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's materializing the entire EXISTS subquery for each row returned by the rest of the query before probing for plate_384_id existence. postgres is choosing sequential scans on sample_plate_384 and test_result when suitable, efficient indexes exist. a re-written query produces a much better plan (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the query with an explicit PLATE_384_ID yields the execution plan we want as well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding a DISTINCT on the result also yields a better plan.

I've tried tried the following:

disable parallel
set join_collapse_limit=1 and played with order of EXISTS/NOT EXISTS
changed work_mem and enable_material to see if that had any effect
VACUUM FULL'd TEST_RESULT and SAMPLE_PLATE_384 
created a stats object on (sample_id, sample_plate_384_id) for both TEST_RESULT and SAMPLE_PLATE_384 to see if that would help (they increment fairly consistently with each other)

I'm out of ideas on how to convince postgres to choose a better plan. any and all help/suggestions/explanations would be greatly appreciated. the rewritten SQL performs sufficiently well but i'd like to understand why postgres is doing this and what to do about it so i can't tackle the next SQL performance issue with a little more knowledge.

SELECT count(*) AS "count" FROM "plate_384_scan"
WHERE NOT EXISTS (SELECT 1 FROM "plate_384_scan" AS "plate_384_scan_0" WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts" AND "plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
  AND EXISTS (SELECT 1 FROM "sample_plate_384" INNER JOIN "test_result" USING ("sample_plate_384_id", "sample_id") WHERE "test_result" IS NULL AND "plate_384_scan_id" = "plate_384_scan"."plate_384_scan_id")
  AND NOT EXISTS (SELECT 1 FROM "plate_384_abandoned" WHERE "plate_384_id" = "plate_384_scan"."plate_384_id");

[limsdb_dev] # SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname in ('sample_plate_384','test_result', 'plate_384_scan','plate_384_abandoned') order by 1;
       relname       | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
---------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 plate_384_abandoned |        1 |        16 |             0 | r       |        4 | f              | (null)     |         16384
 plate_384_scan      |       13 |      1875 |             0 | r       |        5 | f              | (null)     |        131072
 sample_plate_384    |     3827 |    600701 |             0 | r       |        9 | f              | (null)     |      31350784
 test_result         |     4900 |    599388 |             0 | r       |        8 | f              | (null)     |      40140800
(4 rows)

Time: 44.405 ms
[limsdb_dev] # \d plate_384_abandoned
                        Table "lab_data.plate_384_abandoned"
    Column    |           Type           | Collation | Nullable |      Default
--------------+--------------------------+-----------+----------+-------------------
 plate_384_id | integer                  |           | not null |
 reason       | text                     |           | not null |
 tech_id      | integer                  |           |          |
 ts           | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "plate_384_abandoned_pkey" PRIMARY KEY, btree (plate_384_id)
Foreign-key constraints:
    "plate_384_abandoned_plate_384_id_fkey" FOREIGN KEY (plate_384_id) REFERENCES plate_384(plate_384_id)
    "plate_384_abandoned_tech_id_fkey" FOREIGN KEY (tech_id) REFERENCES tech(tech_id)

[limsdb_dev] # \d plate_384_scan
                                                 Table "lab_data.plate_384_scan"
      Column       |           Type           | Collation | Nullable |                          Default
-------------------+--------------------------+-----------+----------+-----------------------------------------------------------
 plate_384_scan_id | integer                  |           | not null | nextval('plate_384_scan_plate_384_scan_id_seq'::regclass)
 plate_384_id      | integer                  |           | not null |
 equipment_id      | integer                  |           | not null |
 tech_id           | integer                  |           | not null |
 ts                | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "pk_plate_384_scan" PRIMARY KEY, btree (plate_384_scan_id)
    "plate_384_scan_idx001" btree (ts, plate_384_scan_id)
    "plate_384_scan_idx002" btree (plate_384_id, ts)
Foreign-key constraints:
    "fk_plate_384_scan_equipment_id" FOREIGN KEY (equipment_id) REFERENCES equipment(equipment_id)
    "fk_plate_384_scan_plate_384_id" FOREIGN KEY (plate_384_id) REFERENCES plate_384(plate_384_id)
    "fk_plate_384_scan_tech_id" FOREIGN KEY (tech_id) REFERENCES tech(tech_id)
Referenced by:
    TABLE "sample_plate_384" CONSTRAINT "fk_sample_plate_384_plate_384_scan_id" FOREIGN KEY (plate_384_scan_id) REFERENCES plate_384_scan(plate_384_scan_id)
    TABLE "sample_plate_384_removed" CONSTRAINT "sample_plate_384_removed_plate_384_scan_id_fkey" FOREIGN KEY (plate_384_scan_id) REFERENCES plate_384_scan(plate_384_scan_id)
    TABLE "test_result_file" CONSTRAINT "test_result_file_plate_384_scan_id_fkey" FOREIGN KEY (plate_384_scan_id) REFERENCES plate_384_scan(plate_384_scan_id)

[limsdb_dev] # \d sample_plate_384
                                          Table "lab_data.sample_plate_384"
       Column        |  Type   | Collation | Nullable |                            Default
---------------------+---------+-----------+----------+---------------------------------------------------------------
 sample_plate_384_id | integer |           | not null | nextval('sample_plate_384_sample_plate_384_id_seq'::regclass)
 sample_id           | integer |           | not null |
 plate_384_scan_id   | integer |           | not null |
 plate_384_well      | integer |           | not null |
Indexes:
    "pk_sample_plate_384" PRIMARY KEY, btree (sample_plate_384_id)
    "sample_plate_384_idx001" btree (sample_id, sample_plate_384_id)
    "sample_plate_384_idx002" btree (sample_id, sample_plate_384_id, plate_384_scan_id)
    "sample_plate_384_idx003" btree (plate_384_scan_id, sample_plate_384_id)
    "sample_plate_384_idx004" btree (plate_384_scan_id, sample_plate_384_id, sample_id)
    "sample_plate_384_plate_384_scan_id_plate_384_well_idx" UNIQUE, btree (plate_384_scan_id, plate_384_well)
Foreign-key constraints:
    "fk_sample_plate_384_plate_384_scan_id" FOREIGN KEY (plate_384_scan_id) REFERENCES plate_384_scan(plate_384_scan_id)
    "fk_sample_plate_384_sample" FOREIGN KEY (sample_id) REFERENCES sample(sample_id)
Referenced by:
    TABLE "sample_plate_96_plate_384" CONSTRAINT "fk_sample_plate_96_plate_384_sample_plate_384_id" FOREIGN KEY (sample_plate_384_id) REFERENCES sample_plate_384(sample_plate_384_id)
    TABLE "test_result" CONSTRAINT "fk_test_result_sample_plate_384" FOREIGN KEY (sample_plate_384_id) REFERENCES sample_plate_384(sample_plate_384_id)
Statistics objects:
    "lab_data"."sp384_stats" (ndistinct, dependencies, mcv) ON sample_plate_384_id, sample_id FROM sample_plate_384

[limsdb_dev] # \d test_result
                                                Table "lab_data.test_result"
       Column        |           Type           | Collation | Nullable |                       Default
---------------------+--------------------------+-----------+----------+-----------------------------------------------------
 test_result_id      | integer                  |           | not null | nextval('test_result_test_result_id_seq'::regclass)
 sample_plate_384_id | integer                  |           | not null |
 sample_id           | integer                  |           | not null |
 equipment_id        | integer                  |           |          |
 test_result         | character varying(100)   |           |          |
 final_result_flag   | boolean                  |           |          |
 tech_id             | integer                  |           |          |
 ts                  | timestamp with time zone |           |          | CURRENT_TIMESTAMP
Indexes:
    "pk_test_result" PRIMARY KEY, btree (test_result_id)
    "test_result_idx001" btree (sample_id, ts, final_result_flag, test_result)
    "test_result_idx002" btree (ts, final_result_flag, test_result, sample_id)
    "test_result_idx003" btree (ts, sample_id)
    "test_result_idx004" btree (sample_id, sample_plate_384_id)
    "test_result_idx005" btree (sample_plate_384_id)
    "test_result_idx006" btree (sample_plate_384_id, sample_id, test_result)
    "test_result_sample_plate_384_id_idx" UNIQUE, btree (sample_plate_384_id)
Foreign-key constraints:
    "fk_test_result_equipment" FOREIGN KEY (equipment_id) REFERENCES equipment(equipment_id)
    "fk_test_result_sample" FOREIGN KEY (sample_id) REFERENCES sample(sample_id)
    "fk_test_result_sample_plate_384" FOREIGN KEY (sample_plate_384_id) REFERENCES sample_plate_384(sample_plate_384_id)
    "fk_test_result_tech" FOREIGN KEY (tech_id) REFERENCES tech(tech_id)
Referenced by:
    TABLE "test_result_detail" CONSTRAINT "fk_test_result_detail" FOREIGN KEY (test_result_id) REFERENCES test_result(test_result_id)
    TABLE "reported_test_result" CONSTRAINT "reported_test_result_test_result_id_fkey" FOREIGN KEY (test_result_id) REFERENCES test_result(test_result_id)
Statistics objects:
    "lab_data"."test_result_stats" (ndistinct, dependencies, mcv) ON sample_plate_384_id, sample_id FROM test_result

pgsql-performance by date:

Previous
From: Manish Lad
Date:
Subject: Re: How do we hint a query to use index in postgre
Next
From: Laurenz Albe
Date:
Subject: Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)