Why is the optimiser choosing the slower query, or, understanding explain analyze output - Mailing list pgsql-performance

From Alistair Bayley
Subject Why is the optimiser choosing the slower query, or, understanding explain analyze output
Date
Msg-id CAKYODbj6xQDShQd2g3+9oue0+cz3NXjrnRvmbrfRDRoReWxFKQ@mail.gmail.com
Whole thread Raw
Responses Re: Why is the optimiser choosing the slower query, or, understanding explain analyze output
List pgsql-performance
I have postgresql 8.4.15 on Ubuntu 10.04 and this query:

    SELECT MAX(probeTable.PROBE_ALARM_EVENT_ID) AS MAX_EVENT_ID
    FROM ALARM_EVENT eventTable
    INNER JOIN ALARM_EVENT_PROBE probeTable
       ON eventTable.ALARM_EVENT_ID = probeTable.ALARM_EVENT_ID
    WHERE probeTable.PROBE_ID = 2

which is running slower than it could. Table definitions and explain
analyze output below.
The first explain is the current plan (uses sequential scans).
The second is after I have disabled sequential scans, and is the plan
I would prefer.

I have vacuum analyzed both tables. In terms of relevant changes to
the default postgresql.conf, we have these:

    shared_buffers = 28MB
    constraint_exclusion = on

I want to understand why the optimiser is choosing the plan with
sequential table scans, rather than the plan with index scans.
I am not sure how to interpret the predicted vs actual times/costs,
and want to understand why the predicted cost for the index scan plan
seems to be way off.

I have read: https://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf


    show server_version;

     server_version
    ----------------
     8.4.15


          Table "public.alarm_event_probe"
            Column        |  Type   | Modifiers
    ----------------------+---------+-----------
     alarm_event_id       | bigint  | not null
     probe_id             | integer | not null
     probe_alarm_event_id | bigint  | not null
    Indexes:
        "alarm_event_probe_pkey" PRIMARY KEY, btree (alarm_event_id)
        "alarm_event_probe_unique" UNIQUE, btree (probe_id,
probe_alarm_event_id)
    Foreign-key constraints:
        "alarm_event_probe_fk" FOREIGN KEY (probe_id) REFERENCES probe(probe_id)
    Triggers:
        alarm_event_probe_alarm_event_foreign_key_trigger BEFORE
INSERT OR UPDATE ON alarm_event_probe FOR EACH ROW EXECUTE PROCEDURE
alarm_event_foreign_key_function()

                                                     Table "public.alarm_event"
             Column         |            Type             |
                  Modifiers

------------------------+-----------------------------+----------------------------------------------------------------------
     alarm_event_id         | bigint                      | not null
default nextval('alarm_event_alarm_event_id_seq'::regclass)
     timestamp_device_utc   | timestamp without time zone |
     timestamp_manager_utc  | timestamp without time zone |
     timestamp_proxy_utc    | timestamp without time zone | not null
     timestamp_database_utc | timestamp without time zone | not null
default systimestamp()
     device_name            | character varying(32)       | not null
     device_location        | character varying(4)        | not null
     device_type            | character varying(6)        | not null
     device_category        | character varying(32)       | not null
     device_instance        | integer                     | not null
     proxy_name             | character varying(128)      | not null
     proxy_instance         | character varying(32)       | not null
     proxy_source           | character varying(256)      | not null
     native_alarm_id        | character varying(32)       |
     alarm_name             | character varying(64)       | not null
     alarm_severity         | integer                     |
     alarm_description      | character varying(1024)     | not null
     org_code               | integer                     | not null default 348
     domain_id              | integer                     |
     service_state          | integer                     | not null default 0
     proactive              | boolean                     | not null
default true
     alarm_id               | bigint                      |
    Indexes:
        "alarm_event_pkey" PRIMARY KEY, btree (alarm_event_id)
    Check constraints:
        "alarm_event_alarm_severity_range" CHECK (alarm_severity >= 0
AND alarm_severity <= 5)
        "alarm_event_service_state_range" CHECK (service_state >= 0
AND service_state <= 2)
    Foreign-key constraints:
        "alarm_event_domain_fk" FOREIGN KEY (domain_id) REFERENCES
domain(domain_id)
        "alarm_event_organisation_fk" FOREIGN KEY (org_code)
REFERENCES organisation(org_code)
    Triggers:
        alarm_event_1_raw_trigger AFTER INSERT ON alarm_event FOR EACH
ROW EXECUTE PROCEDURE alarm_event_raw_trigger_function()
        alarm_event_2_trigger_a AFTER INSERT OR DELETE OR UPDATE ON
alarm_event FOR EACH ROW EXECUTE PROCEDURE
alarm_event_trigger_function()
        alarm_event_probe_alarm_event_delete_cascade_trigger BEFORE
DELETE ON alarm_event FOR EACH ROW EXECUTE PROCEDURE
alarm_event_delete_alarm_event_probe_function()
        alarm_event_trigger_b BEFORE INSERT OR DELETE OR UPDATE ON
alarm_event FOR EACH ROW EXECUTE PROCEDURE
alarm_event_trigger_function()
        alarm_event_zz_insert_trigger AFTER INSERT ON alarm_event FOR
EACH ROW EXECUTE PROCEDURE alarm_event_insert()
    Number of child tables: 33 (Use \d+ to list them.)


       QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=451984.11..451984.12 rows=1 width=8) (actual
time=248693.886..248693.889 rows=1 loops=1)
       ->  Hash Join  (cost=107.06..443578.92 rows=3362075 width=8)
(actual time=2521.275..248683.459 rows=1934 loops=1)
             Hash Cond: (eventtable.alarm_event_id = probetable.alarm_event_id)
             ->  Append  (cost=0.00..370989.07 rows=7772408 width=8)
(actual time=14.364..220430.509 rows=7771865 loops=1)
                   ->  Seq Scan on alarm_event eventtable
(cost=0.00..3.00 rows=1 width=8) (actual time=0.088..0.088 rows=0
loops=1)
                   ->  Seq Scan on alarm_event_y2011m12 eventtable
(cost=0.00..12638.54 rows=290254 width=8) (actual
time=14.267..11136.290 rows=290254 loops=1)
                   ->  Seq Scan on alarm_event_y2011m10 eventtable
(cost=0.00..9146.57 rows=214457 width=8) (actual time=7.719..5820.900
rows=214457 loops=1)
                   ->  Seq Scan on alarm_event_y2011m09 eventtable
(cost=0.00..1183.60 rows=27660 width=8) (actual time=0.046..111.918
rows=27660 loops=1)
                   ->  Seq Scan on alarm_event_y2011m11 eventtable
(cost=0.00..6209.42 rows=149342 width=8) (actual time=0.055..662.708
rows=149342 loops=1)
                   ->  Seq Scan on alarm_event_y2012m01 eventtable
(cost=0.00..8661.84 rows=207184 width=8) (actual time=0.075..943.065
rows=207184 loops=1)
                   ->  Seq Scan on alarm_event_y2012m02 eventtable
(cost=0.00..7824.78 rows=182378 width=8) (actual time=0.051..6620.416
rows=182378 loops=1)
                   ->  Seq Scan on alarm_event_y2012m03 eventtable
(cost=0.00..16717.50 rows=386350 width=8) (actual
time=101.301..14018.406 rows=386350 loops=1)
                   ->  Seq Scan on alarm_event_y2012m04 eventtable
(cost=0.00..10125.17 rows=238117 width=8) (actual
time=25.155..3013.045 rows=238117 loops=1)
                   ->  Seq Scan on alarm_event_y2012m05 eventtable
(cost=0.00..10747.73 rows=251573 width=8) (actual time=0.058..1605.062
rows=251573 loops=1)
                   ->  Seq Scan on alarm_event_y2012m06 eventtable
(cost=0.00..16638.79 rows=387879 width=8) (actual time=0.122..4477.169
rows=387879 loops=1)
                   ->  Seq Scan on alarm_event_y2012m07 eventtable
(cost=0.00..9675.58 rows=222658 width=8) (actual time=85.891..8504.216
rows=222658 loops=1)
                   ->  Seq Scan on alarm_event_y2012m08 eventtable
(cost=0.00..9570.01 rows=234201 width=8) (actual
time=106.049..7978.829 rows=234201 loops=1)
                   ->  Seq Scan on alarm_event_y2012m09 eventtable
(cost=0.00..12731.91 rows=300791 width=8) (actual time=5.516..5596.174
rows=300791 loops=1)
                   ->  Seq Scan on alarm_event_y2012m10 eventtable
(cost=0.00..11052.83 rows=266483 width=8) (actual time=0.064..1159.065
rows=266483 loops=1)
                   ->  Seq Scan on alarm_event_y2012m11 eventtable
(cost=0.00..9540.78 rows=226778 width=8) (actual time=0.045..878.269
rows=226778 loops=1)
                   ->  Seq Scan on alarm_event_y2012m12 eventtable
(cost=0.00..8110.64 rows=208464 width=8) (actual time=0.061..883.399
rows=208464 loops=1)
                   ->  Seq Scan on alarm_event_y2013m01 eventtable
(cost=0.00..13391.78 rows=272178 width=8) (actual
time=0.028..10847.953 rows=270708 loops=1)
                   ->  Seq Scan on alarm_event_y2013m02 eventtable
(cost=0.00..15720.65 rows=297365 width=8) (actual
time=134.814..8192.080 rows=297204 loops=1)
                   ->  Seq Scan on alarm_event_y2013m03 eventtable
(cost=0.00..41027.29 rows=810229 width=8) (actual
time=11.567..8419.318 rows=805929 loops=1)
                   ->  Seq Scan on alarm_event_y2013m04 eventtable
(cost=0.00..13382.35 rows=253135 width=8) (actual time=0.072..9731.649
rows=253329 loops=1)
                   ->  Seq Scan on alarm_event_y2013m05 eventtable
(cost=0.00..9793.55 rows=175455 width=8) (actual time=4.434..9965.956
rows=176525 loops=1)
                   ->  Seq Scan on alarm_event_y2013m06 eventtable
(cost=0.00..9961.74 rows=184074 width=8) (actual
time=123.567..11228.522 rows=184335 loops=1)
                   ->  Seq Scan on alarm_event_y2013m07 eventtable
(cost=0.00..10330.15 rows=190215 width=8) (actual time=4.728..4839.910
rows=189743 loops=1)
                   ->  Seq Scan on alarm_event_y2013m08 eventtable
(cost=0.00..8808.29 rows=160329 width=8) (actual time=8.731..2313.534
rows=161477 loops=1)
                   ->  Seq Scan on alarm_event_y2013m09 eventtable
(cost=0.00..14232.17 rows=273617 width=8) (actual time=0.035..1367.249
rows=273621 loops=1)
                   ->  Seq Scan on alarm_event_y2013m10 eventtable
(cost=0.00..17202.44 rows=320544 width=8) (actual time=0.105..1853.031
rows=320310 loops=1)
                   ->  Seq Scan on alarm_event_y2013m11 eventtable
(cost=0.00..15857.97 rows=278997 width=8) (actual time=0.044..7627.316
rows=281997 loops=1)
                   ->  Seq Scan on alarm_event_y2013m12 eventtable
(cost=0.00..15012.38 rows=278738 width=8) (actual
time=121.673..11397.349 rows=280114 loops=1)
                   ->  Seq Scan on alarm_event_y2014m01 eventtable
(cost=0.00..17534.21 rows=331521 width=8) (actual
time=70.202..7098.822 rows=330846 loops=1)
                   ->  Seq Scan on alarm_event_y2014m02 eventtable
(cost=0.00..8124.21 rows=151321 width=8) (actual time=22.929..1894.643
rows=151158 loops=1)
                   ->  Seq Scan on alarm_event_y2014m03 eventtable
(cost=0.00..10.40 rows=40 width=8) (actual time=0.003..0.003 rows=0
loops=1)
                   ->  Seq Scan on alarm_event_y2014m04 eventtable
(cost=0.00..10.40 rows=40 width=8) (actual time=0.004..0.004 rows=0
loops=1)
                   ->  Seq Scan on alarm_event_y2014m05 eventtable
(cost=0.00..10.40 rows=40 width=8) (actual time=0.003..0.003 rows=0
loops=1)
             ->  Hash  (cost=82.89..82.89 rows=1934 width=16) (actual
time=16.402..16.402 rows=1934 loops=1)
                   ->  Seq Scan on alarm_event_probe probetable
(cost=0.00..82.89 rows=1934 width=16) (actual time=0.403..8.985
rows=1934 loops=1)
                         Filter: (probe_id = 2)
     Total runtime: 248864.890 ms
    (42 rows)

    set enable_seqscan = false;
    SET


                      QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=502051.06..502051.07 rows=1 width=8) (actual
time=11597.236..11597.239 rows=1 loops=1)
       ->  Nested Loop  (cost=0.00..493645.84 rows=3362085 width=8)
(actual time=151.505..11587.339 rows=1934 loops=1)
             Join Filter: (eventtable.alarm_event_id =
probetable.alarm_event_id)
             ->  Index Scan using alarm_event_probe_pkey on
alarm_event_probe probetable  (cost=0.00..168.49 rows=1934 width=16)
(actual time=143.484..159.889 rows=1934 loops=1)
                   Filter: (probe_id = 2)
             ->  Append  (cost=0.00..254.73 rows=34 width=8) (actual
time=5.343..5.874 rows=1 loops=1934)
                   ->  Index Scan using alarm_event_pkey on
alarm_event eventtable  (cost=0.00..6.65 rows=1 width=8) (actual
time=0.008..0.008 rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m12_alarm_event_id_ix on alarm_event_y2011m12
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.378..0.378
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m10_alarm_event_id_ix on alarm_event_y2011m10
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.008..0.008
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m09_alarm_event_id_ix on alarm_event_y2011m09
eventtable  (cost=0.00..7.79 rows=1 width=8) (actual time=0.007..0.007
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m11_alarm_event_id_ix on alarm_event_y2011m11
eventtable  (cost=0.00..7.85 rows=1 width=8) (actual time=0.008..0.008
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m01_alarm_event_id_ix on alarm_event_y2012m01
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.479..0.480
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m02_alarm_event_id_ix on alarm_event_y2012m02
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.015..0.015
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m03_alarm_event_id_ix on alarm_event_y2012m03
eventtable  (cost=0.00..7.91 rows=1 width=8) (actual time=0.016..0.016
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m04_alarm_event_id_ix on alarm_event_y2012m04
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.014..0.014
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m05_alarm_event_id_ix on alarm_event_y2012m05
eventtable  (cost=0.00..7.88 rows=1 width=8) (actual time=0.015..0.015
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m06_alarm_event_id_ix on alarm_event_y2012m06
eventtable  (cost=0.00..7.91 rows=1 width=8) (actual time=0.071..0.071
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m07_alarm_event_id_ix on alarm_event_y2012m07
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.016..0.016
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m08_alarm_event_id_ix on alarm_event_y2012m08
eventtable  (cost=0.00..7.88 rows=1 width=8) (actual time=0.027..0.027
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m09_alarm_event_id_ix on alarm_event_y2012m09
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.664..0.670
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m10_alarm_event_id_ix on alarm_event_y2012m10
eventtable  (cost=0.00..7.88 rows=1 width=8) (actual time=0.381..0.381
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m11_alarm_event_id_ix on alarm_event_y2012m11
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.308..0.309
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m12_alarm_event_id_ix on alarm_event_y2012m12
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.033..0.033
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m01_alarm_event_id_ix on alarm_event_y2013m01
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.032..0.032
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m02_alarm_event_id_ix on alarm_event_y2013m02
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.254..0.254
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m03_alarm_event_id_ix on alarm_event_y2013m03
eventtable  (cost=0.00..8.03 rows=1 width=8) (actual time=0.030..0.030
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m04_alarm_event_id_ix on alarm_event_y2013m04
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.095..0.095
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m05_alarm_event_id_ix on alarm_event_y2013m05
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.023..0.023
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m06_alarm_event_id_ix on alarm_event_y2013m06
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.104..0.104
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m07_alarm_event_id_ix on alarm_event_y2013m07
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.179..0.180
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m08_alarm_event_id_ix on alarm_event_y2013m08
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.250..0.250
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m09_alarm_event_id_ix on alarm_event_y2013m09
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.626..0.627
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m10_alarm_event_id_ix on alarm_event_y2013m10
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.300..0.300
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m11_alarm_event_id_ix on alarm_event_y2013m11
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.201..0.201
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m12_alarm_event_id_ix on alarm_event_y2013m12
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.552..0.552
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m01_alarm_event_id_ix on alarm_event_y2014m01
eventtable  (cost=0.00..7.91 rows=1 width=8) (actual time=0.323..0.324
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m02_alarm_event_id_ix on alarm_event_y2014m02
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.302..0.303
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m03_alarm_event_id_ix on alarm_event_y2014m03
eventtable  (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m04_alarm_event_id_ix on alarm_event_y2014m04
eventtable  (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m05_alarm_event_id_ix on alarm_event_y2014m05
eventtable  (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
     Total runtime: 11599.260 ms
    (75 rows)


pgsql-performance by date:

Previous
From: "Tomas Vondra"
Date:
Subject: Re: Optimal settings for RAID controller - optimized for writes
Next
From: David Wall
Date:
Subject: DB size and TABLE sizes don't seem to add up