Why is the optimiser choosing a sub-optimal plan? - Mailing list pgsql-performance
From: Stephen Cresswell
Subject: Why is the optimiser choosing a sub-optimal plan?
Msg-id: CAEywW=zTiZeCgcogXmBX1ufo0Y2+mz+JhC9tW=gwSN2CjFu4bw@mail.gmail.com
Responses: Re: Why is the optimiser choosing a sub-optimal plan?
List: pgsql-performance
I have a table with two indexes...
CREATE TABLE mobile_summary_usage
(
import text,
msisdn text,
type text,
total integer,
day date,
cycle text
);
CREATE INDEX mobile_summary_usage_msisdn_cycle ON mobile_summary_usage
USING btree (msisdn, cycle);
CREATE INDEX mobile_summary_usage_cycle ON mobile_summary_usage USING
btree (cycle);
We insert approximately 2M records into this table each day. Whenever
someone wants to see the total amount of voice calls, text messages or
data they've used, we query the table with the following:
SELECT msisdn, type, sum (total), units
FROM mobile_summary_usage msu, mobile_summary_type mst
WHERE type = id AND msisdn = ? AND cycle = ?
GROUP BY msisdn, type, units;
Where:
msisdn is a mobile number
cycle is a billing cycle, e.g. 2016-10
mobile_summary_type contains 3 rows, one for each usage type.
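For completeness, mobile_summary_type looks something like this (a
minimal sketch; the exact column types are assumptions inferred from
the join condition and the mobile_summary_type_pkey index in the plans
below):

CREATE TABLE mobile_summary_type
(
    id    text PRIMARY KEY,  -- assumed; joined against mobile_summary_usage.type
    units text               -- assumed; the "units" column selected in the query
);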
Everything was working fine until we flipped over from 2016-10 to
2016-11. Then, instead of averaging well below 0.5 seconds to
respond, Postgres started taking over a second.
Running EXPLAIN ANALYZE on the above query shows that for 2016-10, when
there are approximately 100M rows, Postgres uses the compound (msisdn,
cycle) index. This has a cost of 3218.98 and takes 0.071 ms.
HashAggregate  (cost=3213.12..3218.98 rows=586 width=52) (actual time=0.071..0.071 rows=0 loops=1)
  Group Key: msu.msisdn, msu.type, mst.units
  ->  Hash Join  (cost=62.54..3205.15 rows=797 width=52) (actual time=0.069..0.069 rows=0 loops=1)
        Hash Cond: (msu.type = mst.id)
        ->  Bitmap Heap Scan on mobile_summary_usage msu  (cost=32.74..3164.39 rows=797 width=20) (actual time=0.037..0.037 rows=0 loops=1)
              Recheck Cond: ((msisdn = '07700900331'::text) AND (cycle = '2016-10'::text))
              ->  Bitmap Index Scan on mobile_summary_usage_msisdn_cycle  (cost=0.00..32.54 rows=797 width=0) (actual time=0.036..0.036 rows=0 loops=1)
                    Index Cond: ((msisdn = '07700900331'::text) AND (cycle = '2016-10'::text))
        ->  Hash  (cost=18.80..18.80 rows=880 width=64) (actual time=0.026..0.026 rows=4 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Seq Scan on mobile_summary_type mst  (cost=0.00..18.80 rows=880 width=64) (actual time=0.014..0.016 rows=4 loops=1)
Planning time: 0.197 ms
Execution time: 0.125 ms
When I re-run the query for 2016-11 (currently 4M rows), Postgres uses
the simpler "cycle" index. The cost is 12.79 but the actual time taken
is 1412.609 ms.
HashAggregate  (cost=12.78..12.79 rows=1 width=52) (actual time=1412.609..1412.609 rows=0 loops=1)
  Group Key: msu.msisdn, msu.type, mst.units
  ->  Nested Loop  (cost=0.72..12.77 rows=1 width=52) (actual time=1412.606..1412.606 rows=0 loops=1)
        ->  Index Scan using mobile_summary_usage_cycle on mobile_summary_usage msu  (cost=0.57..4.59 rows=1 width=20) (actual time=1412.604..1412.604 rows=0 loops=1)
              Index Cond: (cycle = '2016-11'::text)
              Filter: (msisdn = '07700900331'::text)
              Rows Removed by Filter: 3932875
        ->  Index Scan using mobile_summary_type_pkey on mobile_summary_type mst  (cost=0.15..8.17 rows=1 width=64) (never executed)
              Index Cond: (id = msu.type)
Execution time: 1412.674 ms
I understand there are a whole host of reasons why Postgres may choose
different plans based on data volumes, but in this case, despite the
lower estimated cost, the performance is significantly worse. Is there
any explanation for why it's making such a poor decision, and are there
recommendations for how to fix it?
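One guess, though I haven't confirmed it: the planner estimates rows=1
for the 2016-11 scan, which suggests the statistics may not yet include
the new cycle value. If that's the cause, would something along these
lines help? (A sketch only; the diagnosis itself is an assumption.)

-- Hypothesis only: refresh statistics so the new cycle value is
-- reflected in the planner's estimates for the cycle column.
ANALYZE mobile_summary_usage;

-- Inspect what the planner currently believes about cycle:
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'mobile_summary_usage'
  AND attname = 'cycle';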
Any help appreciated.