Thread: QUERY PLANNER - Indexe mono column VS composite Index
Hello,
My 9.4 database is used as datawharehouse. I can't change the queries generated.
first index : INDEX COL (A,B,C,D,E)
In case of query based on COL A, the query planner sometimes go to a seq scan instead of using the first composite index.
The solution is to add a second indexe (redondant)
second index : INDEX COL (A)
In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.
I could create a third indexe :
first index : INDEX COL (A,B,C,D)
But I hope there is an other solution for that (table is huge).
It seems that the malus for using composite indexes is high.
Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)
Thanks by advance
2015-07-09 22:34 GMT+02:00 Nicolas Paris <niparisco@gmail.com>:
Hello,My 9.4 database is used as datawharehouse. I can't change the queries generated.first index : INDEX COL (A,B,C,D,E)In case of query based on COL A, the query planner sometimes go to a seq scan instead of using the first composite index.The solution is to add a second indexe (redondant)second index : INDEX COL (A)In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.I could create a third indexe :first index : INDEX COL (A,B,C,D)But I hope there is an other solution for that (table is huge).It seems that the malus for using composite indexes is high.Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)
There's no way we can answer that without seeing actual queries and query plans.
--
Ok, here is the problem (it's different than what I explained before)
==INDEX ==
CREATE INDEX of_idx_modifier
ON i2b2data_multi_nomi.observation_fact
USING btree
(concept_cd COLLATE pg_catalog."default", modifier_cd COLLATE pg_catalog."default", valtype_cd COLLATE pg_catalog."default", tval_char COLLATE pg_catalog."default", nval_num);
==QUERY==
EXPLAIN ANALYSE select f.patient_num
from i2b2data_multi_nomi.observation_fact f
where
f.concept_cd IN (select concept_cd from i2b2data_multi_nomi.concept_dimension where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')
AND ( modifier_cd = '@' AND valtype_cd = 'T' AND tval_char IN ('DP') )
group by f.patient_num ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=35153.99..35154.40 rows=41 width=4) (actual time=81.223..82.718 rows=5206 loops=1)
Group Key: f.patient_num
-> Nested Loop (cost=4740.02..35089.11 rows=25951 width=4) (actual time=45.393..76.893 rows=7359 loops=1)
-> HashAggregate (cost=4739.45..4748.64 rows=919 width=10) (actual time=45.097..45.586 rows=925 loops=1)
Group Key: (concept_dimension.concept_cd)::text
-> Seq Scan on concept_dimension (cost=0.00..4734.73 rows=1892 width=10) (actual time=17.479..44.573 rows=925 loops=1)
Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
Rows Removed by Filter: 186413
-> Index Scan using of_idx_modifier on observation_fact f (cost=0.56..32.86 rows=15 width=14) (actual time=0.025..0.031 rows=8 loops=925)
Index Cond: (((concept_cd)::text = (concept_dimension.concept_cd)::text) AND ((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text
= 'T'::text) AND ((tval_char)::text = 'DP'::text))
Planning time: 2.843 ms
Execution time: 83.273 ms
(12 rows)
============2 : without 3 constraint that match index => seq scan=======================================================================
EXPLAIN ANALYSE select f.patient_num
from i2b2data_multi_nomi.observation_fact f
where
f.concept_cd IN (select concept_cd from i2b2data_multi_nomi.concept_dimension where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')
-- AND ( modifier_cd = '@' AND valtype_cd = 'T' AND tval_char IN ('DP') )
group by f.patient_num ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1345377.85..1346073.80 rows=69595 width=4) (actual time=18043.140..18048.741 rows=16865 loops=1)
Group Key: f.patient_num
-> Hash Join (cost=4760.13..1233828.53 rows=44619728 width=4) (actual time=17109.041..18027.763 rows=33835 loops=1)
Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
-> Seq Scan on observation_fact f (cost=0.00..1057264.28 rows=44619728 width=14) (actual time=0.040..7918.984 rows=44619320 loops=1)
-> Hash (cost=4748.64..4748.64 rows=919 width=10) (actual time=49.523..49.523 rows=925 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 39kB
-> HashAggregate (cost=4739.45..4748.64 rows=919 width=10) (actual time=48.806..49.117 rows=925 loops=1)
Group Key: (concept_dimension.concept_cd)::text
-> Seq Scan on concept_dimension (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.828..48.191 rows=925 loops=1)
Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
Rows Removed by Filter: 186413
Planning time: 2.588 ms
Execution time: 18051.031 ms
(14 rows)
=========3: without a constraint on tval_char => seq scan========================================================================
EXPLAIN ANALYSE select f.patient_num
from i2b2data_multi_nomi.observation_fact f
where
f.concept_cd IN (select concept_cd from i2b2data_multi_nomi.concept_dimension where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')
AND ( modifier_cd = '@' AND valtype_cd = 'T' )
group by f.patient_num ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1305637.84..1305688.23 rows=5039 width=4) (actual time=22689.279..22694.583 rows=16865 loops=1)
Group Key: f.patient_num
-> Hash Join (cost=4760.13..1297561.67 rows=3230468 width=4) (actual time=12368.418..22674.145 rows=33835 loops=1)
Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
-> Seq Scan on observation_fact f (cost=0.00..1280362.92 rows=3230468 width=14) (actual time=0.226..22004.808 rows=3195625 loops=1)
Filter: (((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text = 'T'::text))
Rows Removed by Filter: 41423695
-> Hash (cost=4748.64..4748.64 rows=919 width=10) (actual time=46.833..46.833 rows=925 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 39kB
-> HashAggregate (cost=4739.45..4748.64 rows=919 width=10) (actual time=46.196..46.515 rows=925 loops=1)
Group Key: (concept_dimension.concept_cd)::text
-> Seq Scan on concept_dimension (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.899..45.800 rows=925 loops=1)
Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
Rows Removed by Filter: 186413
Planning time: 1.940 ms
Execution time: 22695.913 ms
What I would like is the planner allways hit of_idx_modifier
Thanks !
2015-07-09 22:49 GMT+02:00 Guillaume Lelarge <guillaume@lelarge.info>:
2015-07-09 22:34 GMT+02:00 Nicolas Paris <niparisco@gmail.com>:Hello,My 9.4 database is used as datawharehouse. I can't change the queries generated.first index : INDEX COL (A,B,C,D,E)In case of query based on COL A, the query planner sometimes go to a seq scan instead of using the first composite index.The solution is to add a second indexe (redondant)second index : INDEX COL (A)In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.I could create a third indexe :first index : INDEX COL (A,B,C,D)But I hope there is an other solution for that (table is huge).It seems that the malus for using composite indexes is high.Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)There's no way we can answer that without seeing actual queries and query plans.
--
On Fri, Jul 10, 2015 at 2:34 AM, Nicolas Paris <niparisco@gmail.com> wrote:
=========3: without a constraint on tval_char => seq scan========================================================================EXPLAIN ANALYSE select f.patient_numfrom i2b2data_multi_nomi.observation_fact fwheref.concept_cd IN (select concept_cd from i2b2data_multi_nomi.concept_dimension where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')AND ( modifier_cd = '@' AND valtype_cd = 'T' )group by f.patient_num ;QUERY PLAN---------------------------------------------------------------------------------------------------------------------------------------HashAggregate (cost=1305637.84..1305688.23 rows=5039 width=4) (actual time=22689.279..22694.583 rows=16865 loops=1)Group Key: f.patient_num-> Hash Join (cost=4760.13..1297561.67 rows=3230468 width=4) (actual time=12368.418..22674.145 rows=33835 loops=1)Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)-> Seq Scan on observation_fact f (cost=0.00..1280362.92 rows=3230468 width=14) (actual time=0.226..22004.808 rows=3195625 loops=1)Filter: (((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text = 'T'::text))Rows Removed by Filter: 41423695-> Hash (cost=4748.64..4748.64 rows=919 width=10) (actual time=46.833..46.833 rows=925 loops=1)Buckets: 1024 Batches: 1 Memory Usage: 39kB-> HashAggregate (cost=4739.45..4748.64 rows=919 width=10) (actual time=46.196..46.515 rows=925 loops=1)Group Key: (concept_dimension.concept_cd)::text-> Seq Scan on concept_dimension (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.899..45.800 rows=925 loops=1)Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)Rows Removed by Filter: 186413Planning time: 1.940 msExecution time: 22695.913 msWhat I would like is the planner allways hit of_idx_modifier
What does the above explain analyze query give when you have an index on just modifier_cd, or maybe on both (modifier_cd, valtype_cd)?
Your original email said it uses the index in that case, but we would need to see the numbers in the query plan in order to figure out why it is doing that.
It seems like that the "tval_char IN ('DP')" part of the restriction is very selective, while the other two restrictions are not.
Cheers,
Jeff