Thread: Wrong index selection
For Postgresql:
> select version();
version
-------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.2 on amd64-portbld-freebsd9.2, compiled by cc (GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit
> \d core.cookies2tags
Tabela "core.cookies2tags"
Kolumna | Typ | Modyfikatory
-----------------------+-----------------------------+--------------------------
co2ta_co_id | integer | niepusty
co2ta_cl_id | integer | niepusty
co2ta_ta_id | integer | niepusty
co2ta_ta_ukey_id | text |
co2ta_ta_ukey_hash | character(40) |
co2ta_fpr_id | integer |
co2ta_date_first | timestamp without time zone | niepusty domyślnie now()
co2ta_date_last | timestamp without time zone | niepusty domyślnie now()
co2ta_count_all | integer | niepusty domyślnie 1
co2ta_count_1 | integer | niepusty domyślnie 1
co2ta_date_1 | date | niepusty
co2ta_datelist_date | date |
co2ta_datelist_counts | integer[] |
co2ta_ta_params | hstore |
co2ta_fca_id | integer |
co2ta_mco_id | integer |
Indeksy:
"cookies2tags_ukey1" UNIQUE, btree (co2ta_co_id, co2ta_cl_id, co2ta_ta_id, co2ta_ta_ukey_hash) WHERE co2ta_ta_ukey_hash IS NOT NULL
"cookies2tags_ukey2" UNIQUE, btree (co2ta_co_id, co2ta_cl_id, co2ta_ta_id) WHERE co2ta_ta_ukey_hash IS NULL
"cookies2tags_co_id_key" btree (co2ta_co_id)
"cookies2tags_co_id_key2" btree (co2ta_co_id, co2ta_cl_id)
"cookies2tags_key1" btree (co2ta_cl_id, co2ta_ta_id, co2ta_ta_ukey_hash)
"cookies2tags_key2" btree (co2ta_cl_id, co2ta_ta_ukey_hash) WHERE co2ta_fpr_id IS NULL AND (co2ta_ta_id = ANY (ARRAY[1, 2, 3, 4]))
"cookies2tags_key3" btree (co2ta_cl_id, co2ta_ta_id, co2ta_date_1)
"cookies2tags_key4" btree (co2ta_mco_id)
"idx_co_id_date_last" btree (co2ta_co_id, co2ta_date_last)
Table is rather big (about 150M rows).
For this query:
WITH s AS (
SELECT
co2ta_co_id AS co_id,
co2ta_ta_id AS ta_id,
MIN(co2ta_date_last) AS co2ta_date_last_min,
MAX(co2ta_date_last) AS co2ta_date_last_max,
COUNT(DISTINCT(co2ta_ta_ukey_hash)) AS co2ta_ta_ukey_count,
1
FROM
core.cookies2tags co2ta
WHERE
co2ta.co2ta_co_id = ANY('{"1","123567429","123872617","123929118","123930244","123935996","123937156","123944495","123944999","123945469"}'::int[]) AND
co2ta.co2ta_cl_id = 97 AND
co2ta.co2ta_ta_id = ANY('{"142"}'::int[])
GROUP BY
ta_id,
co_id
)
SELECT
*
FROM
s
UNION ALL
SELECT
s.co_id,
NULL,
MIN(s.co2ta_date_last_min),
MAX(s.co2ta_date_last_min),
NULL,
1
FROM
s
GROUP BY
s.co_id
i get following plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=49.38..49.44 rows=2 width=36) (actual time=39.009..39.034 rows=16 loops=1)
CTE s
-> GroupAggregate (cost=49.35..49.38 rows=1 width=57) (actual time=39.006..39.016 rows=8 loops=1)
-> Sort (cost=49.35..49.35 rows=1 width=57) (actual time=38.993..38.993 rows=8 loops=1)
Sort Key: co2ta.co2ta_ta_id, co2ta.co2ta_co_id
Sort Method: quicksort Memory: 25kB
-> Index Scan using cookies2tags_key3 on cookies2tags co2ta (cost=0.57..49.34 rows=1 width=57) (actual time=38.339..38.982 rows=8 loops=1)
Index Cond: ((co2ta_cl_id = 97) AND (co2ta_ta_id = ANY ('{142}'::integer[])))
Filter: (co2ta_co_id = ANY ('{1,123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
Rows Removed by Filter: 32120
-> CTE Scan on s (cost=0.00..0.02 rows=1 width=36) (actual time=39.008..39.021 rows=8 loops=1)
-> HashAggregate (cost=0.03..0.04 rows=1 width=12) (actual time=0.009..0.010 rows=8 loops=1)
-> CTE Scan on s s_1 (cost=0.00..0.02 rows=1 width=12) (actual time=0.000..0.001 rows=8 loops=1)
Total runtime: 39.079 ms
But if i remove one of co2ta_co_id in query (eq. "1") i get:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=45.28..45.35 rows=2 width=36) (actual time=0.233..0.255 rows=16 loops=1)
CTE s
-> GroupAggregate (cost=45.25..45.28 rows=1 width=57) (actual time=0.230..0.241 rows=8 loops=1)
-> Sort (cost=45.25..45.26 rows=1 width=57) (actual time=0.224..0.225 rows=8 loops=1)
Sort Key: co2ta.co2ta_ta_id, co2ta.co2ta_co_id
Sort Method: quicksort Memory: 25kB
-> Index Scan using cookies2tags_co_id_key2 on cookies2tags co2ta (cost=0.58..45.24 rows=1 width=57) (actual time=0.031..0.215 rows=8 loops=1)
Index Cond: ((co2ta_co_id = ANY ('{123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[])) AND (co2ta_cl_id = 97))
Filter: (co2ta_ta_id = ANY ('{142}'::integer[]))
Rows Removed by Filter: 187
-> CTE Scan on s (cost=0.00..0.02 rows=1 width=36) (actual time=0.232..0.244 rows=8 loops=1)
-> HashAggregate (cost=0.03..0.04 rows=1 width=12) (actual time=0.007..0.009 rows=8 loops=1)
-> CTE Scan on s s_1 (cost=0.00..0.02 rows=1 width=12) (actual time=0.001..0.001 rows=8 loops=1)
Total runtime: 0.321 ms
This plan is much faster. I notice that if I put more co2ta_co_id values in query than some threshold PostgreSQL creates unoptimal plan.
I wonder what should I tune, to get PostgreSQL use other index for queries with more co2ta_co_id values in query?
Currently as hotfix I split input values, to execute more queries with less co2ta_co_id values.
--
Piotr Gasidło =?UTF-8?Q?Piotr_Gasid=C5=82o?= <quaker@barbara.eu.org> writes: > [ planner prefers this: ] > -> Index Scan using cookies2tags_key3 on cookies2tags > co2ta (cost=0.57..49.34 rows=1 width=57) (actual time=38.339..38.982 > rows=8 loops=1) > Index Cond: ((co2ta_cl_id = 97) AND (co2ta_ta_id = > ANY ('{142}'::integer[]))) > Filter: (co2ta_co_id = ANY > ('{1,123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[])) > Rows Removed by Filter: 32120 > [ over this: ] > -> Index Scan using cookies2tags_co_id_key2 on > cookies2tags co2ta (cost=0.58..45.24 rows=1 width=57) (actual > time=0.031..0.215 rows=8 loops=1) > Index Cond: ((co2ta_co_id = ANY > ('{123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[])) > AND (co2ta_cl_id = 97)) > Filter: (co2ta_ta_id = ANY ('{142}'::integer[])) > Rows Removed by Filter: 187 Well, as you can see the planner thinks these are going to cost about the same, but actually the first one fetches a lot of rows that end up getting rejected by the filter condition. That means the co2ta_ta_id condition is somewhat redundant given the other two, much more so than the co2ta_co_id condition is given the other two. If the individual conditions are estimated about right (have you checked?), then that means that this is an artifact of cross-column correlation statistics, which unfortunately Postgres doesn't know anything about. Is there any way of normalizing the data to reduce the cross-column correlations? My other advice would be to simplify and reduce the set of indexes --- IMO someone's gone way overboard with index creation here. It's unlikely that those indexes are all pulling their weight for their maintenance costs, and you can reduce problems with choosing the "wrong" index if that index simply isn't there. On the other hand, if this is a near-read-only table such that having lots of indexes is basically free, you could fix the problem by creating an index on all three columns, which should dominate both of these choices. regards, tom lane