Thread: Wrong index selection

Wrong index selection

From
Piotr Gasidło
Date:
For Postgresql:

> select version();
                                                   version                                                   
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.2 on amd64-portbld-freebsd9.2, compiled by cc (GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit

For table:

> \d core.cookies2tags
                           Tabela "core.cookies2tags"
        Kolumna        |             Typ             |       Modyfikatory       
-----------------------+-----------------------------+--------------------------
 co2ta_co_id           | integer                     | niepusty
 co2ta_cl_id           | integer                     | niepusty
 co2ta_ta_id           | integer                     | niepusty
 co2ta_ta_ukey_id      | text                        | 
 co2ta_ta_ukey_hash    | character(40)               | 
 co2ta_fpr_id          | integer                     | 
 co2ta_date_first      | timestamp without time zone | niepusty domyślnie now()
 co2ta_date_last       | timestamp without time zone | niepusty domyślnie now()
 co2ta_count_all       | integer                     | niepusty domyślnie 1
 co2ta_count_1         | integer                     | niepusty domyślnie 1
 co2ta_date_1          | date                        | niepusty
 co2ta_datelist_date   | date                        | 
 co2ta_datelist_counts | integer[]                   | 
 co2ta_ta_params       | hstore                      | 
 co2ta_fca_id          | integer                     | 
 co2ta_mco_id          | integer                     | 
Indeksy:
    "cookies2tags_ukey1" UNIQUE, btree (co2ta_co_id, co2ta_cl_id, co2ta_ta_id, co2ta_ta_ukey_hash) WHERE co2ta_ta_ukey_hash IS NOT NULL
    "cookies2tags_ukey2" UNIQUE, btree (co2ta_co_id, co2ta_cl_id, co2ta_ta_id) WHERE co2ta_ta_ukey_hash IS NULL
    "cookies2tags_co_id_key" btree (co2ta_co_id)
    "cookies2tags_co_id_key2" btree (co2ta_co_id, co2ta_cl_id)
    "cookies2tags_key1" btree (co2ta_cl_id, co2ta_ta_id, co2ta_ta_ukey_hash)
    "cookies2tags_key2" btree (co2ta_cl_id, co2ta_ta_ukey_hash) WHERE co2ta_fpr_id IS NULL AND (co2ta_ta_id = ANY (ARRAY[1, 2, 3, 4]))
    "cookies2tags_key3" btree (co2ta_cl_id, co2ta_ta_id, co2ta_date_1)
    "cookies2tags_key4" btree (co2ta_mco_id)
    "idx_co_id_date_last" btree (co2ta_co_id, co2ta_date_last)

Table is rather big (about 150M rows).

For this query:

WITH s AS (
  SELECT 
    co2ta_co_id AS co_id,
    co2ta_ta_id AS ta_id,
    MIN(co2ta_date_last) AS co2ta_date_last_min,
    MAX(co2ta_date_last) AS co2ta_date_last_max,
    COUNT(DISTINCT(co2ta_ta_ukey_hash)) AS co2ta_ta_ukey_count,
    1
  FROM
    core.cookies2tags co2ta
  WHERE 
    co2ta.co2ta_co_id = ANY('{"1","123567429","123872617","123929118","123930244","123935996","123937156","123944495","123944999","123945469"}'::int[]) AND
    co2ta.co2ta_cl_id = 97 AND
    co2ta.co2ta_ta_id = ANY('{"142"}'::int[])
  GROUP BY 
    ta_id,
    co_id
SELECT
  *
FROM 
  s
UNION ALL
SELECT
  s.co_id,
  NULL,
  MIN(s.co2ta_date_last_min),
  MAX(s.co2ta_date_last_min),
  NULL,
  1
FROM 
  s
GROUP BY
  s.co_id

i get following plan:

                                                                          QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=49.38..49.44 rows=2 width=36) (actual time=39.009..39.034 rows=16 loops=1)
   CTE s
     ->  GroupAggregate  (cost=49.35..49.38 rows=1 width=57) (actual time=39.006..39.016 rows=8 loops=1)
           ->  Sort  (cost=49.35..49.35 rows=1 width=57) (actual time=38.993..38.993 rows=8 loops=1)
                 Sort Key: co2ta.co2ta_ta_id, co2ta.co2ta_co_id
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using cookies2tags_key3 on cookies2tags co2ta  (cost=0.57..49.34 rows=1 width=57) (actual time=38.339..38.982 rows=8 loops=1)
                       Index Cond: ((co2ta_cl_id = 97) AND (co2ta_ta_id = ANY ('{142}'::integer[])))
                       Filter: (co2ta_co_id = ANY ('{1,123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
                       Rows Removed by Filter: 32120
   ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=36) (actual time=39.008..39.021 rows=8 loops=1)
   ->  HashAggregate  (cost=0.03..0.04 rows=1 width=12) (actual time=0.009..0.010 rows=8 loops=1)
         ->  CTE Scan on s s_1  (cost=0.00..0.02 rows=1 width=12) (actual time=0.000..0.001 rows=8 loops=1)
 Total runtime: 39.079 ms

But if i remove one of co2ta_co_id in query (eq. "1") i get:

                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=45.28..45.35 rows=2 width=36) (actual time=0.233..0.255 rows=16 loops=1)
   CTE s
     ->  GroupAggregate  (cost=45.25..45.28 rows=1 width=57) (actual time=0.230..0.241 rows=8 loops=1)
           ->  Sort  (cost=45.25..45.26 rows=1 width=57) (actual time=0.224..0.225 rows=8 loops=1)
                 Sort Key: co2ta.co2ta_ta_id, co2ta.co2ta_co_id
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using cookies2tags_co_id_key2 on cookies2tags co2ta  (cost=0.58..45.24 rows=1 width=57) (actual time=0.031..0.215 rows=8 loops=1)
                       Index Cond: ((co2ta_co_id = ANY ('{123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[])) AND (co2ta_cl_id = 97))
                       Filter: (co2ta_ta_id = ANY ('{142}'::integer[]))
                       Rows Removed by Filter: 187
   ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=36) (actual time=0.232..0.244 rows=8 loops=1)
   ->  HashAggregate  (cost=0.03..0.04 rows=1 width=12) (actual time=0.007..0.009 rows=8 loops=1)
         ->  CTE Scan on s s_1  (cost=0.00..0.02 rows=1 width=12) (actual time=0.001..0.001 rows=8 loops=1)
 Total runtime: 0.321 ms

This plan is much faster. I notice that if I put more co2ta_co_id values in query than some threshold PostgreSQL creates unoptimal plan.

I wonder what should I tune, to get PostgreSQL use other index for queries with more co2ta_co_id values in query?
Currently as hotfix I split input values, to execute more queries with less co2ta_co_id values.

-- 
Piotr Gasidło

Re: Wrong index selection

From
Tom Lane
Date:
=?UTF-8?Q?Piotr_Gasid=C5=82o?= <quaker@barbara.eu.org> writes:
> [ planner prefers this: ]

>                  ->  Index Scan using cookies2tags_key3 on cookies2tags
> co2ta  (cost=0.57..49.34 rows=1 width=57) (actual time=38.339..38.982
> rows=8 loops=1)
>                        Index Cond: ((co2ta_cl_id = 97) AND (co2ta_ta_id =
> ANY ('{142}'::integer[])))
>                        Filter: (co2ta_co_id = ANY
> ('{1,123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
>                        Rows Removed by Filter: 32120

> [ over this: ]

>                  ->  Index Scan using cookies2tags_co_id_key2 on
> cookies2tags co2ta  (cost=0.58..45.24 rows=1 width=57) (actual
> time=0.031..0.215 rows=8 loops=1)
>                        Index Cond: ((co2ta_co_id = ANY
> ('{123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
> AND (co2ta_cl_id = 97))
>                        Filter: (co2ta_ta_id = ANY ('{142}'::integer[]))
>                        Rows Removed by Filter: 187

Well, as you can see the planner thinks these are going to cost about the
same, but actually the first one fetches a lot of rows that end up getting
rejected by the filter condition.  That means the co2ta_ta_id condition
is somewhat redundant given the other two, much more so than the
co2ta_co_id condition is given the other two.  If the individual
conditions are estimated about right (have you checked?), then that means
that this is an artifact of cross-column correlation statistics, which
unfortunately Postgres doesn't know anything about.  Is there any way of
normalizing the data to reduce the cross-column correlations?

My other advice would be to simplify and reduce the set of indexes ---
IMO someone's gone way overboard with index creation here.  It's unlikely
that those indexes are all pulling their weight for their maintenance
costs, and you can reduce problems with choosing the "wrong" index if
that index simply isn't there.

On the other hand, if this is a near-read-only table such that having lots
of indexes is basically free, you could fix the problem by creating an
index on all three columns, which should dominate both of these choices.

            regards, tom lane