Wrong index selection - Mailing list pgsql-performance

From Piotr Gasidło
Subject Wrong index selection
Date
Msg-id CAF8akQtr9ENiroROUmyRe_gpW5aDf-5WNOaewBSYRizvx6CPtA@mail.gmail.com
Whole thread Raw
Responses Re: Wrong index selection  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
For Postgresql:

> select version();
                                                   version                                                   
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.2 on amd64-portbld-freebsd9.2, compiled by cc (GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit

For table:

> \d core.cookies2tags
                           Tabela "core.cookies2tags"
        Kolumna        |             Typ             |       Modyfikatory       
-----------------------+-----------------------------+--------------------------
 co2ta_co_id           | integer                     | niepusty
 co2ta_cl_id           | integer                     | niepusty
 co2ta_ta_id           | integer                     | niepusty
 co2ta_ta_ukey_id      | text                        | 
 co2ta_ta_ukey_hash    | character(40)               | 
 co2ta_fpr_id          | integer                     | 
 co2ta_date_first      | timestamp without time zone | niepusty domyślnie now()
 co2ta_date_last       | timestamp without time zone | niepusty domyślnie now()
 co2ta_count_all       | integer                     | niepusty domyślnie 1
 co2ta_count_1         | integer                     | niepusty domyślnie 1
 co2ta_date_1          | date                        | niepusty
 co2ta_datelist_date   | date                        | 
 co2ta_datelist_counts | integer[]                   | 
 co2ta_ta_params       | hstore                      | 
 co2ta_fca_id          | integer                     | 
 co2ta_mco_id          | integer                     | 
Indeksy:
    "cookies2tags_ukey1" UNIQUE, btree (co2ta_co_id, co2ta_cl_id, co2ta_ta_id, co2ta_ta_ukey_hash) WHERE co2ta_ta_ukey_hash IS NOT NULL
    "cookies2tags_ukey2" UNIQUE, btree (co2ta_co_id, co2ta_cl_id, co2ta_ta_id) WHERE co2ta_ta_ukey_hash IS NULL
    "cookies2tags_co_id_key" btree (co2ta_co_id)
    "cookies2tags_co_id_key2" btree (co2ta_co_id, co2ta_cl_id)
    "cookies2tags_key1" btree (co2ta_cl_id, co2ta_ta_id, co2ta_ta_ukey_hash)
    "cookies2tags_key2" btree (co2ta_cl_id, co2ta_ta_ukey_hash) WHERE co2ta_fpr_id IS NULL AND (co2ta_ta_id = ANY (ARRAY[1, 2, 3, 4]))
    "cookies2tags_key3" btree (co2ta_cl_id, co2ta_ta_id, co2ta_date_1)
    "cookies2tags_key4" btree (co2ta_mco_id)
    "idx_co_id_date_last" btree (co2ta_co_id, co2ta_date_last)

Table is rather big (about 150M rows).

For this query:

WITH s AS (
  SELECT 
    co2ta_co_id AS co_id,
    co2ta_ta_id AS ta_id,
    MIN(co2ta_date_last) AS co2ta_date_last_min,
    MAX(co2ta_date_last) AS co2ta_date_last_max,
    COUNT(DISTINCT(co2ta_ta_ukey_hash)) AS co2ta_ta_ukey_count,
    1
  FROM
    core.cookies2tags co2ta
  WHERE 
    co2ta.co2ta_co_id = ANY('{"1","123567429","123872617","123929118","123930244","123935996","123937156","123944495","123944999","123945469"}'::int[]) AND
    co2ta.co2ta_cl_id = 97 AND
    co2ta.co2ta_ta_id = ANY('{"142"}'::int[])
  GROUP BY 
    ta_id,
    co_id
SELECT
  *
FROM 
  s
UNION ALL
SELECT
  s.co_id,
  NULL,
  MIN(s.co2ta_date_last_min),
  MAX(s.co2ta_date_last_min),
  NULL,
  1
FROM 
  s
GROUP BY
  s.co_id

i get following plan:

                                                                          QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=49.38..49.44 rows=2 width=36) (actual time=39.009..39.034 rows=16 loops=1)
   CTE s
     ->  GroupAggregate  (cost=49.35..49.38 rows=1 width=57) (actual time=39.006..39.016 rows=8 loops=1)
           ->  Sort  (cost=49.35..49.35 rows=1 width=57) (actual time=38.993..38.993 rows=8 loops=1)
                 Sort Key: co2ta.co2ta_ta_id, co2ta.co2ta_co_id
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using cookies2tags_key3 on cookies2tags co2ta  (cost=0.57..49.34 rows=1 width=57) (actual time=38.339..38.982 rows=8 loops=1)
                       Index Cond: ((co2ta_cl_id = 97) AND (co2ta_ta_id = ANY ('{142}'::integer[])))
                       Filter: (co2ta_co_id = ANY ('{1,123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
                       Rows Removed by Filter: 32120
   ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=36) (actual time=39.008..39.021 rows=8 loops=1)
   ->  HashAggregate  (cost=0.03..0.04 rows=1 width=12) (actual time=0.009..0.010 rows=8 loops=1)
         ->  CTE Scan on s s_1  (cost=0.00..0.02 rows=1 width=12) (actual time=0.000..0.001 rows=8 loops=1)
 Total runtime: 39.079 ms

But if i remove one of co2ta_co_id in query (eq. "1") i get:

                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=45.28..45.35 rows=2 width=36) (actual time=0.233..0.255 rows=16 loops=1)
   CTE s
     ->  GroupAggregate  (cost=45.25..45.28 rows=1 width=57) (actual time=0.230..0.241 rows=8 loops=1)
           ->  Sort  (cost=45.25..45.26 rows=1 width=57) (actual time=0.224..0.225 rows=8 loops=1)
                 Sort Key: co2ta.co2ta_ta_id, co2ta.co2ta_co_id
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using cookies2tags_co_id_key2 on cookies2tags co2ta  (cost=0.58..45.24 rows=1 width=57) (actual time=0.031..0.215 rows=8 loops=1)
                       Index Cond: ((co2ta_co_id = ANY ('{123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[])) AND (co2ta_cl_id = 97))
                       Filter: (co2ta_ta_id = ANY ('{142}'::integer[]))
                       Rows Removed by Filter: 187
   ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=36) (actual time=0.232..0.244 rows=8 loops=1)
   ->  HashAggregate  (cost=0.03..0.04 rows=1 width=12) (actual time=0.007..0.009 rows=8 loops=1)
         ->  CTE Scan on s s_1  (cost=0.00..0.02 rows=1 width=12) (actual time=0.001..0.001 rows=8 loops=1)
 Total runtime: 0.321 ms

This plan is much faster. I notice that if I put more co2ta_co_id values in query than some threshold PostgreSQL creates unoptimal plan.

I wonder what should I tune, to get PostgreSQL use other index for queries with more co2ta_co_id values in query?
Currently as hotfix I split input values, to execute more queries with less co2ta_co_id values.

-- 
Piotr Gasidło

pgsql-performance by date:

Previous
From: Igor Neyman
Date:
Subject: Re: Issue with query scanning through all data even with indexes
Next
From: Tom Lane
Date:
Subject: Re: Wrong index selection