Estimates on partial index - Mailing list pgsql-performance

From Victor Yegorov
Subject Estimates on partial index
Date
Msg-id CAGnEbohwiUPY1-x6=u4bJuAvnhr1oNYP_oTpWOPVcdCsyisjyw@mail.gmail.com
Whole thread Raw
Responses Re: Estimates on partial index
Re: Estimates on partial index
List pgsql-performance
Greetings.

I have a question on why planner chooses `IndexScan` for the following query:

    SELECT la.loan_id, la.due_date, la.is_current
      FROM loan_agreements la WHERE la.is_current AND '2016-08-11' > la.due_date;

Relevant (cannot post it all, sorry) table definition is:

                Column                        Type             Modifiers
    ------------------------------ --------------------------- ---------
    id                             bigint                      not null
    ...
    is_current                     boolean                     not null
    due_date                       date                        not null
    loan_id                        bigint
    
    Indexes:
        "loan_agreements_pkey" PRIMARY KEY, btree (id)
        ...
        "idx_loan_agreements_due_date" btree (due_date)
        "idx_loan_agreemnets_loan_id_cond_is_current_true" btree (loan_id) WHERE is_current = true

Some stats:
    SELECT relname,reltuples::numeric,relpages FROM pg_class WHERE oid IN ('loan_agreements'::regclass, 'idx_loan_agreemnets_loan_id_cond_is_current_true'::regclass, 'idx_loan_agreements_due_date'::regclass);
                    relname                      reltuples relpages
    ------------------------------------------------ --------- --------
    idx_loan_agreements_due_date                        664707     1828
    idx_loan_agreemnets_loan_id_cond_is_current_true    237910      655
    loan_agreements                                     664707    18117


Settings:

    SELECT name,setting,unit FROM pg_settings WHERE name ~ '(buffers|mem|cost)$';
            name         setting  unit
    -------------------- -------- ----
    autovacuum_work_mem  524288   kB
    cpu_index_tuple_cost 0.005    ¤
    cpu_operator_cost    0.0025   ¤
    cpu_tuple_cost       0.01     ¤
    maintenance_work_mem 16777216 kB
    random_page_cost     2.5      ¤
    seq_page_cost        1        ¤
    shared_buffers       1572864  8kB
    temp_buffers         8192     8kB
    wal_buffers          2048     8kB
    work_mem             65536    kB
    PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

Planner chooses the following plan:
                                                                                         QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Index Scan using idx_loan_agreemnets_loan_id_cond_is_current_true on loan_agreements la      (cost=0.42..16986.53 rows=226145 width=13) (actual time=0.054..462.394 rows=216530 loops=1)
      Filter: ('2016-08-11'::date > due_date)
      Rows Removed by Filter: 21304
      Buffers: shared hit=208343 read=18399
    Planning time: 0.168 ms
    Execution time: 479.773 ms

If I disable IndexScans, plan changes likes this:
                                                                              QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Bitmap Heap Scan on loan_agreements la  (cost=2884.01..23974.88 rows=226145 width=13) (actual time=38.893..200.376 rows=216530 loops=1)
      Recheck Cond: is_current
      Filter: ('2016-08-11'::date > due_date)
      Rows Removed by Filter: 21304
      Heap Blocks: exact=18117
      Buffers: shared hit=18212 read=557
      ->  Bitmap Index Scan on idx_loan_agreemnets_loan_id_cond_is_current_true  (cost=0.00..2827.47 rows=237910 width=0) (actual time=35.166..35.166 rows=237853 loops=1)
            Buffers: shared hit=119 read=533
    Planning time: 0.171 ms
    Execution time: 214.341 ms

Question is — why IndexScan over partial index is estimated less than BitmapHeap + BitmapIndex scan. And how can I tell Planner, that IndexScan over 1/3 of table is not a good thing — IndexScan is touching 10x more pages and in a typical situation those are cold.

Thanks in advance.

--
Victor Y. Yegorov

pgsql-performance by date:

Previous
From: pinker
Date:
Subject: Big data INSERT optimization - ExclusiveLock on extension of the table
Next
From: Tom Lane
Date:
Subject: Re: Estimates on partial index