Re: [BUGS] BUG #14565: query planner does not use partial index in partiton if query is performed on multiple partitions - Mailing list pgsql-bugs

From Zbigniew Szot
Subject Re: [BUGS] BUG #14565: query planner does not use partial index in partiton if query is performed on multiple partitions
Date
Msg-id cc8da8c64abce6d7137fa90fbbf737e7@softiq.pl
In response to Re: [BUGS] BUG #14565: query planner does not use partial index in partiton if query is performed on multiple partitions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On 2017-02-24 06:41, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>> On 2017/02/23 20:10, zbigniew.szot@softiq.pl wrote:
>>> -- this one DOES NOT use  partial_not_working_4 .. bug or feature ? ;-)
>>> explain select * from test_table where chec_key  in
>>> ('4400df00-0000-4000-a000-000000000000'
>>> ,'1400df00-0000-4000-a000-000000000000')and some_date <'2015-11-02';
> 
>> Not a bug, I'd think.
> 
> After looking at this more closely, I think the OP is probably wishing
> that the planner would consider a BitmapOr plan on two different partial
> indexes.  You can get it to consider that if the query is phrased as an
> OR, but not when it's written like this with IN (which will get converted
> to an "= ANY(ARRAY[])" condition).
> 
> Trivial example:
> 
> regression=# create table foo (f1 int, f2 text);
> CREATE TABLE
> regression=# create index on foo(f1) where f1 >= 0 and f1 < 10;
> CREATE INDEX
> regression=# create index on foo(f1) where f1 >= 10 and f1 < 20;
> CREATE INDEX
> regression=# explain select * from foo where f1 in (7, 11);
>                       QUERY PLAN
> ------------------------------------------------------
>  Seq Scan on foo  (cost=0.00..25.88 rows=13 width=36)
>    Filter: (f1 = ANY ('{7,11}'::integer[]))
> (2 rows)
> 
> regression=# explain select * from foo where f1 = 7 or f1 = 11;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Bitmap Heap Scan on foo  (cost=8.27..19.00 rows=13 width=36)
>    Recheck Cond: ((f1 = 7) OR (f1 = 11))
>    ->  BitmapOr  (cost=8.27..8.27 rows=13 width=0)
>          ->  Bitmap Index Scan on foo_f1_idx  (cost=0.00..4.13 rows=6 width=0)
>                Index Cond: (f1 = 7)
>          ->  Bitmap Index Scan on foo_f1_idx1  (cost=0.00..4.13 rows=6 width=0)
>                Index Cond: (f1 = 11)
> (7 rows)
> 
> You could certainly claim it's a bug that these two phrasings of the query
> aren't treated 100% identically, but I'd tell you to get lost.  The IN
> planning code is designed to handle fairly large numbers of IN items
> without planner performance going into the toilet; it's not practical
> for it to consider a different index for each item.
> 
> The underlying reason why I'm not very excited about this issue is that
> I think the above-depicted index design is fundamentally stupid anyway.
> It's much simpler, both for you and for the planner, just to make one
> non-partial index on the whole range of f1.  And I know of no reason to
> believe that multiple partial indexes would outperform that design for
> any ordinary workload.
> 
>             regards, tom lane

The thing that you are all (not) saying, and consider obvious, is that
indexes created on partitions are considered by the query planner as if
they were set on the "main" table. In that case you are right: this query
does not hit the index and a sequential scan is the only option... but
well... in fact that is not true ;-)
It is of course arguable whether such behaviour is a bug, but I find it
suboptimal (especially for really big tables).

1)

You are right, I made an assumption that IN (...) with fewer than 10
elements is expanded into a sequence of ORs (this is a Redshift feature;
since Redshift is based on PostgreSQL I thought it was also a Postgres
one... I was wrong). This is more of a performance feature than a bug
anyway ;-)
But still, this is not the problem.
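
For reference, this is how the query from the report would look when phrased as an explicit OR, the form Tom says lets the planner consider a BitmapOr. It is only a sketch: his foo example is a plain table, and whether the same plan is chosen per partition here is something only EXPLAIN on the real tables can show.

```sql
-- Same predicate as the IN (...) query, spelled as an explicit OR.
-- Table and column names are the ones from the original report.
explain
select *
  from test_table
 where (chec_key = '4400df00-0000-4000-a000-000000000000'
     or chec_key = '1400df00-0000-4000-a000-000000000000')
   and some_date < '2015-11-02';
```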

2)
I made another, more important assumption (possibly also wrong): that if
a table is partitioned (meaning it is technically a set of tables), then
the WHERE condition is "split", redistributed to the partitions, and the
results merged later on.
In this case it would be something like:

with temp as (
  select * from test_table_1
   where chec_key = '1400df00-0000-4000-a000-000000000000'
  union
  select * from test_table_4
   where chec_key = '4400df00-0000-4000-a000-000000000000')
select * from temp;

That being so, the second part is a straight shot for
partial_not_working_4, and the first part hits test_table_1_brinn (this
could also benefit from parallel processing anyway...).
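
A fuller version of that manual split, including the date filter from the original query, might look like the sketch below. It assumes each chec_key value lives in exactly one partition (test_table_1 and test_table_4, as above); under that assumption UNION ALL returns the same rows as the query against test_table, whereas a plain UNION would also remove duplicate rows.

```sql
-- Manual per-partition rewrite of the IN (...) query.
-- Assumption: the two chec_key values map to disjoint partitions,
-- so UNION ALL neither duplicates nor drops rows.
explain
select *
  from test_table_1
 where chec_key = '1400df00-0000-4000-a000-000000000000'
   and some_date < '2015-11-02'
union all
select *
  from test_table_4
 where chec_key = '4400df00-0000-4000-a000-000000000000'
   and some_date < '2015-11-02';
```

Each branch now carries a single equality on chec_key, so a partial index on that partition can be matched against it directly.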

If I don't drop the test_table_4_brinn index, then the query hits
test_table_1_brinn and test_table_4_brinn, which is exactly what is
expected.

As I understand from your posts, it hits those indexes just because they
are condition-less (non-partial) and operate on columns from the WHERE
clause, so they are valid for the whole WHERE clause, not just for the
partition-specific part as I thought.

Background of "why do that":

What I was trying to achieve is a sort of "partition the partition":
since test_table_4 has grown big enough to make the indexes choke, I
decided to split the indexes.
This is (was) an alternative to repartitioning the table: repartition =>
maintenance break => CEO heart attack ;-) .. and indexes can be built on
the fly.
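
To illustrate the "split the indexes, build them on the fly" idea, here is a minimal sketch. The index names, the indexed column and the predicates are all hypothetical; the real definition of partial_not_working_4 is not shown in this message.

```sql
-- Hypothetical index names and predicates, for illustration only.
-- CONCURRENTLY builds the index without blocking writes to the table;
-- it cannot be run inside a transaction block.
create index concurrently test_table_4_part_old   -- hypothetical name
    on test_table_4 (chec_key)
 where some_date <  '2015-11-02';                 -- hypothetical split point

create index concurrently test_table_4_part_new   -- hypothetical name
    on test_table_4 (chec_key)
 where some_date >= '2015-11-02';
```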

PS.

I'm more Dev than Op ;-)

-- 
ZBIGNIEW SZOT



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
