Re: partition pruning doesn't work with IS NULL clause in multikeyrange partition case - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: partition pruning doesn't work with IS NULL clause in multikeyrange partition case
Date
Msg-id CAFiTN-vNtLh_n7Dw7GfEidjQ78ThapgOUmof9uct=TwEb_XNAg@mail.gmail.com
Whole thread Raw
In response to partition pruning doesn't work with IS NULL clause in multikey rangepartition case  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: partition pruning doesn't work with IS NULL clause in multikeyrange partition case  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, Jul 11, 2018 at 3:06 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Hi,
> Consider following test case.
> create table prt (a int, b int, c int) partition by range(a, b);
> create table prt_p1 partition of prt for values (0, 0) to (100, 100);
> create table prt_p1 partition of prt for values from (0, 0) to (100, 100);
> create table prt_p2 partition of prt for values from (100, 100) to (200, 200);
> create table prt_def partition of prt default;
>
> In a range partitioned table, a row with any partition key NULL goes
> to the default partition if it exists.
> insert into prt values (null, 1);
> insert into prt values (1, null);
> insert into prt values (null, null);
> select tableoid::regclass, * from prt;
>  tableoid | a | b | c
> ----------+---+---+---
>  prt_def  |   | 1 |
>  prt_def  | 1 |   |
>  prt_def  |   |   |
> (3 rows)
>
> There's a comment in get_partition_for_tuple(), which says so.
> /*
>  * No range includes NULL, so this will be accepted by the
>  * default partition if there is one, and otherwise rejected.
>  */
>
> But when there is IS NULL clause on any of the partition keys with
> some condition on other partition key, all the partitions scanned. I
> expected pruning to prune all the partitions except the default one.
>
> explain verbose select * from prt where a is null and b = 100;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Append  (cost=0.00..106.52 rows=3 width=12)
>    ->  Seq Scan on public.prt_p1  (cost=0.00..35.50 rows=1 width=12)
>          Output: prt_p1.a, prt_p1.b, prt_p1.c
>          Filter: ((prt_p1.a IS NULL) AND (prt_p1.b = 100))
>    ->  Seq Scan on public.prt_p2  (cost=0.00..35.50 rows=1 width=12)
>          Output: prt_p2.a, prt_p2.b, prt_p2.c
>          Filter: ((prt_p2.a IS NULL) AND (prt_p2.b = 100))
>    ->  Seq Scan on public.prt_def  (cost=0.00..35.50 rows=1 width=12)
>          Output: prt_def.a, prt_def.b, prt_def.c
>          Filter: ((prt_def.a IS NULL) AND (prt_def.b = 100))
> (10 rows)
>
> I thought that the following code in get_matching_range_bounds()
>     /*
>      * If there are no datums to compare keys with, or if we got an IS NULL
>      * clause just return the default partition, if it exists.
>      */
>     if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
>     {
>         result->scan_default = partition_bound_has_default(boundinfo);
>         return result;
>     }
>
> would do the trick but through the debugger I saw that nullkeys is
> NULL for this query.
>
> I didn't investigate further to see why nullkeys is NULL, but it looks
> like that's the problem and we are missing an optimization.


I think the problem is that the gen_partprune_steps_internal expect
that all the keys should match to IS NULL clause, only in such case
the "partition pruning step" will store the nullkeys.

After a small change, it is able to prune the partitions.

--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -857,7 +857,7 @@
gen_partprune_steps_internal(GeneratePruningStepsContext *context,
         * If generate_opsteps is set to false it means no OpExprs were directly
         * present in the input list.
         */
-       if (!generate_opsteps)
+       if (nullkeys || !generate_opsteps)
        {
                /*
                 * Generate one prune step for the information derived
from IS NULL,
@@ -865,8 +865,7 @@
gen_partprune_steps_internal(GeneratePruningStepsContext *context,
                 * clauses for all partition keys.
                 */
                if (!bms_is_empty(nullkeys) &&
-                       (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
-                        bms_num_members(nullkeys) == part_scheme->partnatts))
+                       (part_scheme->strategy != PARTITION_STRATEGY_HASH))
                {
                        PartitionPruneStep *step;

postgres=# explain verbose select * from prt where a is null and b = 100;
                              QUERY PLAN
----------------------------------------------------------------------
 Append  (cost=0.00..35.51 rows=1 width=12)
   ->  Seq Scan on public.prt_def  (cost=0.00..35.50 rows=1 width=12)
         Output: prt_def.a, prt_def.b, prt_def.c
         Filter: ((prt_def.a IS NULL) AND (prt_def.b = 100))
(4 rows)

Above fix is just to show the root cause of the issue, I haven't
investigated that what should be the exact fix for this issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Problem with tupdesc in jsonb_to_recordset
Next
From: Alexander Korotkov
Date:
Subject: Re: Jsonb transform for pl/python