Thread: Partition pruning on parameters grouped into an array does not prune properly

Hi,

Working on improving partition pruning [1] I found a case that I may 
treat like a bug. I mean the case with expression on a partitioning 
column like:

id = ARRAY[$1,$2]

Basically, pruning on ARRAY[$1,$2] works in the case of single column 
(see correct-pruning-example.sql in attachment):

PREPARE test (int, int) AS
   SELECT * FROM array_prune WHERE id = ANY(ARRAY[$1,$2]);
EXPLAIN (COSTS OFF) EXECUTE test(1,2);

  Append
    Subplans Removed: 1
    ->  Seq Scan on array_prune_t0 array_prune_1
          Filter: (id = ANY (ARRAY[$1, $2]))

But if we partition on HASH(x,y) it is not working (see 
incorrect-pruning-example.sql):

PREPARE test2 (int,int) AS
  SELECT 1 FROM array_prune
  WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);

  Append
    ->  Seq Scan on array_prune_t0 array_prune_1
          Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
    ->  Seq Scan on array_prune_t1 array_prune_2
          Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))

Although its analogue works nice:

PREPARE test3 (int,int) AS
  SELECT 1 FROM array_prune
  WHERE id1 = $1 AND id2 = $2;
EXPLAIN (COSTS OFF) EXECUTE test3(1,-1);

  Append
    Subplans Removed: 1
    ->  Seq Scan on array_prune_t0 array_prune_1
          Filter: ((id1 = $1) AND (id2 = $2))

So, before diving into the partitioning depths, someone may quickly say 
it is not a bug, but I am missing something. Some hidden semantics?


[1] Prune partitions by ScalarArrayOpExpr with an array parameter 
(partkey = ANY($1))
https://www.postgresql.org/message-id/b8cdd20f-b34b-42b9-8c7c-dae864b7b3b2@gmail.com

-- 
regards, Andrei Lepikhov

Attachment
On Thu, 27 Mar 2025 at 04:19, Andrei Lepikhov <lepihov@gmail.com> wrote:
> But if we partition on HASH(x,y) it is not working (see
> incorrect-pruning-example.sql):
>
> PREPARE test2 (int,int) AS
>   SELECT 1 FROM array_prune
>   WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
> EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);
>
>   Append
>     ->  Seq Scan on array_prune_t0 array_prune_1
>           Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
>     ->  Seq Scan on array_prune_t1 array_prune_2
>           Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))

It is a bug.  This is down to how match_clause_to_partition_key()
handles ScalarArrayOpExpr.  To save some complexity in the handling of
ScalarArrayOpExpr, these get transformed into OpExprs, one for each
item in the ScalarArrayOpExpr.  Look for the call to make_opclause()
in match_clause_to_partition_key(). Just a few lines down, you see
that we recursively call gen_partprune_steps_internal() to pass down
the OpExprs that we just generated.  The problem is that the recursive
call only contains the OpExprs generated for one of the
ScalarArrayOpExpr, gen_prune_steps_from_opexps() requires equality
quals (or at least an key IS NULL qual) for all partitioned keys for
hash partitioning, otherwise it'll bail out on the following:

if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
clauselist == NIL && !bms_is_member(i, nullkeys))
    return NIL;

I wonder if we need to redesign this to not do that recursive
processing and instead have it so match_clause_to_partition_key() can
generate multiple PartClauseInfos. If we've matched to the
ScalarArrayOpExpr then I think each generated PartClauseInfo should
have the same PartClauseMatchStatus. That would also get rid of the
(kinda silly) overhead we have of having to match the
ScalarArrayOpExpr to the partition key, then generating OpExprs and
having to match those again, even though we know they will match.

I suspect the fix for this might be a bit invasive to backpatch. Maybe
it's something we can give a bit more clear thought to after the
freeze is over.

David



On 3/27/25 01:58, David Rowley wrote:
> I wonder if we need to redesign this to not do that recursive
> processing and instead have it so match_clause_to_partition_key() can
> generate multiple PartClauseInfos. If we've matched to the
> ScalarArrayOpExpr then I think each generated PartClauseInfo should
> have the same PartClauseMatchStatus. That would also get rid of the
> (kinda silly) overhead we have of having to match the
> ScalarArrayOpExpr to the partition key, then generating OpExprs and
> having to match those again, even though we know they will match.
> 
> I suspect the fix for this might be a bit invasive to backpatch. Maybe
> it's something we can give a bit more clear thought to after the
> freeze is over.
Thank you for the explanation!

Why does the pruning machinery only include the OpExpr pruning 
operation? Often, when preparing for pruning steps, we don’t know the 
exact number of values we will encounter during the initial or execution 
pruning stages. I believe it would be beneficial to have an iterator - 
something similar to the predicate_implied_by function - that can 
iteratively extract values from an array. This would allow pruning in 
practical scenarios, such as the following:

CREATE OR REPLACE FUNCTION some_business_logic(val integer)
RETURNS integer[] AS $$
BEGIN
  IF txid_current() % 2 = 0 THEN
    RETURN ARRAY[val];
  ELSE
    RETURN ARRAY[val + 1];
  END IF;
END;
$$ LANGUAGE plpgsql STRICT STABLE;

PREPARE test (int) AS
   SELECT * FROM array_prune
   WHERE id = ANY (some_business_logic($1));
EXPLAIN (ANALYZE, COSTS OFF) EXECUTE test(1);

Also in that case we wouldn't need to decompose a ScalarArrayOpExpr to 
the list of OpExpr clauses to prune partitions.
-- 
regards, Andrei Lepikhov



On 3/27/25 01:58, David Rowley wrote:
> I suspect the fix for this might be a bit invasive to backpatch. Maybe
> it's something we can give a bit more clear thought to after the
> freeze is over.
One more issue I think may be addressed (or just considered) here is the 
following:

CREATE TABLE parted (a int, b int) PARTITION BY RANGE (a, b);

CREATE TABLE part1 PARTITION OF parted
   FOR VALUES FROM (1, 1) TO (1, 10);
CREATE TABLE part2 PARTITION OF parted
   FOR VALUES FROM (2, 1) TO (2, 10);

INSERT INTO parted (VALUES (1, 2));
INSERT INTO parted VALUES (2, 2);

EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 1 AND b < 1;
EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 1 AND b > 10;

/*
  Seq Scan on part2 parted
    Filter: ((a > 1) AND (b < 1))
  Seq Scan on part2 parted
    Filter: ((a > 1) AND (b > 10))
*/

I think partition part2 could be pruned like in the following example:

EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 2 AND b > 10;

/*
  Result
    One-Time Filter: false
*/

-- 
regards, Andrei Lepikhov