Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [HACKERS] path toward faster partition pruning
Date
Msg-id 21e00a6a-1e8d-d866-ae40-07c2058512de@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] path toward faster partition pruning  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] path toward faster partition pruning  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Re: [HACKERS] path toward faster partition pruning  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: [HACKERS] path toward faster partition pruning  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On 2018/03/23 21:19, Amit Langote wrote:
> On 2018/03/21 6:29, Robert Haas wrote:
>> +    /*
>> +     * If there are multiple pruning steps, we perform them one after another,
>> +     * passing the result of one step as input to another.  Based on the type
>> +     * of pruning step, perform_pruning_step may add or remove partitions from
>> +     * the set of partitions it receives as the input.
>> +     */
>>
>> The comment sounds great, but the code doesn't work that way; it
>> always calls bms_int_members to intersect the new result with any
>> previous result.  I'm baffled as to how this manages to DTRT if
>> COMBINE_OR is used.  In general I had hoped that the list of pruning
>> steps was something over which we were only going to iterate, not
>> recurse.  This definitely recurses for the combine steps, but it's
>> still (sorta) got the idea of a list of iterable steps.  That's a
>> weird mix.
> 
> At the top-level (in get_matching_partitions), it is assumed that the
> steps in the input list come from implicitly AND'd clauses, so the
> intersection between partition sets that we get for each.
> 
> Anyway, after David's rewrite of this portion of the patch incorporated in
> the latest patch, things look a bit different here, although there is
> still recursion for combine steps.  I'm still considering how to make the
> recursion go away.

I have managed to make the recursion go away in the attached updated
version.  I guess that's the result of employing the idea of a "output
register" for individual pruning steps as mentioned in Robert's email
upthread where he detailed the "pruning steps" approach [1].

With the new patch, pruning steps for arguments of, say, an OR clause are
not performed recursively.  Instead, each pruning step is performed
independently and its output is stored in a slot dedicated to it.  Combine
steps are always executed after all of the steps corresponding to its
arguments have been executed.  That's ensured by the way steps are allocated.


Jesper, off-list, reported an unused variable which has been removed in
the updated patch.  Thanks Jesper!  He also pointed out a case with a
list-partitioned table where pruning doesn't a produce a result as one
would expect and what constraint exclusion would produce.

create table lp (a char) partition by list (a);
create table lp_ad partition of lp for values in ('a', 'd');
create table lp_bc partition of lp for values in ('b', 'c');
create table lp_default partition of lp default;
explain (costs off) select * from lp where a > 'a' and a < 'd';
                        QUERY PLAN
-----------------------------------------------------------
 Append
   ->  Seq Scan on lp_ad
         Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
   ->  Seq Scan on lp_bc
         Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
   ->  Seq Scan on lp_default
         Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
(7 rows)

One would expect that lp_ad is not scanned.

With the implementation where a > 'a' and a < 'd' are used to prune
separately, this cannot be avoided given the way
get_partitions_for_keys_list() added by the patch works.  What happens is
we prune first with a step generated for a > 'a', which returns partitions
for all datums in the table's boundinfo greater than 'a' that have a
partition assigned, which means we'll include the partition that accepts
'd'.  Then when pruning with a < 'd', we select partitions for all datums
less than 'd' that have a partition assigned, which means we end up
including the partition that accepts 'a'.  We intersect the result of
running these independent steps, but lp_ad is present in the result sets
of both the sets, so it ends up in the final result.  Maybe there is a way
to fix that, but I haven't done anything about it yet.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoahUxagjeNeJTcJkD0rbk%2BmHTXROzWcEd%2BtZ8DuQG83cg%40mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Next
From: Simon Riggs
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11