Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [HACKERS] path toward faster partition pruning |
Date | |
Msg-id | 21e00a6a-1e8d-d866-ae40-07c2058512de@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] path toward faster partition pruning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] path toward faster partition pruning
(Jesper Pedersen <jesper.pedersen@redhat.com>)
Re: [HACKERS] path toward faster partition pruning (Alvaro Herrera <alvherre@alvh.no-ip.org>) Re: [HACKERS] path toward faster partition pruning (David Rowley <david.rowley@2ndquadrant.com>) |
List | pgsql-hackers |
On 2018/03/23 21:19, Amit Langote wrote: > On 2018/03/21 6:29, Robert Haas wrote: >> + /* >> + * If there are multiple pruning steps, we perform them one after another, >> + * passing the result of one step as input to another. Based on the type >> + * of pruning step, perform_pruning_step may add or remove partitions from >> + * the set of partitions it receives as the input. >> + */ >> >> The comment sounds great, but the code doesn't work that way; it >> always calls bms_int_members to intersect the new result with any >> previous result. I'm baffled as to how this manages to DTRT if >> COMBINE_OR is used. In general I had hoped that the list of pruning >> steps was something over which we were only going to iterate, not >> recurse. This definitely recurses for the combine steps, but it's >> still (sorta) got the idea of a list of iterable steps. That's a >> weird mix. > > At the top-level (in get_matching_partitions), it is assumed that the > steps in the input list come from implicitly AND'd clauses, so the > intersection between partition sets that we get for each. > > Anyway, after David's rewrite of this portion of the patch incorporated in > the latest patch, things look a bit different here, although there is > still recursion for combine steps. I'm still considering how to make the > recursion go away. I have managed to make the recursion go away in the attached updated version. I guess that's the result of employing the idea of a "output register" for individual pruning steps as mentioned in Robert's email upthread where he detailed the "pruning steps" approach [1]. With the new patch, pruning steps for arguments of, say, an OR clause are not performed recursively. Instead, each pruning step is performed independently and its output is stored in a slot dedicated to it. Combine steps are always executed after all of the steps corresponding to its arguments have been executed. That's ensured by the way steps are allocated. Jesper, off-list, reported an unused variable which has been removed in the updated patch. Thanks Jesper! He also pointed out a case with a list-partitioned table where pruning doesn't a produce a result as one would expect and what constraint exclusion would produce. create table lp (a char) partition by list (a); create table lp_ad partition of lp for values in ('a', 'd'); create table lp_bc partition of lp for values in ('b', 'c'); create table lp_default partition of lp default; explain (costs off) select * from lp where a > 'a' and a < 'd'; QUERY PLAN ----------------------------------------------------------- Append -> Seq Scan on lp_ad Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar)) -> Seq Scan on lp_bc Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar)) -> Seq Scan on lp_default Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar)) (7 rows) One would expect that lp_ad is not scanned. With the implementation where a > 'a' and a < 'd' are used to prune separately, this cannot be avoided given the way get_partitions_for_keys_list() added by the patch works. What happens is we prune first with a step generated for a > 'a', which returns partitions for all datums in the table's boundinfo greater than 'a' that have a partition assigned, which means we'll include the partition that accepts 'd'. Then when pruning with a < 'd', we select partitions for all datums less than 'd' that have a partition assigned, which means we end up including the partition that accepts 'a'. We intersect the result of running these independent steps, but lp_ad is present in the result sets of both the sets, so it ends up in the final result. Maybe there is a way to fix that, but I haven't done anything about it yet. Thanks, Amit [1] https://www.postgresql.org/message-id/CA%2BTgmoahUxagjeNeJTcJkD0rbk%2BmHTXROzWcEd%2BtZ8DuQG83cg%40mail.gmail.com
Attachment
pgsql-hackers by date: