Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [HACKERS] path toward faster partition pruning
Date
Msg-id acb50fa6-4548-f09b-634b-667dced4bc74@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] path toward faster partition pruning  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On 2018/03/02 11:12, David Rowley wrote:
> On 2 March 2018 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't like the comments at the top of partprune.c very much.  It
>> seems strange to document individual functions here; those functions
>> can (and should) be documented in their individual header comments.
>> What we should have at the top of the file is a discussion of the
>> overall theory of operation of this module, something that currently
>> seems not to exist anywhere in the patch.  I tried to figure that out
>> by looking at the new data structures the patch introduces:
>> PartitionPruneContext, PartScanKeyInfo, PartitionClauseInfo, and
>> PartClause.  It looks like the general idea idea is that code that
>> wants to use these facilities does the following:
>>
>> Step 1. Generate a PartitionPruneContext.  In this patch, this seems
>> to consist entirely of copying information from the RelOptInfo or its
>> PartitionScheme.
>> Step 2. Call generate_partition_clauses() to extract relevant clauses
>> from rel->baserestrictinfo and generate a PartClauseInfo.
>> Step 3. Call get_partitions_from_clauses() to generate a set of
>> unpruned partition indexes.  Internally, that function will first
>> populate a PartScanKeyInfo from the PartClauseInfo by calling
>> extract_bounding_datums().  Then it calls get_partitions_for_keys()
>> which generates the set of unpruned partition indexes from the
>> PartitionPruneContext and PartScanKeyInfo.
> 
> Hi Robert,
> 
> I feel I should step in here and answer this part as it was me who
> first came up with the idea of the context struct. I've typed up
> something below which is my first cut at what I'd have imagined the
> header comment of partprune.c should look like. Some parts are only
> revant after run-time pruning is also using this stuff. I've tried to
> highlight those areas, I'm not sure how much or if there should be any
> mention of that at all as part of this patch.
> 
> Here goes:
> 
> partprune.c
> 
> Allows efficient identification of the minimal set of partitions which match a
> given set of clauses.  Thus allowing useful things such as ignoring unneeded
> partitions which cannot possibly contain tuples matching the given set of
> clauses.
> 
> This module breaks the process of determining the matching partitions into
> two distinct steps, each of which has its own function which is externally
> visible outside of this module.  The reason for not performing everything
> in one step as down to the fact that there are times where we may wish to
> perform the 2nd step multiple times over.  The steps could be thought of as a
> compilation step followed by an execution step.
> 
> Step 1 (compilation):
> 
> Pre-process the given list of clauses and attempt to match individual clauses
> up to a partition key.
> 
> The end result of this process is a PartitionClauseInfo containing details of
> each clause found to match the partition key.  This output is required as
> input for the 2nd step.
> 
> Step 2 (execution):
> 
> Step 2 outputs the minimal set of matching partitions based on the input from
> step 1.
> 
> Internally, this step is broken down into smaller sub-steps, each of which
> is explained in detail in the comments in the corresponding function.
> 
> Step 2 can be executed multiple times for its input values.  The inputs to this
> step are not modified by the processing done within.  It is expected that this
> step is executed multiple times in cases where the matching partitions must be
> determined during query execution.  A subsequent evaluation of this step will
> be required whenever a parameter which was found in a clause matching the
> partition key changes its value.
> 
> PartitionPruneContext:
> 
> Each of the steps described above also requires an input of a
> PartitionPruneContext.  This stores all of the remaining required inputs to
> each step.  The context will vary slightly depending on the context in which
> the step is being called from; i.e the planner or executor. For example,
> during query planning, we're unable to determine the value of a Param found
> matching the partition key.  When this step is called from the executor the
> PlanState can be set in the context which allows evaluation of these Params
> into Datum values. *** Only after run-time pruning ***
> 
> The PartitionPruneContext is also required since many of the query planner
> node types are unavailable to the executor, which means that the source
> information used to populate the context will vary depending on if it's being
> called from the query planner or executor.
> 
> *** Only after run-time pruning ***
> 
> The context is also modified during step 1 to record all of the Param IDs
> which were found to match the partition key.
> 
> -------------
> 
> Hopefully that also helps explain the intensions with the current code strucure.

Thanks David for writing this down.

Thanks,
Amit



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: [PATCH] Verify Checksums during Basebackups
Next
From: Pierre Ducroquet
Date:
Subject: ALTER TABLE does not check for column existence before starting operations