Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] path toward faster partition pruning
Date
Msg-id CAKJS1f_TArTvFCY4Abj2PqM+JfRR_jxwZ1FtT_Md=WKd5Nw_xg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] path toward faster partition pruning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] path toward faster partition pruning  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
On 2 March 2018 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't like the comments at the top of partprune.c very much.  It
> seems strange to document individual functions here; those functions
> can (and should) be documented in their individual header comments.
> What we should have at the top of the file is a discussion of the
> overall theory of operation of this module, something that currently
> seems not to exist anywhere in the patch.  I tried to figure that out
> by looking at the new data structures the patch introduces:
> PartitionPruneContext, PartScanKeyInfo, PartitionClauseInfo, and
> PartClause.  It looks like the general idea idea is that code that
> wants to use these facilities does the following:
>
> Step 1. Generate a PartitionPruneContext.  In this patch, this seems
> to consist entirely of copying information from the RelOptInfo or its
> PartitionScheme.
> Step 2. Call generate_partition_clauses() to extract relevant clauses
> from rel->baserestrictinfo and generate a PartClauseInfo.
> Step 3. Call get_partitions_from_clauses() to generate a set of
> unpruned partition indexes.  Internally, that function will first
> populate a PartScanKeyInfo from the PartClauseInfo by calling
> extract_bounding_datums().  Then it calls get_partitions_for_keys()
> which generates the set of unpruned partition indexes from the
> PartitionPruneContext and PartScanKeyInfo.

Hi Robert,

I feel I should step in here and answer this part as it was me who
first came up with the idea of the context struct. I've typed up
something below which is my first cut at what I'd have imagined the
header comment of partprune.c should look like. Some parts are only
revant after run-time pruning is also using this stuff. I've tried to
highlight those areas, I'm not sure how much or if there should be any
mention of that at all as part of this patch.

Here goes:

partprune.c

Allows efficient identification of the minimal set of partitions which match a
given set of clauses.  Thus allowing useful things such as ignoring unneeded
partitions which cannot possibly contain tuples matching the given set of
clauses.

This module breaks the process of determining the matching partitions into
two distinct steps, each of which has its own function which is externally
visible outside of this module.  The reason for not performing everything
in one step as down to the fact that there are times where we may wish to
perform the 2nd step multiple times over.  The steps could be thought of as a
compilation step followed by an execution step.

Step 1 (compilation):

Pre-process the given list of clauses and attempt to match individual clauses
up to a partition key.

The end result of this process is a PartitionClauseInfo containing details of
each clause found to match the partition key.  This output is required as
input for the 2nd step.

Step 2 (execution):

Step 2 outputs the minimal set of matching partitions based on the input from
step 1.

Internally, this step is broken down into smaller sub-steps, each of which
is explained in detail in the comments in the corresponding function.

Step 2 can be executed multiple times for its input values.  The inputs to this
step are not modified by the processing done within.  It is expected that this
step is executed multiple times in cases where the matching partitions must be
determined during query execution.  A subsequent evaluation of this step will
be required whenever a parameter which was found in a clause matching the
partition key changes its value.

PartitionPruneContext:

Each of the steps described above also requires an input of a
PartitionPruneContext.  This stores all of the remaining required inputs to
each step.  The context will vary slightly depending on the context in which
the step is being called from; i.e the planner or executor. For example,
during query planning, we're unable to determine the value of a Param found
matching the partition key.  When this step is called from the executor the
PlanState can be set in the context which allows evaluation of these Params
into Datum values. *** Only after run-time pruning ***

The PartitionPruneContext is also required since many of the query planner
node types are unavailable to the executor, which means that the source
information used to populate the context will vary depending on if it's being
called from the query planner or executor.

*** Only after run-time pruning ***

The context is also modified during step 1 to record all of the Param IDs
which were found to match the partition key.

-------------

Hopefully that also helps explain the intensions with the current code strucure.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] Support for Secure Transport SSL library on macOS asOpenSSL alternative
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan