Re: [HACKERS] Runtime Partition Pruning - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] Runtime Partition Pruning
Date
Msg-id CAKJS1f8dq1BcvNEMTitp3O7Gc_cN1Dmn9vxxacb-kiX+mmyGSg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Runtime Partition Pruning  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Responses Re: [HACKERS] Runtime Partition Pruning
Re: [HACKERS] Runtime Partition Pruning
Re: [HACKERS] Runtime Partition Pruning
List pgsql-hackers
On 10 March 2018 at 07:50, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> On 03/01/2018 08:29 PM, David Rowley wrote:
>> I'll delay releasing a new patch as there's some discussion over on
>> the faster partition pruning thread which affects this too [1]
>>
>> [1]
>> https://www.postgresql.org/message-id/CA+Tgmoa4D1c4roj7L8cx8gkkeBWAZD=MTcXKxTwBnsLRHD3rig@mail.gmail.com
>>
>
> Sure, 0003-0005 depends on that thread. 0002 is refactoring so that one is
> ready.

Okay, back to this again. The faster partition pruning patch has
undergone a large refactor which cause me to delay basing this patch
on top of it until that patch has stabilised again.  It's now getting
close, so this seems like a good time to send a new verison of this.

> In 0004 should we add a HASH based test case,

I don't think we really need to test each partition type in too much
detail in the run-time pruning patch. I'm more interested in testing
multiple partition levels there's important new code in this patch
which does need lots of additional testing for multi-level partitions.
I think adding a HASH partition case would be more aiming to test the
planner's pruning code, which this does not make any behavioural
changes to.

> Also, should 0004 consider the "Parallel Append" case, aka

There is a parallel Append test case in there already, see the queries
below the "-- parallel append" comment.

I've attached a new version of the patch.  I'm now at v18 after having
some versions of the patch that I didn't release which were based on
various versions of Amit's faster partition pruning patch.

The attached patchset is based on Amit's v45 faster partition pruning [1].

I've made a few changes since the v14 version. Since Amit's v45 patch
now creates the partition pruning details in a data structure that can
be copied from the planner over to the executor, it means this patch
is now able to do much of the setup work for run-time pruning in the
planner. Doing this now allows us to determine if run-time pruning is
even possible at plan time, rather than the executor attempting it and
sometimes wasting effort when it failed to find Params matching the
partition key.

Another change from the last version is that I've separated out the
handling of exec Params and external Params.  The new patch now will
perform a pruning step at executor startup if some external params
match the partition key.  This is very beneficial to a PREPAREd OLTP
type query against a partitioned table as it means we can skip
sub-plan initialisation for all non-matching partitions.  Initialising
Append/MergeAppend/ModifyTable nodes with fewer subnodes than were
originally planned did require a small change in explain.c in some
code that was assuming the subnode arrays were the same length as the
subplan list. I also ended up adding a Int property to EXPLAIN to show
the number of subnodes that have been removed during execution.
Unfortunately, there is a small corner case with this in the case
where all subnodes are removed it leaves EXPLAIN with no subnodes to
search for outer side Vars in. I didn't really see any sane way to
handle this in EXPLAIN, so I think the best fix for this is what I've
done, and that's just to always initalise at least 1 subnode, even
when none match the external Params. Cost-wise, I don't think this is
such a big deal as the cost savings here are coming from saving on
initalising ten's or hundreds of subnodes, not 1.

To put the new patch to the test, I tried pgbench -S -M prepared -s
100 with and without having modified pgbench_accounts to separate into
10 RANGE partitions of equal size.

A non-partitioned table was getting 12503 TPS.
With partitioned tables, the old version of this patch was getting: 5470 TPS.
With partitioned tables, the attached version gets 11247 TPS.
For perspective, today's master with a partitioned table gets 4719 TPS.

So you can see it's a pretty good performance boost by skipping
initialisation of the 9 non-matching subplans.  It's not hard to
imagine the gains getting more significant with a larger number of
partitions. Ideally, the performance of a partitioned table would be
faster than the non-partitioned case, but please remember this query
is a single row PK lookup SELECT, so is a very fast query in both
cases. Partitioning cases should improve more as the table grows and
indexes struggle to fit in RAM, or when many rows are being taken from
the partition and being aggregated.

Unfortunately, the story is not quite as good for the non -M prepared
version of the benchmark. This shows that planning time of partitioned
table queries could still use some improvements. Amit's v45 patch
certainly makes a dent in the planner slow performance here, but it's
still nothing like as fast as the non-partitioned case. More work is
required there in the future.

This patchset also contains a patch to improve the performance of
inheritance planning of UPDATE/DELETE queries. This patch also
implements run-time pruning for UPDATE/DELETE too. This also has a
significant performance improvement for planning of UPDATE/DELETE
operations on partitioned tables with a large number of partitions,
performance is as follows:

Values in transactions per second.

Partitions = 1
Unpatched: 7323.3
Patched: 6573.2 (-10.24%)

Partitions = 2
Unpatched: 6784.8
Patched: 6377.1 (-6.01%)

Partitions = 4
Unpatched: 5903.0
Patched: 6106.8 (3.45%)

Partitions = 8
Unpatched: 4582.0
Patched: 5579.9 (21.78%)

Partitions = 16
Unpatched: 3131.5
Patched: 4521.2 (44.38%)

Partitions = 32
Unpatched: 1779.8
Patched: 3387.8 (90.35%)

Partitions = 64
Unpatched: 821.9
Patched: 2245.4 (173.18%)

Partitions = 128
Unpatched: 322.2
Patched: 1319.6 (309.56%)

Partitions = 256
Unpatched: 84.3
Patched: 731.7 (768.27%)

Partitions = 512
Unpatched: 22.5
Patched: 382.8 (1597.74%)

Partitions = 1024
Unpatched: 5.5
Patched: 150.1 (2607.83%)

This shows a small regression in planner performance of 10% for a
table with 1 partition, but performance starts improving again by just
4 partitions. By the time we get to 1024 partitions the patched
planner is 26 times faster than unpatched. It's likely not a
no-brainer since some people may get a small performance regression,
but it certainly makes having larger numbers of partitions much more
usable than it was previously.

There are still a few bugs lingering in v45 of Amit's faster partition
pruning patch. One of which is causing a regression test to fail in
the attached patch set. I've also attached a patch which must be
applied after Amit's v45 patchset to fixup a couple of things missing.
Hopefully, this won't be needed once v46 is out.

To test this apply all patches in [1], then apply
faster_part_prune_v45_fixups.patch, then the attached 0001-0005
patches.

[1] https://www.postgresql.org/message-id/fc73cef4-6879-26c3-6859-2f910640234a@lab.ntt.co.jp

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Small proposal to improve out-of-memory messages
Next
From: Craig Ringer
Date:
Subject: Re: Feature Request - DDL deployment with logical replication