Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAEepm=3cDKOBsvKA7cmDKo0UCx6X+mFMoKuigMF3+-25_rji0g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Aug 10, 2017 at 6:23 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Your patch didn't improve planning time without partition-wise join,
> so it's something good to have along-with partition-wise join. Given
> that Bitmapsets are used in other parts of code as well, the
> optimization may affect those parts as well, esp. the overhead of
> maintaining first_non_empty_wordnum.

Maybe, but if you consider that this container already deals with the
upper bound moving up by reallocating and copying the whole thing,
adjusting an int when the lower bound moves down doesn't seem like
anything to worry about...

> The comment at the beginning of the file bitmapset.c says
>    3  * bitmapset.c
>    4  *    PostgreSQL generic bitmap set package
>    5  *
>    6  * A bitmap set can represent any set of nonnegative integers, although
>    7  * it is mainly intended for sets where the maximum value is not large,
>    8  * say at most a few hundred.
>
> When we created thousands of children, we have certainly crossed the
> few hundred threashold. So, there may be other optimizations possible
> there. Probably we should leave that out of partition-wise join
> patches.

+1

> Do you think we solving this problem is a prerequisite for
> partition-wise join? Or should we propose that patch as a separate
> enhancement?

No, I'm not proposing anything yet.  For now I just wanted to share
this observation about where hot CPU time goes in simple tests, and
since it turned out to be a loop in a loop that I could see an easy to
way to fix for singleton sets and sets with a small range, I couldn't
help trying it...  But I'm still trying to understand the bigger
picture.  I'll be interested to compare profiles with the ordered
append_rel_list version you have mentioned, to see how that moves the
hot spots.

I guess one very practical question to ask is: can we plan queries
with realistic numbers of partitioned tables and partitions in
reasonable times?  Well, it certainly looks very good for hundreds of
partitions so far...  My own experience of partitioning with other
RDBMSs has been on that order, 'monthly partitions covering the past
10 years' and similar, but on the other hand it wouldn't be surprising
to learn that people want to go to many thousands, especially for
schemas which just keep adding partitions over time and don't want to
drop them.  As for hash partitioning, that seems to be typically done
with numbers like 16, 32 or 64 in other products from what I can
glean.  Speculation: perhaps hash partitioning is more motivated by
parallelism than data maintenance and thus somehow anchored to the
ground by core counts; if so no planning performance worries there I
guess (until core counts double quite a few more times).

One nice thing about the planning time is that restrictions on the
partition key cut down planning time; so where I measure ~7 seconds to
plan SELECT * FROM foofoo JOIN barbar USING (a, b) with 2k partitions,
if I add WHERE a > 50 it's ~4 seconds and WHERE a > 99 it's ~0.8s, so
if someone has a keep-adding-more-partitions-over-time model then at
least their prunable current day/week/whatever queries will not suffer
quite so badly.  (Yeah my computer seems to be a lot slower than yours
for these tests; clang -O2 no asserts on a mid 2014 MBP with i7 @
2.2Ghz).

Curious: would you consider joins between partitioned tables and
non-partitioned tables where the join is pushed down to be a kind of
"partition-wise join", or something else?  If so, would that be a
special case, or just the logical extreme case for
0014-WIP-Partition-wise-join-for-1-1-1-0-0-1-partition-ma.patch, where
one single "partition" on the non-partitioned side maps to all the
partitions on the partitioned size?

-- 
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Creating backup history files for backups taken from standbys
Next
From: Beena Emerson
Date:
Subject: Re: [HACKERS] expanding inheritance in partition bound order