Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAFjFpReHma+aVSf2Jynr-gKhnzezBkqMsPA7Y5O1uPgrd3zicg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Thu, Aug 10, 2017 at 3:13 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Aug 10, 2017 at 6:23 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> Your patch didn't improve planning time without partition-wise join,
>> so it's something good to have along-with partition-wise join. Given
>> that Bitmapsets are used in other parts of code as well, the
>> optimization may affect those parts as well, esp. the overhead of
>> maintaining first_non_empty_wordnum.
>
> Maybe, but if you consider that this container already deals with the
> upper bound moving up by reallocating and copying the whole thing,
> adjusting an int when the lower bound moves down doesn't seem like
> anything to worry about...

Yeah. May be we should check whether that makes any difference to
planning times of TPC-H queries. If it shows any difference.

>
>> Do you think we solving this problem is a prerequisite for
>> partition-wise join? Or should we propose that patch as a separate
>> enhancement?
>
> No, I'm not proposing anything yet.  For now I just wanted to share
> this observation about where hot CPU time goes in simple tests, and
> since it turned out to be a loop in a loop that I could see an easy to
> way to fix for singleton sets and sets with a small range, I couldn't
> help trying it...  But I'm still trying to understand the bigger
> picture.  I'll be interested to compare profiles with the ordered
> append_rel_list version you have mentioned, to see how that moves the
> hot spots.

build_simple_rel() which contains that loop takes only .23% of
planning time. So, I doubt if that's going to change much.
+   0.23%  postgres  postgres            [.] build_simple_rel
       ▒


>
> I guess one very practical question to ask is: can we plan queries
> with realistic numbers of partitioned tables and partitions in
> reasonable times?  Well, it certainly looks very good for hundreds of
> partitions so far...  My own experience of partitioning with other
> RDBMSs has been on that order, 'monthly partitions covering the past
> 10 years' and similar, but on the other hand it wouldn't be surprising
> to learn that people want to go to many thousands, especially for
> schemas which just keep adding partitions over time and don't want to
> drop them.  As for hash partitioning, that seems to be typically done
> with numbers like 16, 32 or 64 in other products from what I can
> glean.  Speculation: perhaps hash partitioning is more motivated by
> parallelism than data maintenance and thus somehow anchored to the
> ground by core counts; if so no planning performance worries there I
> guess (until core counts double quite a few more times).

Agreed.

>
> One nice thing about the planning time is that restrictions on the
> partition key cut down planning time; so where I measure ~7 seconds to
> plan SELECT * FROM foofoo JOIN barbar USING (a, b) with 2k partitions,
> if I add WHERE a > 50 it's ~4 seconds and WHERE a > 99 it's ~0.8s, so
> if someone has a keep-adding-more-partitions-over-time model then at
> least their prunable current day/week/whatever queries will not suffer
> quite so badly.  (Yeah my computer seems to be a lot slower than yours
> for these tests; clang -O2 no asserts on a mid 2014 MBP with i7 @
> 2.2Ghz).

That's interesting observation. Thanks for sharing it.

>
> Curious: would you consider joins between partitioned tables and
> non-partitioned tables where the join is pushed down to be a kind of
> "partition-wise join", or something else?  If so, would that be a
> special case, or just the logical extreme case for
> 0014-WIP-Partition-wise-join-for-1-1-1-0-0-1-partition-ma.patch, where
> one single "partition" on the non-partitioned side maps to all the
> partitions on the partitioned size?
>

Parameterized nest loop joins with partition key as parameters
simulate something like that. Apart from that case, I don't see any
case where such a join would be more efficient compared to the current
method of ganging all partitions and joining them to the unpartitioned
table. But oh wait, that could be useful in sharding, when the
unpartitioned table is replicated and partitioned table is distributed
across shards. So, yes, that's a useful case. I am not sure whether
it's some kind of partition-wise join; it doesn't matter, it looks
useful. Said that, I am not planning to handle it in the near future.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Chris Travers
Date:
Subject: [HACKERS] Funny WAL corruption issue
Next
From: Vladimir Borodin
Date:
Subject: Re: [HACKERS] Funny WAL corruption issue