Home > mailing lists

Re: [HACKERS] Multi column range partition table - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: [HACKERS] Multi column range partition table
Date	July 5, 2017 12:43:13
Msg-id	a6d6b752-3d08-ded8-3c8f-5cc9f090ec20@lab.ntt.co.jp Whole thread Raw
In response to	Re: [HACKERS] Multi column range partition table (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses	Re: [HACKERS] Multi column range partition table Re: [HACKERS] Multi column range partition table Re: [HACKERS] Multi column range partition table
List	pgsql-hackers

Tree view

Hi Dean,

On 2017/07/04 16:49, Dean Rasheed wrote:
> On 3 July 2017 at 10:32, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/07/03 17:36, Dean Rasheed wrote:
>>> The bigger question is do we want this for PG10? If so, time is
>>> getting tight. My feeling is that we do, because otherwise we'd be
>>> changing the syntax in PG11 of a feature only just released in PG10,
>>> and I think the current syntax is flawed, so it would be better not to
>>> have it in any public release. I'd feel better hearing from the
>>> original committer though.
>>
>> The way I have extended the syntax in the posted patch, ABOVE/BELOW (or
>> whatever we decide instead) are optional.  UNBOUNDED without the
>> ABOVE/BELOW specifications implicitly means UNBOUNDED ABOVE if in FROM and
>> vice versa, which seems to me like sensible default behavior and what's
>> already present in PG 10.
>>
>> Do you think ABOVE/BELOW shouldn't really be optional?
>>
> 
> Hmm, I'm not so sure about that.
> 
> The more I think about this, the more I think that the current design
> is broken, and that introducing UNBOUNDED ABOVE/BELOW is just a
> sticking plaster to cover that up. Yes, it allows nicer multi-column
> ranges to be defined, as demonstrated upthread. But, it also allows
> some pretty counterintuitive things like making the lower bound
> exclusive and the upper bound inclusive.

Yes, I kind of got that impression from the example, but wasn't able to
reach the same conclusion as yours that it stems from the underlying
design issues; I thought we'd just have to document them as caveats, but
that doesn't really sound nice.  Thanks for pointing that out.

> I think that's actually the real problem with the current design. If I
> have a single-column partition like
> 
>   (col) FROM (x) TO (y)
> 
> it's pretty clear that's a simple range, inclusive at the lower end
> and exclusive at the upper end:
> 
>   (x) <= (col) < (y)
> 
> If I now make that a 2-column partition, but leave the second column
> unbounded:
> 
>   (col1,col2) FROM (x,UNBOUNDED) TO (y,UNBOUNDED)
> 
> my initial expectation would have been for that to mean the same
> thing, i.e.,
> 
>   (x) <= (col1) < (y)
> 
> but that only happens if "UNBOUNDED" means negative infinity in both
> places. That then starts to give the sort of desirable properties
> you'd expect, like using the same expression for the lower bound of
> one partition as the upper bound of another makes the two partitions
> contiguous.
> 
> But of course, that's not exactly a pretty design either, because then
> you'd be saying that UNBOUNDED means positive infinity if it's the
> upper bound of the first column, and negative infinity if it's the
> lower bound of the first column or either bound of any other column.

Initially, I didn't understand the part where you said FROM (x, UNBOUNDED)
TO (y, UNBOUNDED) would mean the same thing as (x) <= (col1) < (y),
because row comparison logic that underlying multi-column range partition
key comparisons appears to me to contradict the same.  But, maybe it's
thinking about the implementation details like this that's clouding my
judgement about the correctness or the intuitiveness of the current design.

> Another aspect of the current design I don't like is that you have to
> keep repeating UNBOUNDED [ABOVE/BELOW], for each of the rest of the
> columns in the bound, and anything else is an error. That's a pretty
> verbose way of saying "the rest of the columns are unbounded".
>
> So the more I think about this, the more I think that a cleaner design
> would be as follows:
> 
> 1). Don't allow UNBOUNDED, except in the first column, where it can
>     keep it's current meaning.
> 
> 2). Allow the partition bounds to have fewer columns than the
>     partition definition, and have that mean the same as it would have
>     meant if you were partitioning by that many columns. So, for
>     example, if you were partitioning by (col1,col2), you'd be allowed
>     to define a partition like so:
> 
>       FROM (x) TO (y)
> 
>     and it would mean
> 
>       x <= col1 < y
> 
>     Or you'd be able to define a partition like
> 
>       FROM (x1,x2) TO (y)
> 
>     which would mean
> 
>       (col1 > x1) OR (col1 = x1 AND col2 >= x2) AND col1 < y
> 
> 3). Don't allow any value after UNBOUNDED (i.e., only specify
>     UNBOUNDED once in a partition bound).

I assume we don't need the ability of specifying ABOVE/BELOW in this design.

In retrospect, that sounds like something that was implemented in the
earlier versions of the patch, whereby there was no ability to specify
UNBOUNDED on a per-column basis.  So the syntax was:

FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED }

But, it was pointed out to me [1] that that doesn't address the use case,
for example, where part1 goes up to (10, 10) and part2 goes from (10, 10)
up to (10, unbounded).

The new design will limit the usage of unbounded range partitions at the
tail ends.

> This design has a few neat properties:
> 
> - Lower bounds are always inclusive and upper bounds are always
>   exclusive.
> 
> - If the expression for the lower bound of one partition is the same
>   as the expression for the upper bound of another, the 2 partitions
>   are contiguous, making it easy to define a covering set of
>   partitions.
> 
> - It's much easier to understand what a bound of "(x)" means than
>   "(x,UNBOUNDED [ABOVE/BELOW])"
> 
> - It's much less verbose, and there's no needless repetition.

They all sound good to me.

> Of course, it's pretty late in the day to be proposing this kind of
> redesign, but I fear that if we don't tackle it now, it will just be
> harder to deal with in the future.
> 
> Actually, a quick, simple hacky implementation might be to just fill
> in any omitted values in a partition bound with negative infinity
> internally, and when printing a bound, omit any values after an
> infinite value. But really, I think we'd want to tidy up the
> implementation, and I think a number of things would actually get much
> simpler. For example, get_qual_for_range() could simply stop when it
> reached the end of the list of values for the bound, and it wouldn't
> need to worry about an unbounded value following a bounded one.
> 
> Thoughts?

I cooked up a patch for the "hacky" implementation for now, just as you
described in the above paragraph.  Will you be willing to give it a look?
I will also think about the non-hacky way of implementing this.

0001 is your patch to tidy up check_new_partition_bound()  (must be
applied before 0002)

0002 is the patch to implement the range partition syntax redesign that
you outlined above

Thanks again.

Regards,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

From: Amit Khandekar
Date: 05 July 2017, 12:42:49
Subject: Re: [HACKERS] UPDATE of partition key

From: Ashutosh Sharma
Date: 05 July 2017, 13:18:39
Subject: [HACKERS] Partition : Append node over a single SeqScan

Re: [HACKERS] Multi column range partition table - Mailing list pgsql-hackers

Attachment

Previous

Next