Re: [HACKERS] Multi column range partition table - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: [HACKERS] Multi column range partition table |
Date | |
Msg-id | CAEZATCUVFFDC81AUFm2ksNjnHDp3mNH7RtG7EN+GcJmsjtafEw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Multi column range partition table (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] Multi column range partition table
|
List | pgsql-hackers |
On 3 July 2017 at 10:32, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/07/03 17:36, Dean Rasheed wrote: >> The bigger question is do we want this for PG10? If so, time is >> getting tight. My feeling is that we do, because otherwise we'd be >> changing the syntax in PG11 of a feature only just released in PG10, >> and I think the current syntax is flawed, so it would be better not to >> have it in any public release. I'd feel better hearing from the >> original committer though. > > The way I have extended the syntax in the posted patch, ABOVE/BELOW (or > whatever we decide instead) are optional. UNBOUNDED without the > ABOVE/BELOW specifications implicitly means UNBOUNDED ABOVE if in FROM and > vice versa, which seems to me like sensible default behavior and what's > already present in PG 10. > > Do you think ABOVE/BELOW shouldn't really be optional? > Hmm, I'm not so sure about that. The more I think about this, the more I think that the current design is broken, and that introducing UNBOUNDED ABOVE/BELOW is just a sticking plaster to cover that up. Yes, it allows nicer multi-column ranges to be defined, as demonstrated upthread. But, it also allows some pretty counterintuitive things like making the lower bound exclusive and the upper bound inclusive. I think that's actually the real problem with the current design. If I have a single-column partition like (col) FROM (x) TO (y) it's pretty clear that's a simple range, inclusive at the lower end and exclusive at the upper end: (x) <= (col) < (y) If I now make that a 2-column partition, but leave the second column unbounded: (col1,col2) FROM (x,UNBOUNDED) TO (y,UNBOUNDED) my initial expectation would have been for that to mean the same thing, i.e., (x) <= (col1) < (y) but that only happens if "UNBOUNDED" means negative infinity in both places. That then starts to give the sort of desirable properties you'd expect, like using the same expression for the lower bound of one partition as the upper bound of another makes the two partitions contiguous. But of course, that's not exactly a pretty design either, because then you'd be saying that UNBOUNDED means positive infinity if it's the upper bound of the first column, and negative infinity if it's the lower bound of the first column or either bound of any other column. Another aspect of the current design I don't like is that you have to keep repeating UNBOUNDED [ABOVE/BELOW], for each of the rest of the columns in the bound, and anything else is an error. That's a pretty verbose way of saying "the rest of the columns are unbounded". So the more I think about this, the more I think that a cleaner design would be as follows: 1). Don't allow UNBOUNDED, except in the first column, where it can keep it's current meaning. 2). Allow the partition bounds to have fewer columns than the partition definition, and have that mean the same as it wouldhave meant if you were partitioning by that many columns. So, for example, if you were partitioning by (col1,col2),you'd be allowed to define a partition like so: FROM (x) TO (y) and it would mean x <= col1 < y Or you'd be able to define a partition like FROM (x1,x2) TO (y) which would mean (col1 > x1) OR (col1 = x1 AND col2 >= x2) AND col1 < y 3). Don't allow any value after UNBOUNDED (i.e., only specify UNBOUNDED once in a partition bound). This design has a few neat properties: - Lower bounds are always inclusive and upper bounds are always exclusive. - If the expression for the lower bound of one partition is the same as the expression for the upper bound of another, the2 partitions are contiguous, making it easy to define a covering set of partitions. - It's much easier to understand what a bound of "(x)" means than "(x,UNBOUNDED [ABOVE/BELOW])" - It's much less verbose, and there's no needless repetition. Of course, it's pretty late in the day to be proposing this kind of redesign, but I fear that if we don't tackle it now, it will just be harder to deal with in the future. Actually, a quick, simple hacky implementation might be to just fill in any omitted values in a partition bound with negative infinity internally, and when printing a bound, omit any values after an infinite value. But really, I think we'd want to tidy up the implementation, and I think a number of things would actually get much simpler. For example, get_qual_for_range() could simply stop when it reached the end of the list of values for the bound, and it wouldn't need to worry about an unbounded value following a bounded one. Thoughts? Regards, Dean
pgsql-hackers by date: