Thread: Proposal: Automatic partition creation
The previous discussion of automatic partition creation [1] has addressed static and dynamic creation of partitions and ended up with several syntax proposals. In this thread, I want to continue this work. Attached is PoC for static partition creation. The patch core is quite straightforward. It adds one more transform clause to convert given partitioning specification into several CREATE TABLE statements. The patch implements following syntax: CREATE TABLE ... PARTITION BY partition_method (list_of_columns) partition_auto_create_clause where partition_auto_create_clause is CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec and partition_bound_spec is: MODULUS integer | VALUES IN (expr [,...]) [, ....] | INTERVAL range_step FROM range_start TO range_end For more examples check auto_partitions.sql in the patch. TODO: - CONFIGURATION is just an existing keyword, that I picked as a stub. Ideas on better wording are welcome. - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet I wonder, is it worth placing a stub for dynamic partitioning, or we can rather add these keywords later. - HASH and LIST static partitioning works as expected. Testing and feedback are welcome. - RANGE partitioning is not really implemented in this patch. Now it only accepts interval data type as 'interval' and respectively date types as range_start and range_end expressions. Only one partition is created. I found it difficult to implement the generation of bounds using internal functions and data types. Both existing solutions (pg_pathman and pg_partman) rely on SQL level routines [2]. I am going to implement this via SPI, which allow to simplify checks and calculations. Do you see any pitfalls in this approach? - Partition naming. Now partition names for all methods look like $tablename_$partnum Do we want more intelligence here? Now we have RunObjectPostCreateHook(), which allows to rename the table. To make it more user-friendly, we can later implement pl/pgsql function that sets the callback, as it is done in pg_pathman set_init_callback() [3]. - Current design doesn't allow to create default partition automatically. Do we need this functionality? - Do you see any restrictions for future extensibility (dynamic partitioning, init_callback, etc.) in the proposed design ? I expect this to be a long discussion, so here is the wiki page [4] to fix important questions and final agreements. [1] https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre [2] https://github.com/postgrespro/pg_pathman/blob/dbcbd02e411e6acea6d97f572234746007979538/range.sql#L99 [3] https://github.com/postgrespro/pg_pathman#additional-parameters [4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Jul 06, 2020 at 01:45:52PM +0300, Anastasia Lubennikova wrote: > The previous discussion of automatic partition creation [1] has addressed > static and dynamic creation of partitions and ended up with several syntax > proposals. ... > where partition_auto_create_clause is > > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec > - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet > I wonder, is it worth placing a stub for dynamic partitioning, or we can > rather add these keywords later. I understand by "deferred" you mean that the partition isn't created at the time CREATE TABLE is run but rather deferred until needed by INSERT. For deferred, range partitioned tables, I think maybe what you'd want to specify (and store) is the INTERVAL. If the table is partitioned by day, then we'd date_trunc('day', time) and dynamically create that day. But if it was partitioned by month, we'd create the month. I think you'd want to have an ALTER command for that (we would use that to change tables between daily/monthly based on their current size). That should also support setting the MODULUS of a HASH partitioned table, to allow changing the size of its partitions (currently, the user would have to more or less recreate the table and move all its data into different partitions, but that's not ideal). I don't know if it's important for anyone, but it would be interesting to think about supporting sub-partitioning: partitions which are themselvese partitioned. Like something => something_YYYY => something_YYYY_MM => something_YYYY_MM_DD. You'd need to specify how to partition each layer of the heirarchy. In the most general case, it could be different partition strategy. If you have a callback function for partition renaming, I think you'd want to pass it not just the current name of the partition, but also the "VALUES" used in partition creation. Like (2020-04-05)TO(2020-05-06). Maybe instead, we'd allow setting a "format" to use to construct the partition name. Like "child.foo_bar_%Y_%m_%d". Ideally, the formats would be fixed-length (zero-padded, etc), so failures with length can happen at "parse" time of the statement and not at "run" time of the creation. You'd still have to handle the case that the name already exists but isn't a partition (or is a partition by doesn't handle the incoming tuple for some reason). Also, maybe your "configuration" syntax would allow specifying other values. Maybe including a retention period (as an INTERVAL for RANGE tables). That's useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE. -- Justin
On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > CREATE TABLE ... PARTITION BY partition_method (list_of_columns) > partition_auto_create_clause > > where partition_auto_create_clause is > > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec > > and partition_bound_spec is: > > MODULUS integer | VALUES IN (expr [,...]) [, ....] | INTERVAL > range_step FROM range_start TO range_end Might be good to compare this to what other databases support. > - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet > I wonder, is it worth placing a stub for dynamic partitioning, or we can > rather add these keywords later. I think we should not add any keywords we don't need immediately - and should seek to minimize the number of new keywords that we need to add, though compatibility with other implementations might be a good reason for accepting some new ones. > - HASH and LIST static partitioning works as expected. > Testing and feedback are welcome. > > - RANGE partitioning is not really implemented in this patch. > Now it only accepts interval data type as 'interval' and respectively > date types as range_start and range_end expressions. > Only one partition is created. I found it difficult to implement the > generation of bounds using internal functions and data types. > Both existing solutions (pg_pathman and pg_partman) rely on SQL level > routines [2]. > I am going to implement this via SPI, which allow to simplify checks and > calculations. Do you see any pitfalls in this approach? I don't really see why we need SPI here. Why can't we just try to evaluate the impression and see if we get a constant of the right type, then use that? I think the big problem here is identifying the operator to use. We have no way of identifying the "plus" or "minus" operator associated with a datatype; indeed, that constant doesn't exist. So either we (a) limit this to a short list of data types and hard-code the operators to be used (which is kind of sad given how extensible our type system is) or we (b) invent some new mechanism for identifying the +/- operators that should be used for a datatype, which was also proposed in the context of some previous discussion of window framing options, but which I don't think ever went anywhere (which is a lot of work) or we (c) just look for operators called '+' and/or '-' by operator name (which will probably make Tom throw up in his mouth a little). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova > <a.lubennikova@postgrespro.ru> wrote: >> I am going to implement this via SPI, which allow to simplify checks and >> calculations. Do you see any pitfalls in this approach? > I don't really see why we need SPI here. I would vote against any core facility that is implemented via SPI queries. It is just too darn hard to control the semantics completely in the face of fun stuff like varying search_path. Look at what a mess the queries generated by the RI triggers are --- and they only have a very small set of behaviors to worry about. I'm still only about 95% confident they don't have security issues, too. If you're using SPI to try to look up appropriate operators, I think the chances of being vulnerable to security problems are 100%. > I think the big problem here is identifying the operator to use. We > have no way of identifying the "plus" or "minus" operator associated > with a datatype; indeed, that constant doesn't exist. We did indeed solve this in connection with window functions, cf 0a459cec9. I may be misunderstanding what the problem is here, but I think trying to reuse that infrastructure might help. regards, tom lane
On Mon, Jul 6, 2020 at 12:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > We did indeed solve this in connection with window functions, cf > 0a459cec9. I may be misunderstanding what the problem is here, > but I think trying to reuse that infrastructure might help. Ah, nice. I didn't realize that we'd added that. But I'm not sure that it helps here, because I think we need to compute the end of the range, not just test whether something is in a range. Like, if someone wants monthly range partitions starting on 2020-01-01, we need to be able to figure out that the subsequent months start on 2020-02-01, 2020-03-01, 2020-04-01, etc. Is there a way to use in_range to achieve that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jul 6, 2020 at 12:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We did indeed solve this in connection with window functions, cf >> 0a459cec9. I may be misunderstanding what the problem is here, >> but I think trying to reuse that infrastructure might help. > Ah, nice. I didn't realize that we'd added that. But I'm not sure that > it helps here, because I think we need to compute the end of the > range, not just test whether something is in a range. Yeah, I was thinking about that later, and I agree that the in_range support function doesn't quite do the job. But we could expand on the principle, and register addition (and subtraction?) functions as btree support functions under the same rules as for in_range functions. The reason in_range isn't just addition is that we wanted it to be able to give correct answers even in cases where addition would overflow. That's still valid for that use-case, but it doesn't apply here. So it'd be something like "btree support function 4, registered under amproclefttype x and amprocrighttype y, must have the signature plus(x, y) returns x and it gives results compatible with the opfamily's ordering of type x". Similarly for subtraction if we think we need that. I'm not sure if we need a formal notion of what "compatible results" means, but it probably would be something like "if x < z according to the opfamily sort ordering, then plus(x, y) < plus(z, y) for any given y". Now this falls to the ground when y is a weird value like Inf or NaN, but we'd want to exclude those as partitioning values anyway. Do we also need some datatype-independent way of identifying such "weird values"? regards, tom lane
Hello Anastasia, My 0.02 €: > The patch implements following syntax: > > CREATE TABLE ... PARTITION BY partition_method (list_of_columns) > partition_auto_create_clause > > where partition_auto_create_clause is > > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec > > and partition_bound_spec is: > > MODULUS integer | VALUES IN (expr [,...]) [, ....] | INTERVAL range_step > FROM range_start TO range_end ISTM That we should avoid new specific syntaxes when possible, and prefer free keyword option style, like it is being discussed for some other commands, because it reduces the impact on the parser. That would suggest a more versatile partition_bound_spec which could look like (<keyword> <constant-or-maybe-even-expr>[, …]): For modulus, looks easy: (MODULUS 8) For interval, maybe something like: (STEP ..., FROM/START ..., TO/END ...) The key point is that for dynamic partitioning there would be no need for boundaries, so that it could just set a point and an interval (START/INIT/FROM??? ..., STEP ...) For lists of values, probably it would make little sense to have dynamic partitioning? Or maybe yes, if we could partition on a column value/expression?! eg "MOD(id, 8)"?? What about pg_dump? Should it be able to regenerate the initial create? > [4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements Good point, a wiki is better than a thread for that type of things. I'll look at this page. -- Fabien.
On Wed, Jul 8, 2020 at 10:24 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Anastasia, > > My 0.02 €: > > > The patch implements following syntax: > > > > CREATE TABLE ... PARTITION BY partition_method (list_of_columns) > > partition_auto_create_clause > > > > where partition_auto_create_clause is > > > > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec > > > > and partition_bound_spec is: > > > > MODULUS integer | VALUES IN (expr [,...]) [, ....] | INTERVAL range_step > > FROM range_start TO range_end > > ISTM That we should avoid new specific syntaxes when possible, and prefer > free keyword option style, like it is being discussed for some other > commands, because it reduces the impact on the parser. > > That would suggest a more versatile partition_bound_spec which could look > like (<keyword> <constant-or-maybe-even-expr>[, …]): > > For modulus, looks easy: > > (MODULUS 8) > > For interval, maybe something like: > > (STEP ..., FROM/START ..., TO/END ...) > > The key point is that for dynamic partitioning there would be no need for > boundaries, so that it could just set a point and an interval > > (START/INIT/FROM??? ..., STEP ...) > > For lists of values, probably it would make little sense to have dynamic > partitioning? Or maybe yes, if we could partition on a column > value/expression?! eg "MOD(id, 8)"?? > > What about pg_dump? Should it be able to regenerate the initial create? > I don't think this is needed for the proposed "Automatic partitioning (static)" which generates a bunch of CREATE TABLE statements, IIUC. Might be needed later for "Automatic partitioning (dynamic)" where dynamic specifications need to be stored. > > [4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements > > Good point, a wiki is better than a thread for that type of things. I'll > look at this page. +1 Regards, Amul
Good to know, thank you for that. I had doubts about the internal usage of SPI,Robert Haas <robertmhaas@gmail.com> writes:On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:I am going to implement this via SPI, which allow to simplify checks and calculations. Do you see any pitfalls in this approach?I don't really see why we need SPI here.I would vote against any core facility that is implemented via SPI queries. It is just too darn hard to control the semantics completely in the face of fun stuff like varying search_path. Look at what a mess the queries generated by the RI triggers are --- and they only have a very small set of behaviors to worry about. I'm still only about 95% confident they don't have security issues, too. If you're using SPI to try to look up appropriate operators, I think the chances of being vulnerable to security problems are 100%.
but didn't know what exactly can go wrong.
I think the big problem here is identifying the operator to use. We have no way of identifying the "plus" or "minus" operator associated with a datatype; indeed, that constant doesn't exist.We did indeed solve this in connection with window functions, cf 0a459cec9. I may be misunderstanding what the problem is here, but I think trying to reuse that infrastructure might help.
Do we need to introduce a new support function? Is there a reason why we can
not rely on '+' operator? I understand that the addition operator may lack or
be overloaded for some complex datatypes, but I haven't found any examples that
are useful for range partitioning. Both pg_pathman and pg_partman also use '+'
to generate bounds.
I explored the code a bit more and came up with this function, which is very
similar to generate_series_* functions, but it doesn't use SPI and looks for
the function that implements the '+' operator, instead of direct call:
// almost pseudocode
static Const *
generate_next_bound(Const *start, Const *interval)
{
ObjectWithArgs *sum_oper_object = makeNode(ObjectWithArgs);
sum_oper_object->type = OBJECT_OPERATOR;
/* hardcode '+' operator for addition */
sum_oper_object->objname = list_make1(makeString("+"));
ltype = makeTypeNameFromOid(start->consttype, start->consttypmod);
rtype = makeTypeNameFromOid(interval->consttype, interval->consttypmod);
sum_oper_object->objargs = list_make2(ltype, rtype);
sum_oper_oid = LookupOperWithArgs(sum_oper_object, false);
oprcode = get_opcode(sum_oper_oid);
fmgr_info(oprcode, &opproc);
next_bound->constvalue = FunctionCall2(&opproc,
start->constvalue,
interval->constvalue);
}
Thoughts?
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Anastasia Lubennikova <a.lubennikova@postgrespro.ru> writes: > On 06.07.2020 19:10, Tom Lane wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I think the big problem here is identifying the operator to use. We >>> have no way of identifying the "plus" or "minus" operator associated >>> with a datatype; indeed, that constant doesn't exist. >> We did indeed solve this in connection with window functions, cf >> 0a459cec9. I may be misunderstanding what the problem is here, >> but I think trying to reuse that infrastructure might help. > Do we need to introduce a new support function? Is there a reason why we > can not rely on '+' operator? (1) the appropriate operator might not be named '+' (2) even if it is, it might not be in your search_path (3) you're vulnerable to security problems from someone capturing the '+' operator with a better match; since you aren't writing the operator explicitly, you can't fix that by qualifying it (4) if the interval constant is written as an undecorated string literal, the parser may have trouble resolving a match at all > I understand that the addition operator may lack or be overloaded for > some complex datatypes, but I haven't found any examples that are useful > for range partitioning. "It works for all the built-in data types" isn't really a satisfactory answer. But even just in the built-in types, consider "date": # select oid::regoperator from pg_operator where oprname ='+' and oprleft = 'date'::regtype; oid -------------------------------- +(date,interval) +(date,integer) +(date,time without time zone) +(date,time with time zone) (4 rows) It's not that immediately obvious which of these would make sense to use. But the short answer here is that we did not accept relying on '+' being the right thing for window function ranges, and I don't see why it is more acceptable for partitioning ranges. The existing places where our parser relies on implicit operator names are, without exception, problematic [1]. regards, tom lane [1] https://www.postgresql.org/message-id/flat/ffefc172-a487-aa87-a0e7-472bf29735c8%40gmail.com
The previous discussion of automatic partition creation [1] has addressed static and dynamic creation of partitions and ended up with several syntax proposals.
In this thread, I want to continue this work.
...
[1] https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre
Syntax proposal v2, that takes into account received feedback.
I compared the syntax of other databases. You can find an overview here [1]. It
seems that there is no industry standard, so every DBMS has its own
implementation. I decided to rely on a Greenplum syntax, as the most similar to
the original PostgreSQL syntax.
New proposal is:
CREATE TABLE numbers(int number) PARTITION BY partition_method (list_of_columns) USING (partition_desc) where partition_desc is: MODULUS n | VALUES IN (value_list), [DEFAULT PARTITION part_name] | START ([datatype] 'start_value') END ([datatype] 'end_value') EVERY (partition_step), [DEFAULT PARTITION part_name] where partition_step is: [datatype] [number | INTERVAL] 'interval_value' example: CREATE TABLE years(int year) PARTITION BY RANGE (year) USING (START (2006) END (2016) EVERY (1), DEFAULT PARTITION other_years);
It is less wordy than the previous version. It uses a free keyword option
style. It covers static partitioning for all methods, default partition for
list and range methods, and can be extended to implement dynamic partitioning
for range partitions.
[1] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Other_DBMS
[2] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Proposal_.28is_subject_to_change.29
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
New syntax fits to the ALTER command as well.I think you'd want to have an ALTER command for that (we would use that to change tables between daily/monthly based on their current size). That should also support setting the MODULUS of a HASH partitioned table, to allow changing the size of its partitions (currently, the user would have to more or less recreate the table and move all its data into different partitions, but that's not ideal).
ALTER TABLE tbl PARTITION BY HASH (number) USING (partition_desc)In simple cases (i.e. range partitioning granularity), it will simply update
the rule of bound generation, saved in the catalog. More complex hash
partitions will require some rebalancing. Though, the syntax is pretty
straightforward for all cases. In the next versions, we can also add a
CONCURRENTLY keyword to cover partitioning of an existing non-partitioned table
with data.
I don't know if it's important for anyone, but it would be interesting to think about supporting sub-partitioning: partitions which are themselvese partitioned. Like something => something_YYYY => something_YYYY_MM => something_YYYY_MM_DD. You'd need to specify how to partition each layer of the heirarchy. In the most general case, it could be different partition strategy.
I suppose it will be a natural extension of this work. Now we need to ensure
that the proposed syntax is extensible. Greenplum syntax, which I choose as an
example, provides subpartition syntax as well.
If you have a callback function for partition renaming, I think you'd want to pass it not just the current name of the partition, but also the "VALUES" used in partition creation. Like (2020-04-05)TO(2020-05-06). Maybe instead, we'd allow setting a "format" to use to construct the partition name. Like "child.foo_bar_%Y_%m_%d". Ideally, the formats would be fixed-length (zero-padded, etc), so failures with length can happen at "parse" time of the statement and not at "run" time of the creation. You'd still have to handle the case that the name already exists but isn't a partition (or is a partition by doesn't handle the incoming tuple for some reason).
In callback design, I want to use the best from pg_pathman's set_init_callback().
The function accepts jsonb argument, which contains all the data about the
parent table, bounds, and so on. This information can be used to construct name
for the partition and generate RENAME statement.
In this version, I got rid of the 'configuration' keyword. Speaking ofAlso, maybe your "configuration" syntax would allow specifying other values. Maybe including a retention period (as an INTERVAL for RANGE tables). That's useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE.
retention, I think that it would be hard to cover all use-cases with a
declarative syntax. While it is relatively easy to implement deletion within a
callback function. See rotation_callback example in pg_pathman [1].
[1] https://github.com/postgrespro/pg_pathman/blob/79e11d94a147095f6e131e980033018c449f8e2e/sql/pathman_callbacks.sql#L107
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 06.07.2020 13:45, Anastasia Lubennikova wrote:The previous discussion of automatic partition creation [1] has addressed static and dynamic creation of partitions and ended up with several syntax proposals.
In this thread, I want to continue this work.
...
[1] https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancreSyntax proposal v2, that takes into account received feedback.
CREATE TABLE numbers(int number) PARTITION BY partition_method (list_of_columns) USING (partition_desc) where partition_desc is: MODULUS n | VALUES IN (value_list), [DEFAULT PARTITION part_name] | START ([datatype] 'start_value') END ([datatype] 'end_value') EVERY (partition_step), [DEFAULT PARTITION part_name] where partition_step is: [datatype] [number | INTERVAL] 'interval_value'It is less wordy than the previous version. It uses a free keyword option
style. It covers static partitioning for all methods, default partition for
list and range methods, and can be extended to implement dynamic partitioning
for range partitions.[1] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Other_DBMS
[2] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Proposal_.28is_subject_to_change.29
Here is the patch for automated HASH and LIST partitioning, that implements proposed syntax.
Range partitioning is more complicated. It will require new support function to calculate bounds, new catalog attribute to store them and so on. So I want to start small and implement automated range partitioning in a separate patch later.
1) Syntax
New syntax is heavily based on Greenplum syntax for automated partitioning with one change. Keyword "USING", that was suggested above, causes shift/reduce conflict with "USING method" syntax of a table access method. It seems that Greenplum folks will face this problem later.
I stick to CONFIGURATION as an existing keyword that makes sense in this context.
Any better ideas are welcome.
Thus, current version is:
CREATE TABLE table_name (attrs)
PARTITION BY partition_method (list_of_columns)
CONFIGURATION (partition_desc)
where partition_desc is:
MODULUS n
| VALUES IN (value_list) [DEFAULT PARTITION part_name]
This syntax can be easily extended for range partitioning as well.
2) Implementation
PartitionBoundAutoSpec is a new part of PartitionSpec, that contains information needed to generate partition bounds.
For HASH and LIST automatic partition creation, transformation happens during parse analysis of CREATE TABLE statement.
transformPartitionAutoCreate() calculates bounds and generates statements to create partition tables.
Partitions are named in a format: $tablename_$partnum. One can use post create hook to rename relations.
For LIST partition one can also define a default partition.
3) TODO
The patch lacks documentation, because I expect some details may change during discussion. Other than that, the feature is ready for review.
Regards
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
The patch lacks documentation, because I expect some details may change during discussion. Other than that, the feature is ready for review.
CREATE TABLE tbl (i int) PARTITION BY HASH (i) AUTOMATICALLY (MODULUS 3); (partitions are created automatically)
vs
CREATE TABLE tbl (i int) PARTITION BY HASH (i); (partitions should be created manually by use of PARTITION OF)
CREATE TABLE tbl (i char) PARTITION BY LIST (i) AUTOMATICALLY (VALUES IN ('a', 'b'), ('c', 'd'), ('e','f') DEFAULT PARTITION tbl_default);
vs
CREATE TABLE tbl (i char) PARTITION BY LIST (i); (partitions should be created manually by use of PARTITION OF)
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
The patch lacks documentation, because I expect some details may change during discussion. Other than that, the feature is ready for review.
Hi, hackers!From what I've read I see there is much interest in automatic partitions creation. (Overall discussion on the topic is partitioned into two threads: (1) https://www.postgresql.org/message-id/alpine.DEB.2.21.1907150711080.22273%40lancre and (2) https://www.postgresql.org/message-id/flat/7fec3abb-c663-c0d2-8452-a46141be6d4a@postgrespro.ru (current thread) )There were many syntax proposals and finally, there is a patch realizing one of them. So I'd like to review it.The syntax proposed in the patch seems good enough for me and is in accordance with one of the proposals in the discussion. Maybe I'd prefer using the word AUTOMATICALLY/AUTO instead of CONFIGURATION with explicit meaning that using this syntax we'd get already (automatically) created partitions and don't need to create them manually, as in the existing state of postgresql declarative partitioning.CREATE TABLE tbl (i int) PARTITION BY HASH (i) AUTOMATICALLY (MODULUS 3); (partitions are created automatically)
vsCREATE TABLE tbl (i int) PARTITION BY HASH (i); (partitions should be created manually by use of PARTITION OF)
CREATE TABLE tbl (i char) PARTITION BY LIST (i) AUTOMATICALLY (VALUES IN ('a', 'b'), ('c', 'd'), ('e','f') DEFAULT PARTITION tbl_default);
vsCREATE TABLE tbl (i char) PARTITION BY LIST (i); (partitions should be created manually by use of PARTITION OF)
I think this syntax can also be extended later with adding automatic creation of RANGE partitions, with IMMEDIATE/DEFERRED for dynamic/on-demand automatic partition creation, and with SUBPARTITION possibility.But I don't have a strong preference for the word AUTOMATICALLY, moreover I saw opposition to using AUTO at the top of the discussion. I suppose we can go with the existing CONFIGURATION word.
I agree that 'AUTOMATICALLY' keyword is more specific and probably less confusing for users. I've picked 'CONFIGURATION' simply because it is an already existing keyword. It would like to hear other opinions on that.
If compare with existing declarative partitions, I think automatic creation simplifies the process for the end-user and I'd vote for its committing into Postgres. The patch is short and clean in code style. It has enough comments Tests covering the new functionality are included. Yet it doesn't have documentation and I'd suppose it's worth adding it. Even if there will be syntax changes, I hope they will not be more than the replacement of several words. Current syntax is described in the text of a patch.
Fair enough. New patch contains a documentation draft. While writing it, I also noticed, that the syntax, introduced in the patch differs from greenpulm one. For now, list partitioning clause doesn't support 'PARTITION name' part, that is supported in greenplum. I don't think that we aim for 100% compatibility here. Still, the ability to provide table names is probably a good optional feature, especially for list partitions.
What do you think?
The patch applies cleanly and installcheck-world is passed.Some minor things:I've got a compiler warning:parse_utilcmd.c:4280:15: warning: unused variable 'lc' [-Wunused-variable]
Fixed. This was also caught by cfbot. This version should pass it clean.
When the number of partitions is over the maximum value of int32 the output shows a generic syntax error. I don't think it is very important as it is not the case someone will make deliberately, but maybe it's better to output something like "Partitions number is more than the maximum supported value"create table test (i int, t text) partition by hash (i) configuration (modulus 888888888888);ERROR: syntax error at or near "888888888888"
This value is not a valid int32 number, thus parser throws the error before we have a chance to handle it more gracefully.
Well, it looks like a legit error, when we try to lock a lot of objects in one transaction. I will double check if we don't release a lock somewhere.One more piece of nitpicking. Probably we can go just with a mention in documentation.create table test (i int, t text) partition by hash (i) configuration (modulus 8888);
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Do we need to restrict the number of partitions, that can be created by this statement? With what number? As far as I see, there is no such restriction for now, just a recommendation about performance issues. With automatic creation it becomes easier to mess with it.
Probably, it's enough to mention it in documentation and rely on users common sense.
Fixed.Typo:+ /* Add statemets to create each partition after we create parent table */
I also hope that this patch will make it to v14, but for now, I don't see a consensus on the syntax and some details, so I wouldn't rush.Overall I see the patch almost ready for commit and I'd like to meet this functionality in v14.
Besides, it definitely needs more testing. I haven't thoroughly tested following cases yet:
- how triggers and constraints are propagated to partitions;
- how does it handle some tricky clauses in list partitioning expr_list;
and so on.
Also, there is an open question about partition naming. Currently, the patch implements dummy %tbl_%partnum name generation, which is far from user-friendly. I think we must provide some hook or trigger function to rename partitions after they were created.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Sep 14, 2020 at 02:38:56PM +0300, Anastasia Lubennikova wrote: > Fixed. This was also caught by cfbot. This version should pass it clean. Please note that regression tests are failing, because of 6b2c4e59. -- Michael
Attachment
On 24.09.2020 06:27, Michael Paquier wrote: > On Mon, Sep 14, 2020 at 02:38:56PM +0300, Anastasia Lubennikova wrote: >> Fixed. This was also caught by cfbot. This version should pass it clean. > Please note that regression tests are failing, because of 6b2c4e59. > -- > Michael Thank you. Updated patch is attached. Open issues for review: - new syntax; - generation of partition names; - overall patch review and testing, especially with complex partitioning clauses. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
CREATE TABLE tbl_lst (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);
<para>
Range and list partitioning also support automatic creation of partitions
with an optional <literal>CONFIGURATION</literal> clause.
</para>
+VALUES IN ( <replaceable class="parameter">partition_bound_expr</replaceable> [, ...] ), [( <replaceable class="parameter">partition_bound_expr</replaceable> [, ...] )] [, ...] [DEFAULT PARTITION <replaceable class="parameter">defailt_part_name</replaceable>]
+MODULUS <replaceable class="parameter">numeric_literal</replaceable>
Thank you for your review.Hi Anastasia,I tested the syntax with some basic commands and it works fine, regression tests also pass.
After some consideration, I decided that we don't actually need to introduce IMMEDIATE | DEFERRED keyword. For hash and list partitions it will always be immediate, as the number of partitions cannot change after we initially set it. For range partitions, on the contrary, it doesn't make much sense to make partitions immediately, because in many use-cases one bound will be open.Couple of comments:1. The syntax used omits the { IMMEDIATE | DEFERRED} keywords suggested inthe earlier discussions. I think it is intuitive to include IMMEDIATE with the current implementationso that the syntax can be extended with a DEFERRED clause in future for dynamic partitions.CREATE TABLE tbl_lst (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);
2. One suggestion for generation of partition names is to append a unique id toavoid conflicts.
Can you please give an example of such a conflict? I agree that current naming scheme is far from perfect, but I think that 'tablename'_partnum provides unique name for each partition.
Yes, you're right. I will fix these typos in next version of the patch.3. Probably, here you mean to write list and hash instead of range and list asper the current state.<para>
Range and list partitioning also support automatic creation of partitions
with an optional <literal>CONFIGURATION</literal> clause.
</para>4. Typo in default_part_name+VALUES IN ( <replaceable class="parameter">partition_bound_expr</replaceable> [, ...] ), [( <replaceable class="parameter">partition_bound_expr</replaceable> [, ...] )] [, ...] [DEFAULT PARTITION <replaceable class="parameter">defailt_part_name</replaceable>]
+MODULUS <replaceable class="parameter">numeric_literal</replaceable>
Thank you,Rahila Syed
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
After some consideration, I decided that we don't actually need to introduce IMMEDIATE | DEFERRED keyword. For hash and list partitions it will always be immediate, as the number of partitions cannot change after we initially set it. For range partitions, on the contrary, it doesn't make much sense to make partitions immediately, because in many use-cases one bound will be open.Couple of comments:1. The syntax used omits the { IMMEDIATE | DEFERRED} keywords suggested inthe earlier discussions. I think it is intuitive to include IMMEDIATE with the current implementationso that the syntax can be extended with a DEFERRED clause in future for dynamic partitions.CREATE TABLE tbl_lst (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);
2. One suggestion for generation of partition names is to append a unique id to
avoid conflicts.Can you please give an example of such a conflict? I agree that current naming scheme is far from perfect, but I think that 'tablename'_partnum provides unique name for each partition.
Sorry for not being clear earlier, I mean the partition name 'tablename_partnum' can conflict with any existing table name.As per current impemetation, if I do the following it results in the table name conflict.postgres=# create table tbl_test_5_1(i int);CREATE TABLEpostgres=# CREATE TABLE tbl_test_5 (i int) PARTITION BY LIST((tbl_test_5)) CONFIGURATION (values in ('(1)'::tbl_test_5), ('(3)'::tbl_test_5) default partition tbl_default_5);ERROR: relation "tbl_test_5_1" already exists
select min(f1), max(f1) from minmaxtest;
QUERY PLAN
---------------------------------------------------------------------------------------------
Result
InitPlan 1 (returns $0)
-> Limit
-> Merge Append
Sort Key: minmaxtest.f1
-> Index Only Scan using minmaxtesti on minmaxtest minmaxtest_1
Index Cond: (f1 IS NOT NULL)
-> Index Only Scan using minmaxtest1i on minmaxtest1 minmaxtest_2
Index Cond: (f1 IS NOT NULL)
-> Index Only Scan Backward using minmaxtest2i on minmaxtest2 minmaxtest_3
Index Cond: (f1 IS NOT NULL)
-> Index Only Scan using minmaxtest3i on minmaxtest3 minmaxtest_4
InitPlan 2 (returns $1)
-> Limit
-> Merge Append
Sort Key: minmaxtest_5.f1 DESC
-> Index Only Scan Backward using minmaxtesti on minmaxtest minmaxtest_6
Index Cond: (f1 IS NOT NULL)
-> Index Only Scan Backward using minmaxtest1i on minmaxtest1 minmaxtest_7
Index Cond: (f1 IS NOT NULL)
-> Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
Index Cond: (f1 IS NOT NULL)
Attachment
On 06.10.2020 00:21, Pavel Borisov wrote: > Hi, hackers! > I added some extra tests for different cases of use of automatic > partition creation. > v3-0002 can be applied on top of the original v2 patch for correct > work with some corner cases with constraints included in this test. > Thank you for the tests. I've added them and the fix into the patch. I also noticed, that some table parameters, such as persistence were not promoted to auto generated partitions. This is fixed now. The test cases for temp and unlogged auto partitioned tables are updated respectively. Besides, I slightly refactored the code and fixed documentation typos, that were reported by Rahila. With my recent changes, one test statement, that you've added as failing, works. CREATE TABLE list_parted_fail (a int) PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('1' collate "POSIX")); It simply ignores collate POSIX part and creates a table with following structure: Partitioned table "public.list_parted_fail" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+---------+--------------+------------- a | integer | | | | plain | | Partition key: LIST (a) Partitions: list_parted_fail_0 FOR VALUES IN (1) Do you think that it is a bug? For now, I removed this statement from tests just to calm down the CI. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Do you think that it is a bug? For now, I removed this statement from
tests just to calm down the CI.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Sep 28 13:44:01 2020 -0400
Hi,After some consideration, I decided that we don't actually need to introduce IMMEDIATE | DEFERRED keyword. For hash and list partitions it will always be immediate, as the number of partitions cannot change after we initially set it. For range partitions, on the contrary, it doesn't make much sense to make partitions immediately, because in many use-cases one bound will be open.Couple of comments:1. The syntax used omits the { IMMEDIATE | DEFERRED} keywords suggested inthe earlier discussions. I think it is intuitive to include IMMEDIATE with the current implementationso that the syntax can be extended with a DEFERRED clause in future for dynamic partitions.CREATE TABLE tbl_lst (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);As per discussions on this thread: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancreDEFERRED clause refers to creating partitions on the fly, while the data is being inserted.The number of partitions and partition bounds can be the same as specified initiallyduring partitioned table creation, but the actual creation of partitions can be deferred.This seems like a potential extension to statically created partitions even in the case ofhash and list partitions, as it won't involve moving any existing data.
Oh, now I see what you mean. The case with already existing tables will require changes to ALTER TABLE syntax. And that's where we may want to choose between immediate (i.e. locking) and deferred (i.e. concurrent) creation of partitions. I think we should try to implement it with existing keywords, maybe use 'CONCURRENTLY' keyword and it will look like:
ALTER TABLE tbl PARTITION BY ... CONFIGURATION (....) [CONCURRENTLY];
Anyway, the task of handling existing data is much more complicated, especially the 'concurrent' case and to be honest, I haven't put much thought into it yet.
The current patch only implements the simplest case of creating a new partitioned table. And I don't see if CREATE TABLE needs this immediate|deferred clause or if it will need it in the future.
Thoughts?
2. One suggestion for generation of partition names is to append a unique id toavoid conflicts.Can you please give an example of such a conflict? I agree that current naming scheme is far from perfect, but I think that 'tablename'_partnum provides unique name for each partition.
Sorry for not being clear earlier, I mean the partition name 'tablename_partnum' can conflict with any existing table name.As per current impemetation, if I do the following it results in the table name conflict.postgres=# create table tbl_test_5_1(i int);CREATE TABLEpostgres=# CREATE TABLE tbl_test_5 (i int) PARTITION BY LIST((tbl_test_5)) CONFIGURATION (values in ('(1)'::tbl_test_5), ('(3)'::tbl_test_5) default partition tbl_default_5);ERROR: relation "tbl_test_5_1" already exists
I don't mind adding some specific suffix for generated partitions, although it still may conflict with existing table names. The main disadvantage of this idea, is that it reduces number of symbols available for table name, which can lead to something like this:
CREATE TABLE parteddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd (a text, b int NOT NULL DEFAULT 0, CONSTRAINT check_aa CHECK (length(a) > 0))
PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('a','b'),('c','d') DEFAULT PARTITION parted_def) ;;
NOTICE: identifier "parteddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" will be truncated to "partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
ERROR: relation "partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" already exists
The error message here is a bit confusing, as relation 'partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd' haven't existed before and this is a conflict between partitioned and generated partition table name. For now, I don't know if we can handle it more gracefully. Probably, we could truncate tablename to a shorter size, but it doesn't provide a complete solution, because partition number can contain several digits.
See also pg_partman documentation on the same issue: https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman.md#naming-length-limits
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
2. One suggestion for generation of partition names is to append a unique id toavoid conflicts.Can you please give an example of such a conflict? I agree that current naming scheme is far from perfect, but I think that 'tablename'_partnum provides unique name for each partition.
Sorry for not being clear earlier, I mean the partition name 'tablename_partnum' can conflict with any existing table name.As per current impemetation, if I do the following it results in the table name conflict.postgres=# create table tbl_test_5_1(i int);CREATE TABLEpostgres=# CREATE TABLE tbl_test_5 (i int) PARTITION BY LIST((tbl_test_5)) CONFIGURATION (values in ('(1)'::tbl_test_5), ('(3)'::tbl_test_5) default partition tbl_default_5);ERROR: relation "tbl_test_5_1" already exists
I don't mind adding some specific suffix for generated partitions, although it still may conflict with existing table names. The main disadvantage of this idea, is that it reduces number of symbols available for table name, which can lead to something like this:
CREATE TABLE parteddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd (a text, b int NOT NULL DEFAULT 0, CONSTRAINT check_aa CHECK (length(a) > 0))
PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('a','b'),('c','d') DEFAULT PARTITION parted_def) ;;
NOTICE: identifier "parteddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" will be truncated to "partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
ERROR: relation "partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" already exists
doc/pg_partman.md#naming-length-limits
ERROR: unrecognized auto partition bound specification "a"
On 2020-12-18 21:54, Pavel Borisov wrote: > I've realized one strange effect in current grammar parsing: if I do > > CREATE TABLE foo (a int) PARTITION BY LIST (a) CONFIGURATION (a 1); > ERROR: unrecognized auto partition bound specification "a" > > I consulted the patch code and realized that in fact, the patch > considers it the (invalid) HASH bounds (doesn't find a word 'modulus') > unless it is specified to be (still invalid) LIST. This is due to the > fact that the grammar parser is not context-aware and in the patch, we > tried to avoid the new parser keyword MODULUS. The effect is that > inside a CONFIGURATION parentheses in case of HASH bounds we don't > have a single keyword for the parser to determine it is really a HASH > case. > > It doesn't make the patch work wrongly, besides it checks the validity > of all types of bounds in the HASH case even when the partitioning is > not HASH. I find this slightly bogus. This is because the parser can > not determine the type of partitioning inside the configuration clause > and this makes adding new syntax (e.g. adding RANGE partitioning > configuration inside CONFIGURATION parentheses) complicated. > > So I have one more syntax proposal: to have separate keywords inside > CONFIGURATION parentheses for each partitioning type. > E.g: > CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES > IN (1,2),(3,4) DEFAULT PARTITION foo_def); > CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES > WITH MODULUS 3); > CREATE TABLE foo(a int) PARTITION BY RAGE(a) CONFIGURATION (FOR VALUES > FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def); > > This proposal is in accordance with the current syntax of declarative > partitioning: CREATE TABLE foo_1 PARTITION OF foo FOR VALUES ... > > Some more facultative proposals incremental to the abovementioned: > 1. Omit CONFIGURATION with/without parentheses. This makes syntax > closer to (non-automatic) declarative partitioning syntax but the > clause seems less legible (in my opinion). > 2. Omit just FOR VALUES. This makes the clause short, but adds a > difference to (non-automatic) declarative partitioning syntax. > > I'm planning also to add RANGE partitioning syntax to this in the > future and I will be happy if all three types of the syntax could come > along easily. > > I very much appreciate your views on this especially regarding that > changes can be still made easily because the patch is not committed > yet. > > -- > > Best regards, > Pavel Borisov > > Postgres Professional: http://postgrespro.com [1] > > > Links: > ------ > [1] http://www.postgrespro.com In my view, next expressions are the golden ground here. On one hand, not far from the original non-automatic declarative partitioning syntax syntax, on the other hand, omit CONFIGURATION key-word (which is redundant here in terms of gram parsing) makes this expressions less apprehensible for the human. CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN (1,2),(3,4) DEFAULT PARTITION foo_def); CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES WITH MODULUS 3); CREATE TABLE foo(a int) PARTITION BY RAGE(a) CONFIGURATION (FOR VALUES FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def); In addition to that, adding RANGE PARTITION would be much simpler since we would have specific "branches" in gram instead of using context-sensitive grammar and dealing with it in c-code. --- Best regards, Maxim Orlov.
> CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN > (1,2),(3,4) DEFAULT PARTITION foo_def); I would like to disagree with this syntactic approach because it would very specific to each partition method. IMHO the syntax should be as generic as possible. I'd suggest (probably again) a keyword/value list which would allow to be quite adaptable without inducing any pressure on the parser. -- Fabien.
> CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN
> (1,2),(3,4) DEFAULT PARTITION foo_def);
I would like to disagree with this syntactic approach because it would
very specific to each partition method. IMHO the syntax should be as
generic as possible. I'd suggest (probably again) a keyword/value list
which would allow to be quite adaptable without inducing any pressure on
the parser.
HEllo. >>> CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES >> IN >>> (1,2),(3,4) DEFAULT PARTITION foo_def); >> >> I would like to disagree with this syntactic approach because it would >> very specific to each partition method. IMHO the syntax should be as >> generic as possible. I'd suggest (probably again) a keyword/value list >> which would allow to be quite adaptable without inducing any pressure on >> the parser. >> > If I remember your proposal correctly it is something like > CREATE TABLE foo(...) PARTITION BY HASH AUTOMATIC (MODULUS 10); Yep, that would be the spirit. > It is still possible but there are some caveats: 1. We'll need to add > keyword MODULUS (and probably AUTOMATIC) to the parser's list. Why? We could accept anything in the list? i.e.: (ident =? value[, ident =? value]*) > I don't against this but as far as I've heard there is some > opposition among PG community against new keywords. Maybe I am wrong. the ident is a keyword that can be interpreted later on, not a "reserved keyword" from a parser perspective, which is the only real issue? The parser does not need to know about it, only the command interpreter which will have to interpret it. AUTOMATIC is a nice parser cue to introduce such a ident-value list. > 2. The existing syntax for declarative partitioning is different to your > proposal. Yep. I think that it was not so good a design choice from a language/extensibility perspective. > It is still not a big problem and your proposal makes query > shorter for several words. I'd just like to see some consensus on the > syntax. Now I must admit there are too many contradictions in opinions > which make progress slow. Also I think it is important to have a really > convenient syntaх. > 2a Maybe we all who participated in the thread can vote for some variant? > 2b Maybe the existing syntax for declarative partitioniong should be given > some priority as it is already committed into CREATE TABLE ... PARTITION OF > ... FOR VALUES IN.. etc. > I'd be happy if everyone will join some version of the proposed syntaх in > this thread and in the previous discussion [1]. If we have a variant with > more than one supporter, sure we can develop patch based on it. > Thank you very much > and Merry Christmas! > > [1] > https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre > > -- Fabien.
Why? We could accept anything in the list? i.e.:
(ident =? value[, ident =? value]*)
> I don't against this but as far as I've heard there is some
> opposition among PG community against new keywords. Maybe I am wrong.
the ident is a keyword that can be interpreted later on, not a "reserved
keyword" from a parser perspective, which is the only real issue?
The parser does not need to know about it, only the command interpreter
which will have to interpret it. AUTOMATIC is a nice parser cue to
introduce such a ident-value list.
> 2. The existing syntax for declarative partitioning is different to your
> proposal.
Yep. I think that it was not so good a design choice from a
language/extensibility perspective.
> BTW could you tell me a couple of words about pros and cons of c-code > syntax parsing comparing to parsing using gram.y trees? I'd rather use an automatic tool (lexer/parser) if possible instead of doing it by hand if I can. If you want a really nice syntax with clever tricks, then you may need to switch to manual though, but pg/sql is not in that class. > I think both are possible but my predisposition was that we'd better use > the later if possible. I agree. -- Fabien.
> BTW could you tell me a couple of words about pros and cons of c-code
> syntax parsing comparing to parsing using gram.y trees?
I'd rather use an automatic tool (lexer/parser) if possible instead of
doing it by hand if I can. If you want a really nice syntax with clever
tricks, then you may need to switch to manual though, but pg/sql is not in
that class.
> I think both are possible but my predisposition was that we'd better use
> the later if possible.
I agree.
> Fabien, do you consider it possible to change the syntax of declarative > partitioning too? My 0.02 €: What I think does not matter much, what committers think is the way to pass something. However, I do not think that such an idea would pass a committer:-) > It is problematic as it is already committed but also is very tempting > to have the same type of syntax both in automatic partitioning and in > manual (PARTITION OF...) I think that if a "common" syntax, for a given meaning of common, can be thought of, and without breaking backward compatibility, then there may be an argument to provide such a syntax, but I would not put too much energy into that if I were you. I see 3 cases: - partition declaration but no actual table generated, the current version. - partition declaration with actual sub-tables generated, eg for hash where it is pretty straightforward to know what would be needed, or for a bounded range. - partition declaration without generated table, but they are generated on demand, when needed, for a range one may want weekly or monthly without creating tables in advance, esp. if it is unbounded. ISTM that the syntax should be clear and if possible homogeneous for all three use cases, even if they are not implemented yet. It should also allow easy extensibility, hence something without a strong syntax, key/value pairs to be interpreted later. -- Fabien.
My 0.02 €: What I think does not matter much, what committers think is the
way to pass something. However, I do not think that such an idea would
pass a committer:-)
IN (1,2),(3,4) DEFAULT PARTITION foo_def);
CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES
WITH MODULUS 3);
CREATE TABLE foo(a int) PARTITION BY RAGE(a) CONFIGURATION (FOR VALUES
FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def)
On Wed, Oct 7, 2020 at 6:26 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > Do you think that it is a bug? For now, I removed this statement from > tests just to calm down the CI. I don't think we can use \d+ on a temporary table here, because the backend ID appears in the namespace, which is causing a failure on one of the CI OSes due to nondeterminism: CREATE TEMP TABLE temp_parted (a char) PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('a') DEFAULT PARTITION temp_parted_default); \d+ temp_parted - Partitioned table "pg_temp_3.temp_parted" + Partitioned table "pg_temp_4.temp_parted"
I don't think we can use \d+ on a temporary table here, because the
backend ID appears in the namespace, which is causing a failure on one
of the CI OSes due to nondeterminism:
CREATE TEMP TABLE temp_parted (a char) PARTITION BY LIST (a)
CONFIGURATION (VALUES IN ('a') DEFAULT PARTITION temp_parted_default);
\d+ temp_parted
- Partitioned table "pg_temp_3.temp_parted"
+ Partitioned table "pg_temp_4.temp_parted"
Attachment
https://commitfest.postgresql.org/32/2694/
I don't know what committers will say, but I think that "ALTER TABLE" might be
the essential thing for this patch to support, not "CREATE". (This is similar
to ALTER..SET STATISTICS, which is not allowed in CREATE.)
The reason is that ALTER is what's important for RANGE partitions, which need
to be created dynamically (for example, to support time-series data
continuously inserting data around 'now'). I assume it's sometimes also
important for LIST. I think this patch should handle those cases better before
being commited, or else we risk implementing grammar and other user-facing interface
that fails to handle what's needed into the future (or that's non-essential).
Even if dynamic creation isn't implemented yet, it seems important to at least
implement the foundation for setting the configuration to *allow* that in the
future, in a manner that's consistent with the initial implementation for
"static" partitions.
ALTER also supports other ideas I mentioned here:
https://www.postgresql.org/message-id/20200706145947.GX4107%40telsasoft.com
- ALTER .. SET interval (for dynamic/deferred RANGE partitioning)
- ALTER .. SET modulus, for HASH partitioning, in the initial implementation,
this would allow CREATING paritions, but wouldn't attempt to handle moving
data if overlapping table already exists:
- Could also set the table-name, maybe by format string;
- Could set "retention interval" for range partitioning;
- Could set if the partitions are themselves partitioned(??)
I think once you allow setting configuration parameters like this, then you
might have an ALTER command to "effect" them, which would create any static
tables required by the configuration. maybe that'd be automatic, but if
there's an "ALTER .. APPLY PARTITIONS" command (or whatever), maybe in the
future, the command could also be used to "repartition" existing table data
into partitions with more fine/course granularity (modulus, or daily vs monthly
range, etc).
--
Justin
I have reviewed the v4 patch. The patch does not get applied on the latest source. Kindly rebase.However I have found few comments.1.> +-- must fail because of wrong configuration> +CREATE TABLE tbl_hash_fail (i int) PARTITION BY HASH (i)> +CONFIGURATION (values in (1, 2), (3, 4) default partition tbl_default);Here some of the keywords are mentioned in UPPER CASE (Ex: CREATE TABLE, CONFIGURATION, etc) and some are mentioned in lower case (Ex: values in, default partition, etc). Kindly make it common. I feel making it to UPPER CASE is better. Please take care of this in all the cases.2. It is better to separate the failure cases and success cases in /src/test/regress/sql/create_table.sql for better readability. All the failure cases can be listed first and then the success cases.3.> + char *part_relname;> +> + /*> + * Generate partition name in the format:> + * $relname_$partnum> + * All checks of name validity will be made afterwards in DefineRelation()> + */> + part_relname = psprintf("%s_%d", cxt->relation->relname, i);The assignment can be done directly while declaring the variable.
Attachment
On Fri, Jul 9, 2021 at 6:30 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've rebased the patch and made the changes mentioned.
> PFA v5.
I've set this back to "needs review" in CF.
--
John Naylor
EDB: http://www.enterprisedb.com
> Thank you for your review!
> I've rebased the patch and made the changes mentioned.
> PFA v5.
I've set this back to "needs review" in CF.
On Tue, Mar 2, 2021 at 3:26 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I don't know what committers will say, but I think that "ALTER TABLE" might be > the essential thing for this patch to support, not "CREATE". (This is similar > to ALTER..SET STATISTICS, which is not allowed in CREATE.) > > The reason is that ALTER is what's important for RANGE partitions, which need > to be created dynamically (for example, to support time-series data > continuously inserting data around 'now'). I assume it's sometimes also > important for LIST. I think this patch should handle those cases better before > being commited, or else we risk implementing grammar and other user-facing interface > that fails to handle what's needed into the future (or that's non-essential). > Even if dynamic creation isn't implemented yet, it seems important to at least > implement the foundation for setting the configuration to *allow* that in the > future, in a manner that's consistent with the initial implementation for > "static" partitions. I don't think it's a hard requirement, but it's an interesting point. My initial reactions to the patch are: - I don't think it's a very good idea to support LIST and HASH but not RANGE. We need a design that can work for all three partitioning strategies, even if we don't have support for all of them in the initial patch. If they CAN all be in the same patch, so much the better. - I am not very impressed with the syntax. CONFIGURATION is an odd word that seems too generic for what we're talking about here. It would be tempting to use a connecting word like WITH or USING except that both would be ambiguous here, so we can't. MySQL and Oracle use the keyword PARTITIONS -- which I realize isn't a keyword at all in PostgreSQL right now -- to introduce the partition specification. DB2 uses no keyword at all; it seems you just say PARTITION BY (mypartitioncol) (...partition specifications go here...). I think either approach could work for us. Avoiding the extra keyword is a plus, especially since I doubt we're likely to support the exact syntax that Oracle and MySQL offer anyway - though if we do, then I'd be in favor of inserting the PARTITIONS keyword so that people's SQL can work without modification. - We need to think a little bit about exactly what we're trying to do. The simplest imaginable thing here would be to just give people a place to put a bunch of partition specifications. So you can imagine letting someone say PARTITION BY HASH (FOR VALUES WITH (MODULUS 2, REMAINDER 0), FOR VALUES WITH (MODULUS 2, REMAINDER 1)). However, the patch quite rightly rejects that approach in favor of the theory that, at CREATE TABLE time, you're just going to want to give a modulus and have the system create one partition for every possible remainder. But that could be expressed even more compactly than what the patch does. Instead of saying PARTITION BY HASH CONFIGURATION (MODULUS 4) we could just let people say PARTITION BY HASH (4) or probably even PARTITION BY HASH 4. - For list partitioning, the patch falls back to just letting you put a bunch of VALUES IN clauses in the CREATE TABLE statement. I don't find something like PARTITION BY LIST CONFIGURATION (VALUES IN (1, 2), (1, 3)) to be particularly readable. What are all the extra keywords adding? We could just say PARTITION BY LIST ((1, 2), (1, 3)). I think I would find that easier to remember; not sure what other people think. As an alternative, PARTITION BY LIST VALUES IN (1, 2), (1, 3) looks workable, too. - What about range partitioning? This is an interesting case because while in theory you could leave gaps between range partitions, in practice people probably don't want to do that very often, and it might be better to have a simpler syntax that caters to the common case, since people can always create partitions individually if they happen to want gaps. So you can imagine making something like PARTITION BY RANGE ((MINVALUE), (42), (163)) mean create two partitions, one from (MINVALUE) to (42) and the other from (42) to (163). I think that would be pretty useful. - Another possible separating keyword here would be INITIALLY, which is already a parser keyword. So then you could have stuff like PARTITION BY HASH INITIALLY 4, PARTITION BY LIST INITIALLY ((1, 2), (1, 3)), PARTITION BY RANGE INITIALLY ((MINVALUE), (42), (163)). - The patch doesn't document the naming convention for the automatically created partitions, and it is worth thinking a bit about how that is going to work. Do people want to be able to specify the name of the partitioned table when they are using this syntax, or are they happy with automatically generated names? If the latter, are they happy with THESE automatically generated names? I guess for HASH appending _%d where %d is the modulus is fine, but it is not necessary so great for LIST. If I said CREATE TABLE foo ... PARTITION BY LIST (('en'), ('ru'), ('jp')) I think I'd be hoping to end up with partitions named foo_en, foo_ru, and foo_jp rather than foo_0, foo_1, foo_2. Or maybe I'd rather say PARTITION BY LIST (foo_en ('en'), foo_ru ('ru'), foo_jp ('jp')) or something like that to be explicit about it. Not sure. But it's worth some thought. I think this comes into focus even more clearly for range partitions, where you probably want the partitions to follow a convention like basetablename_yyyy_mm. - The documentation for the CONFIGURATION option doesn't match the grammar. The documentation makes it an independent clause, so CONFIGURATION could be specified even if PARTITION BY is not. But the implementation makes the better choice to treat CONFIGURATION as a further specification of PARTITION BY. - I don't think this patch is really all that close to being ready for committer. Beyond the design issues which seem to need more thought, there's stuff in the patch like: + elog(DEBUG1,"stransformPartitionAutoCreate HASH i %d MODULUS %d \n %s\n", + i, bound->modulus, nodeToString(part)); Now, on the one hand, debugging elogs like this have little business in a final patch. And, on the other hand, if we were going to include them in the final patch, we'd probably want to at least spell the function name correctly. Similarly, it's evident that this test case has not been carefully reviewed by anyone, including the author: +REATE TABLE fail_parted (a int) PARTITION BY HASH (a) CONFIGURATION +(MODULUS 10 DEFAULT PARTITION hash_default); Not too surprisingly, the system isn't familiar with the REATE command. - There's some questionable error-reporting behavior in here, too, particularly: +CREATE TABLE fail_parted (a int) PARTITION BY LIST (a) CONFIGURATION +(values in (1, 2), (1, 3)); +ERROR: partition "fail_parted_1" would overlap partition "fail_parted_0" +LINE 2: (values in (1, 2), (1, 3)); Since the user hasn't provided the names fail_parted_0 or fail_parted_1, it kind of stinks to use them in an error report. The error cursor is good, but I wonder if we need to do better. One option would be to go to a syntax where the user specifies the partition names explicitly, which then justifies using that name in an error report. Another possibility would be to give a different message in this case, like: ERROR: partition for values in (1, 2) would overlap partition for values in (1, 3) Now, that would require a pretty substantial redesign of the patch, and I'm not sure it's worth the effort. But I'm also not sure that it isn't. -- Robert Haas EDB: http://www.enterprisedb.com
- I don't think it's a very good idea to support LIST and HASH but not
RANGE. We need a design that can work for all three partitioning
strategies, even if we don't have support for all of them in the
initial patch. If they CAN all be in the same patch, so much the
better.
- I am not very impressed with the syntax. CONFIGURATION is an odd
word that seems too generic for what we're talking about here. It
would be tempting to use a connecting word like WITH or USING except
that both would be ambiguous here, so we can't. MySQL and Oracle use
the keyword PARTITIONS -- which I realize isn't a keyword at all in
PostgreSQL right now -- to introduce the partition specification. DB2
uses no keyword at all; it seems you just say PARTITION BY
(mypartitioncol) (...partition specifications go here...). I think
either approach could work for us. Avoiding the extra keyword is a
plus, especially since I doubt we're likely to support the exact
syntax that Oracle and MySQL offer anyway - though if we do, then I'd
be in favor of inserting the PARTITIONS keyword so that people's SQL
can work without modification.
- We need to think a little bit about exactly what we're trying to do.
The simplest imaginable thing here would be to just give people a
place to put a bunch of partition specifications. So you can imagine
letting someone say PARTITION BY HASH (FOR VALUES WITH (MODULUS 2,
REMAINDER 0), FOR VALUES WITH (MODULUS 2, REMAINDER 1)). However, the
patch quite rightly rejects that approach in favor of the theory that,
at CREATE TABLE time, you're just going to want to give a modulus and
have the system create one partition for every possible remainder. But
that could be expressed even more compactly than what the patch does.
Instead of saying PARTITION BY HASH CONFIGURATION (MODULUS 4) we could
just let people say PARTITION BY HASH (4) or probably even PARTITION
BY HASH 4.
- For list partitioning, the patch falls back to just letting you put
a bunch of VALUES IN clauses in the CREATE TABLE statement. I don't
find something like PARTITION BY LIST CONFIGURATION (VALUES IN (1, 2),
(1, 3)) to be particularly readable. What are all the extra keywords
adding? We could just say PARTITION BY LIST ((1, 2), (1, 3)). I think
I would find that easier to remember; not sure what other people
think. As an alternative, PARTITION BY LIST VALUES IN (1, 2), (1, 3)
looks workable, too.
- What about range partitioning? This is an interesting case because
while in theory you could leave gaps between range partitions, in
practice people probably don't want to do that very often, and it
might be better to have a simpler syntax that caters to the common
case, since people can always create partitions individually if they
happen to want gaps. So you can imagine making something like
PARTITION BY RANGE ((MINVALUE), (42), (163)) mean create two
partitions, one from (MINVALUE) to (42) and the other from (42) to
(163). I think that would be pretty useful.
- Another possible separating keyword here would be INITIALLY, which
is already a parser keyword. So then you could have stuff like
PARTITION BY HASH INITIALLY 4, PARTITION BY LIST INITIALLY ((1, 2),
(1, 3)), PARTITION BY RANGE INITIALLY ((MINVALUE), (42), (163)).
CREATE TABLE foo (bar text) PARTITION BY LIST (bar) PARTITIONS (foo_us('US'), foo_uk_ru('UK', 'RU'), { DEFAULT foo_dflt | AUTOMATIC });
CREATE TABLE foo (bar int) PARTITION BY HASH (bar) PARTITIONS (5);
CREATE TABLE foo (bar int) PARTITION BY RANGE (bar) PARTITIONS (FROM 1 TO 10 INTERVAL 2, { DEFAULT foo_dflt | AUTOMATIC });
On Wed, Jul 14, 2021 at 7:28 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > What do you think will the described approach lead to a useful patch? Should it be done as a whole or it's possible tocommit it in smaller steps? (E.g. first part without AUTOMATIC capability, then add AUTOMATIC capability. Or with someother order of features implementation) I would suggest that you consider on-the-fly partition creation to be a completely separate feature from initial partition creation as part of CREATE TABLE. I think you can have either without the other, and I think the latter is a lot easier than the former. I doubt that on-the-fly partition creation makes any sense at all for hash partitions; there seems to be no reason not to pre-create all the partitions. It's pretty straightforward to see how it should work for LIST, but RANGE needs an interval or something to be stored in the system catalogs so you can figure out where to put the boundaries, and somehow you've got to identify a + operator for the relevant data type. Tom Lane probably won't be thrilled if you suggest looking it up based on the operator NAME. The bigger issue IMHO with on-the-fly partition creation is avoiding deadlocks in the presence of current inserters; I submit that without at least some kind of attempt to avoid deadlocks and spurious errors there, it's not really a usable scheme, and that seems hard. On the other hand, modulo syntax details, creating partitions at CREATE TABLE time seems relatively simple and, especially in the case of hash partitioning, useful. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Jul 20, 2021 at 02:42:16PM -0400, Robert Haas wrote: > The bigger issue IMHO with on-the-fly > partition creation is avoiding deadlocks in the presence of current > inserters; I submit that without at least some kind of attempt to > avoid deadlocks and spurious errors there, it's not really a usable > scheme, and that seems hard. I was thinking that for dynamic creation, there would be a DDL command to create the necessary partitions: -- Creates 2021-01-02, unless the month already exists: ALTER TABLE bydate SET GRANULARITY='1day'; ALTER TABLE bydate CREATE PARTITION FOR VALUE ('2021-01-02'); I'd want it to support changing the granularity of the range partitions: -- Creates 2021-01 unless the month already exists. -- Errors if a day partition already exists which would overlap? ALTER TABLE bydate SET granularity='1month'; ALTER TABLE bydate CREATE PARTITION FOR VALUE ('2021-01-03'); It could support creating ranges, which might create multiple partitions, depending on the granularity: ALTER TABLE bydate CREATE PARTITION FOR VALUES ('2021-01-01') TO ('2021-02-01') Or the catalog could include not only granularity, but also endpoints: ALTER TABLE bydate SET ENDPOINTS ('2012-01-01') ('2022-01-01') ALTER TABLE bydate CREATE PARTITIONS; --create anything needed to fill from a->b ALTER TABLE bydate PRUNE PARTITIONS; --drop anything outside of [a,b] I would use this to set "fine" granularity for large tables, and "course" granularity for tables that were previously set to "fine" granularity, but its partitions are no longer large enough to justify it. This logic currently exists in our application - we create partitions dynamically immediately before inserting. But it'd be nicer if it were created asynchronously. It may create tables which were never inserted into, which is fine - they'd be course granularity tables (one per month). I think this might elegantly allow both 1) subpartitioning; 2) repartitioning to a different granularity (for which I currently have my own tool). -- Justin
On Tue, Jul 20, 2021 at 3:13 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Tue, Jul 20, 2021 at 02:42:16PM -0400, Robert Haas wrote: > > The bigger issue IMHO with on-the-fly > > partition creation is avoiding deadlocks in the presence of current > > inserters; I submit that without at least some kind of attempt to > > avoid deadlocks and spurious errors there, it's not really a usable > > scheme, and that seems hard. > > I was thinking that for dynamic creation, there would be a DDL command to > create the necessary partitions: > > -- Creates 2021-01-02, unless the month already exists: > ALTER TABLE bydate SET GRANULARITY='1day'; > ALTER TABLE bydate CREATE PARTITION FOR VALUE ('2021-01-02'); Well, that dodges the deadlock issue with doing it implicitly, but it also doesn't seem to offer a lot of value over just creating the partitions in a fully manual way. I mean you could just say: CREATE TABLE bydate_2021_02_02 PARTITION OF bydate FOR VALUES FROM ('2021-01-02') TO ('2021-02-03'); It's longer, but it's not really that bad. -- Robert Haas EDB: http://www.enterprisedb.com
This thread has stalled since July with review comments unanswered, I'm marking the patch Returned with Feedback. Please feel free to resubmit when/if a new patch is available. -- Daniel Gustafsson https://vmware.com/
This thread has stalled since July with review comments unanswered, I'm marking
the patch Returned with Feedback. Please feel free to resubmit when/if a new
patch is available.
--
Daniel Gustafsson https://vmware.com/