Re: [PATCH] Automatic HASH and LIST partition creation - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: [PATCH] Automatic HASH and LIST partition creation
Date
Msg-id 74a716a3-986d-c737-6217-6a3ffd488da3@postgrespro.ru
Whole thread Raw
In response to Re: [PATCH] Automatic HASH and LIST partition creation  (Pavel Borisov <pashkin.elfe@gmail.com>)
Responses Re: [PATCH] Automatic HASH and LIST partition creation  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On 08.09.2020 17:03, Pavel Borisov wrote:

The patch lacks documentation, because I expect some details may change during discussion. Other than that, the feature is ready for review.

Hi, hackers!

From what I've read I see there is much interest in automatic partitions creation. (Overall discussion on the topic is partitioned into two threads: (1) https://www.postgresql.org/message-id/alpine.DEB.2.21.1907150711080.22273%40lancre and (2) https://www.postgresql.org/message-id/flat/7fec3abb-c663-c0d2-8452-a46141be6d4a@postgrespro.ru (current thread) ) 

There were many syntax proposals and finally, there is a patch realizing one of them. So I'd like to review it.

The syntax proposed in the patch seems good enough for me and is in accordance with one of the proposals in the discussion. Maybe I'd prefer using the word AUTOMATICALLY/AUTO instead of CONFIGURATION with explicit meaning that using this syntax we'd get already (automatically) created partitions and don't need to create them manually, as in the existing state of postgresql declarative partitioning. 

CREATE TABLE tbl (i int) PARTITION BY HASH (i) AUTOMATICALLY (MODULUS 3); (partitions are created automatically)
vs
CREATE TABLE tbl (i int) PARTITION BY HASH (i); (partitions should be created manually by use of PARTITION OF)
 
CREATE TABLE tbl (i char) PARTITION BY LIST (i) AUTOMATICALLY (VALUES IN ('a', 'b'), ('c', 'd'), ('e','f') DEFAULT PARTITION tbl_default);
vs
CREATE TABLE tbl (i char) PARTITION BY LIST (i); (partitions should be created manually by use of PARTITION OF)

I think this syntax can also be extended later with adding automatic creation of RANGE partitions, with IMMEDIATE/DEFERRED for dynamic/on-demand automatic partition creation, and with SUBPARTITION possibility.

But I don't have a strong preference for the word AUTOMATICALLY, moreover I saw opposition to using AUTO at the top of the discussion. I suppose we can go with the existing CONFIGURATION word.

I agree that 'AUTOMATICALLY' keyword is more specific and probably less confusing for users. I've picked 'CONFIGURATION' simply because it is an already existing keyword. It would like to hear other opinions on that.


If compare with existing declarative partitions, I think automatic creation simplifies the process for the end-user and  I'd vote for its committing into Postgres. The patch is short and clean in code style. It has enough comments Tests covering the new functionality are included. Yet it doesn't have documentation and I'd suppose it's worth adding it. Even if there will be syntax changes, I hope they will not be more than the replacement of several words. Current syntax is described in the text of a patch.


Fair enough. New patch contains a documentation draft. While writing it, I also noticed, that the syntax, introduced in the patch differs from greenpulm one. For now, list partitioning clause doesn't support 'PARTITION name' part, that is supported in greenplum. I don't think that we aim for 100% compatibility here. Still, the ability to provide table names is probably a good optional feature, especially for list partitions.

What do you think?

The patch applies cleanly and installcheck-world is passed. 

Some minor things:

I've got a compiler warning:
parse_utilcmd.c:4280:15: warning: unused variable 'lc' [-Wunused-variable]

Fixed. This was also caught by cfbot. This version should pass it clean.


When the number of partitions is over the maximum value of int32 the output shows a generic syntax error. I don't think it is very important as it is not the case someone will make deliberately, but maybe it's better to output something like "Partitions number is more than the maximum supported value"
create table test (i int, t text) partition by hash (i) configuration (modulus 888888888888);
ERROR:  syntax error at or near "888888888888"

This value is not a valid int32 number, thus parser throws the error before we have a chance to handle it more gracefully.


One more piece of nitpicking. Probably we can go just with a mention in documentation. 
create table test (i int, t text) partition by hash (i) configuration (modulus 8888);
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

Well, it looks like a legit error, when we try to lock a lot of objects in one transaction. I will double check if we don't release a lock somewhere.

Do we need to restrict the number of partitions, that can be created by this statement? With what number?  As far as I see, there is no such restriction for now, just a recommendation about performance issues. With automatic creation it becomes easier to mess with it.

Probably, it's enough to mention it in documentation and rely on users common sense.

Typo:
+ /* Add statemets to create each partition after we create parent table */

Fixed.

Overall I see the patch almost ready for commit and I'd like to meet this functionality in v14.
I also hope that this patch will make it to v14, but for now, I don't see a consensus on the syntax and some details, so I wouldn't rush.

Besides, it definitely needs more testing. I haven't thoroughly tested following cases yet:
- how triggers and constraints are propagated to partitions;
- how does it handle some tricky clauses in list partitioning expr_list;
and so on.

Also, there is an open question about partition naming. Currently, the patch implements dummy %tbl_%partnum name generation, which is far from user-friendly. I think we must provide some hook or trigger function to rename partitions after they were created.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Next
From: Ranier Vilela
Date:
Subject: Re: Avoid incorrect allocation in buildIndexArray