Re: Proposal: Automatic partition creation - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: Proposal: Automatic partition creation
Date
Msg-id 524595d3-4a87-274a-e1b4-eeab65801af2@postgrespro.ru
Whole thread Raw
In response to Re: Proposal: Automatic partition creation  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 06.07.2020 17:59, Justin Pryzby wrote:
I think you'd want to have an
ALTER command for that (we would use that to change tables between
daily/monthly based on their current size).  That should also support setting
the MODULUS of a HASH partitioned table, to allow changing the size of its
partitions (currently, the user would have to more or less recreate the table
and move all its data into different partitions, but that's not ideal).
New syntax fits to the ALTER command as well.
ALTER TABLE tbl
PARTITION BY HASH (number)
USING (partition_desc)
In simple cases (i.e. range partitioning granularity), it will simply update
the rule of bound generation, saved in the catalog. More complex hash
partitions will require some rebalancing. Though, the syntax is pretty
straightforward for all cases. In the next versions, we can also add a
CONCURRENTLY keyword to cover partitioning of an existing non-partitioned table
with data.
I don't know if it's important for anyone, but it would be interesting to think
about supporting sub-partitioning: partitions which are themselvese partitioned.
Like something => something_YYYY => something_YYYY_MM => something_YYYY_MM_DD.
You'd need to specify how to partition each layer of the heirarchy.  In the
most general case, it could be different partition strategy.

I suppose it will be a natural extension of this work. Now we need to ensure
that the proposed syntax is extensible. Greenplum syntax, which I choose as an
example, provides subpartition syntax as well.

If you have a callback function for partition renaming, I think you'd want to
pass it not just the current name of the partition, but also the "VALUES" used
in partition creation.  Like (2020-04-05)TO(2020-05-06).  Maybe instead, we'd
allow setting a "format" to use to construct the partition name.  Like
"child.foo_bar_%Y_%m_%d".  Ideally, the formats would be fixed-length
(zero-padded, etc), so failures with length can happen at "parse" time of the
statement and not at "run" time of the creation.  You'd still have to handle
the case that the name already exists but isn't a partition (or is a partition
by doesn't handle the incoming tuple for some reason).

In callback design, I want to use the best from pg_pathman's set_init_callback().
The function accepts jsonb argument, which contains all the data about the
parent table, bounds, and so on. This information can be used to construct name
for the partition and generate RENAME statement.

Also, maybe your "configuration" syntax would allow specifying other values.
Maybe including a retention period (as an INTERVAL for RANGE tables).  That's
useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE.
In this version, I got rid of the 'configuration' keyword. Speaking of
retention, I think that it would be hard to cover all use-cases with a
declarative syntax. While it is relatively easy to implement deletion within a
callback function. See rotation_callback example in pg_pathman [1].

[1] https://github.com/postgrespro/pg_pathman/blob/79e11d94a147095f6e131e980033018c449f8e2e/sql/pathman_callbacks.sql#L107
-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: recovering from "found xmin ... from before relfrozenxid ..."
Next
From: Fabrízio de Royes Mello
Date:
Subject: Re: pg_dump bug for extension owned tables