Home > mailing lists

Re: Declarative partitioning - Mailing list pgsql-hackers

From	Corey Huinker
Subject	Re: Declarative partitioning
Date	January 17, 2016 00:47:54
Msg-id	CADkLM=fazD-GFZKmERQ1pObi8BiQb99+zaEj8BDW02FyptdWYA@mail.gmail.com Whole thread Raw
In response to	Re: Declarative partitioning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses	Re: Declarative partitioning
List	pgsql-hackers

Tree view

If we have a CREATE statement for each partition, how do we generalize
that to partitions at different levels? For example, if we use something
like the following to create a partition of parent_name:

CREATE PARTITION partition_name OF parent_name FOR VALUES ...
WITH ... TABLESPACE ...

Do we then say:

CREATE PARTITION subpartition_name OF partition_name ...

That's how I'd want it for partitions created after the initial partitioned table is created.

I'd like to be able to identify the parent partition by it's own partitioning parameters rather than name, like the way we can derive the name of an index in ON CONFLICT. But I see no clean way to do that, and if one did come up, we'd simply allow the user to replace

<partition_name>

with

table_name PARTITION partition_spec [...PARTITION partition_spec [ ...PARTITION turtles_all_the_way_down]]).

Again, totally fine with forcing the maintenance script to know or discover the name of the partition to be subpartitioned...for now.

to create a level 2 partition (sub-partition) of parent_name? Obviously,
as is readily apparent from the command, it is still a direct partition of
partition_name for all internal purposes (consider partition list caching
in relcache, recursive tuple routing, etc.) save some others.

I ask that also because it's related to the choice of syntax to use to
declare the partition key for the multi-level case. I'm considering the
SUBPARTITION BY notation and perhaps we could generalize it to more than
just 2 levels. So, for the above case, parent_name would have been created as:

CREATE TABLE parent_name PARTITION BY ... SUBPARTITION BY ...

Needless to say, when subpartition_name is created with the command we saw
a moment ago, the root partitioned table would be locked. In fact, adding
a partition anywhere in the hierarchy needs an exclusive lock on the root
table. Also, partition rule (the FOR VALUES clause) would be validated
against PARTITION BY or SUBPARTITION BY clause at the respective level.

Although, I must admit I feel a little uneasy about the inherent asymmetry
in using SUBPARTITION BY for key declaration whereas piggybacking CREATE
PARTITION for creating sub-partitions. Is there a better way?

Provided that the syntax allows for N levels of partitioning, I don't care if it's

PARTITION BY.., PARTITION BY..., PARTITION BY ...

PARTITION BY.., SUBPARTITION BY..., SUBPARTITION BY ...

The first is probably better for meta-coding purposes, but the second makes it clear which partition layer is first.

> As for the convenience syntax (if at all), how about:
>
> CREATE TABLE foo (
> ...
> )
> PARTITION BY ... ON (...)
> SUBPARTITION BY ... ON (...)
> opt_partition_list;
>
> where opt_partition_list is:
>
> PARTITIONS (
> partname FOR VALUES ... [WITH] [TABLESPACE] opt_subpart_list
> [, ...]
> )
>
> where opt_subpart_list is:
>
> SUBPARTITIONS (
> subpartname FOR VALUES ... [WITH] [ TABLESPACE]
> [, ...]
> )

Do we want this at all? It seems difficult to generalize this to
multi-level hierarchy of more than 2 levels.

I want this.

Granted the syntax of a 3+ level partitioning would be cumbersome, but it is what the user wanted, and the nested PARTITION/SUBPARTITION. In those cases, the user might opt to not create more than the default first subpartition to keep the syntax sane, or we might auto-generate default partitions (with a VALUES clause of whatever "all values" is for that datatype...again, this is an area where leveraging range types would be most valuable).

On one hand, I think to keep treating "partition hierarchies" as
"inheritance hierachies" might have some issues. I am afraid that
documented inheritance semantics may not be what we want to keep using for
the new partitioned tables. By that, I mean all the user-facing behaviors
where inheritance has some bearing. Should it also affect new partitioned
tables? Consider whether inheritance semantics would render infeasible
some of the things that we'd like to introduce for the new partitioned
tables such as automatic tuple routing, or keep us from improving planner
smarts and executor capabilities for partitioned tables over what we
already have.

I feel that Automatic tuple routing should be considered they key benefit of "real" partitions over inherited tables. Trigger maintenance is most of the work of custom partitioning schemes, at least the ones I've written.

There's a great chance that not everyone cares right now about this part
of the new partitioning but just want to put it out there. There are more
contentious issues like the syntax, partitioning maintenance commands that
we plan to support (now or later) and such.

What I've read so far addresses most of my concerns.

Still somewhat on my mind:

1. ability to describe partition bounds via range types, regardless of whether the Automatic Tuple Routing uses those types internally.

2. syntax for splitting a partition in two, merging two adjacent partitions (you probably touched on these earlier and I missed it or forgot).

3. ability to swap a partition with a table not currently associated with the partitioned table.

4. The applicability of this syntax to materialized views, allowing us to do REFRESH CONCURRENTLY a few parts at a time, or only refreshing the data we know needs it.

Items 2 and 3 don't have to be implemented right away, as they're separate ALTER commands. 4 is a pipe dream. With Item 1 I ask only that we don't pick a syntax that prevents description via range types.

pgsql-hackers by date:

From: Andres Freund
Date: 16 January 2016, 23:48:55
Subject: Re: exposing pg_controldata and pg_config as functions

From: Peter Eisentraut
Date: 17 January 2016, 02:09:33
Subject: Re: Some bugs in psql_complete of psql

Re: Declarative partitioning - Mailing list pgsql-hackers

Previous

Next