Declarative partitioning grammar - Mailing list pgsql-hackers

Hi all,

Many of you will have read the dynamic partitioning thread here:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php

I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.

The grammar will need some refinement and we'd appreciate comments,
suggestions, etc. The grammar is designed to address the range of use
cases we have out there.

Basics
------

CREATE TABLE is modified to accept a PARTITION BY clause. This clause
contains one or more partition declarations. The syntax is as follows:

PARTITION BY {partition_type} (column_name[, column_name...])
[PARTITIONS number] (      partition_declaration[, partition_declaration...]
 )

The partition type can be one of HASH, RANGE or LIST. The column names
are the partitioning key. The PARTITIONS sub clause instructs the system
on the number of partitions to create if we're doing HASH or, in the
case of LIST or RANGE can act as a safe guard for users who want to
ensure that they do not generate more than a certain number of
partitions.

We have discussed adding the partition type REFERENCE which is akin to
the LIKE clause in CREATE TABLE. That is, it duplicates the partition
configuration of the specified table. Input would be appreciated.

Partition declarations
----------------------

Hash
----

...   PARTITION BY HASH(order_date) PARTITIONS 5;

This will create 5 partitions on the column order_date. Inserts will be
distributed roughly evenly across the 5 partitions.

List
----

...    PARTITION BY LIST (state)      (PARTITION q1_northwest VALUES ('OR', 'WA'),    PARTITION q1_southwest VALUES
('AZ','UT', 'NM'),    PARTITION q1_northeast VALUES  ('NY', 'VM', 'NJ'),    PARTITION q1_southeast VALUES ('FL', 'GA'),
  PARTITION q1_northcentral VALUES ('SD', 'WI'),    PARTITION q1_southcentral VALUES ('OK', 'TX'));
 

Here, we produce 6 different partitions. The first partition groups
states in the North West of the USA. We introduce here the named
partition concept for clarity.

Range
-----

Range has the most expressive grammar. I'll introduce it in steps:

...    PARTITION BY RANGE (b)
(                              
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01') 
);

Here, we create 2 partitions: aa and bb. Partition aa has the range
2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to
2009-01-01. Intervals always have this problem: are the bounds included
in the range? To deal with this we define: the start of the range is
included in the range. The ending bound is not. This can be modified
with the keywords INCLUSIVE and EXCLUSIVE, which modify this property on
a rule by rule basis.

It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:

...   PARTITION BY RANGE(order_date)     (    START (date '2005-12-01') end (date '2007-12-01')        EVERY(interval
'2months')     );
 

If we like, we can mix the specification a little:

...   PARTITION BY RANGE(order_date)    ( PARTITION Q1_2005 end (date '2005-04-01'),      PARTITION Q2_2005 end (date
'2005-07-01'),     PARTITION Q3_2005 end (date '2005-10-10'),      PARTITION Q4_2005 end (date '2006-01-01'),
START(date '2006-02-01') end (date '2008-04-01')             EVERY (interval '2 weeks')    );
 

an interesting result of the flexibility of the grammar we've come up
with is that you can do something like this:

...   PARTITION BY RANGE(order_date)     ( PARTITION minny end date '2004-12-01'),       end (date '2006-12-01'),
PARTITIONmaxny start (date '2006-12-01')     ); 
 

Here, when order_date is less than 2004-12-01, we put the data in minny,
when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed
partition and after this we put it in maxny.

Tablespaces
-----------

We allow inline tablespace specification, such as:

...   PARTITION BY RANGE(order_date)     (       PARTITION minny TABLESPACE compress,       start (date '2004-12-01')
end(date '2006-12-01') TABLESPACE hot,       PARTITION maxny TABLESPACE compress     );
 

I've used the term compress here intentionally. A number of operating
systems now ship file systems which can compress partitions. Users with
issues with the amount of data they want to keep online can delay the
time until they need new storage to a future date by compressing less
regularly used data with this technique, for a performance cost. Data
being used heavily can live on an uncompressed file system, affected.

Multi-column support
--------------------

We can have multi-column partitions.

...  PARTITION BY LIST (state, deptno)     (        VALUES ('OR', 1, 'WA', 1),       VALUES ('AZ', 1, 'UT', 1, 'NM',
1),      VALUES ('OR', 2, 'WA', 2),       VALUES ('AZ', 2, 'UT', 2, 'NM', 2),       PARTITION region_null VALUES (NULL,
NULL),      PARTITION region_other     );
 

Looking at this syntax now, I think I prefer:

VALUES ('OR', 1),('WA', 1)

To specify keys for the same partition. Thoughts?

Composite partition support
---------------------------

Given that we're talking about systems with potentially very large
amounts of data, power users may want to combine range partitioning with
hash or list partitioning. For example, your analysis might always be on
a date range but also be broken down by sales office. So, this would
combine range and list partitioning (if the sales offices were known) or
hash partitioning (if they weren't known).

To do this, we introduce the SUBPARTITION clause:

...    PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id)       SUBPARTITIONS 8       (        start (date
'2005-12-01')end (date '2007-12-01')                  every (interval '3 months'),           start (date '2007-12-01')
           end (date '2008-12-01') every (interval '1 month')    );
 

The first partition specification covers 8 partitions, the second 12 for
20 partitions in total. Once we add the subpartitioning we have 160
partitions in total (20 * 8).

Subpartitioning by list can look like this (see templates below):

...    PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id)       (       partition minny (subpartition c1
values(1), subpartition                           c2 values (2)),        start (date '2004-12-01') end (date
'2006-12-01')            (subpartition c1 values (1), subpartition c2 values (2)),        partition maxy (values (1),
values(2)   )
 

So, the list parameters of each sub partition look like arguments to the
primary partition. Again, see templates below if you think this looks
cumbersome.

We do not preclude subpartitions of subpartitions. So, the following is
valid:

...   PARTITION BY HASH(b)
PARTITIONS 2
SUBPARTITION BY HASH(d)
SUBPARTITIONS 2,
SUBPARTITION BY HASH(e) SUBPARTITIONS 2,
SUBPARTITION BY HASH(f) SUBPARTITIONS 2,
SUBPARTITION BY HASH(g) SUBPARTITIONS 2,
SUBPARTITION BY HASH(h) SUBPARTITIONS 2;

Subpartition templates
----------------------

There are times we want subpartitions to be laid out in a specific way
for all partitions. To do this, we use templates:

...   PARTITION BY RANGE (order_date)      SUBPARTITION BY LIST (state)   SUBPARTITION TEMPLATE   (      SUBPARTITION
northwestVALUES ('OR', 'WA'),      SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),      SUBPARTITION northeast VALUES
('NY', 'VM', 'NJ'),      SUBPARTITION southeast VALUES ('FL', 'GA'),      SUBPARTITION northcentral VALUES ('SD',
'WI'),     SUBPARTITION southcentral VALUES ('OK', 'TX')   )      (start (date '2001-01-01') end (date '2010-01-01')
 every (interval '3 months')   )
 

For each of the 36 odd partitions we create here, each is subpartitioned
into geographical areas.

Data management with ALTER
--------------------------

These are all arguments to ALTER TABLE. All of these require validation
against the existing specification.

ADD
---

For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:

...  ADD PARTITION q1_2008 end (date '2008-04-01')

COALESCE (maybe)
----------------

For hash partitions, remove a partition from the number of hash
partitions and distribute its data to the remaining partitions.

... COALESCE PARTITION [name];

I'm not sure if this is really used but other systems we looked at have
it. Thoughts?

DROP
----

For list and range partitions, drop a specified partition from the set
of partitions.

... DROP PARTITION minny;

This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:

... DROP PARTITION FOR(date '2007-01-01');

for range partitions; and:

... DROP PARTITION FOR(VALUES('CA'));

for list partitions.

We've also discussed something like:

... DROP PARTITION FOR(POSITION(1));

so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?

EXCHANGE
--------

This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:

... EXCHANGE <partition identifier> WITH TABLE <table name>

partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?

MERGE
-----

You can merge and list partitions and any two range partitions:

... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>]

For range partitions:

... MERGE PARTITION FOR(date '2006_01_01'), PARTITION FOR(date '2007-01-01');

For list partitions:

... MERGE PARTITION FOR(VALUES('CA', 'MA')

This begs the question of why we have COALESCE for hash partitioning. I
don't know, it just seems like the right thing since you can't merge two
hash partitions together (well, you shouldn't want to).

RENAME
------

Rename a partition. We can use partition name or FOR clause.

SPLIT
-----

Split is used to divide a partition in two. It is designed for list and
range partitioning but I guess we could/should support hash. I need to
think about that. For RANGE partitions:

... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>,    PARTITION <partition name2>)];

AT clause specifies the point at which the partition is split in two:

... SPLIT PARTITION FOR(2000) AT 1000 INTO PARTITION (part1000,    part2000)

We might want ways to do this with unnamed partitions, it seems to me.
Thoughts?

For list:

... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') )    INTO     (      PARTITION region_east_1,
PARTITIONregion_east_2    );
 

In this case, values from region_east specified in the AT() list are put in
region_east_1 and the rest are put in region_east_2.

I think a better way for supporting split with hash is via ADD. I'm sure
some people think that ugly so I'd like feedback.

TRUNCATE
--------

Truncate a specified partition:

... TRUNCATE PARTITION FOR ('2005-01-01')

We could specify a name too.

This will use truncate internally.


pgsql-hackers by date:

Previous
From: "Gokulakannan Somasundaram"
Date:
Subject: Re: Some ideas about Vacuum
Next
From: Chris Browne
Date:
Subject: Re: Transaction Snapshot Cloning