Thread: Declarative partitioning grammar
Hi all,

Many of you will have read the dynamic partitioning thread here:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php

I've proposed an alternative approach, which we've called declarative partitioning, which is grammar based. This grammar was developed by Jeff Cohen at Greenplum with some assistance from myself. It is to be completely open source. The grammar will need some refinement and we'd appreciate comments, suggestions, etc. The grammar is designed to address the range of use cases we have out there.

Basics
------

CREATE TABLE is modified to accept a PARTITION BY clause. This clause contains one or more partition declarations. The syntax is as follows:

PARTITION BY {partition_type} (column_name[, column_name...])
[PARTITIONS number]
(
  partition_declaration[, partition_declaration...]
)

The partition type can be one of HASH, RANGE or LIST. The column names are the partitioning key. The PARTITIONS subclause instructs the system on the number of partitions to create if we're doing HASH or, in the case of LIST or RANGE, can act as a safeguard for users who want to ensure that they do not generate more than a certain number of partitions.

We have discussed adding the partition type REFERENCE, which is akin to the LIKE clause in CREATE TABLE. That is, it duplicates the partition configuration of the specified table. Input would be appreciated.

Partition declarations
----------------------

Hash
----

... PARTITION BY HASH(order_date) PARTITIONS 5;

This will create 5 partitions on the column order_date. Inserts will be distributed roughly evenly across the 5 partitions.

List
----

... PARTITION BY LIST (state)
(PARTITION q1_northwest VALUES ('OR', 'WA'),
 PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'),
 PARTITION q1_northeast VALUES ('NY', 'VT', 'NJ'),
 PARTITION q1_southeast VALUES ('FL', 'GA'),
 PARTITION q1_northcentral VALUES ('SD', 'WI'),
 PARTITION q1_southcentral VALUES ('OK', 'TX'));

Here, we produce 6 different partitions.
The first partition groups states in the North West of the USA. We introduce here the named partition concept for clarity.

Range
-----

Range has the most expressive grammar. I'll introduce it in steps:

... PARTITION BY RANGE (b)
(
  PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
  PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);

Here, we create 2 partitions: aa and bb. Partition aa has the range 2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to 2009-01-01. Intervals always have this problem: are the bounds included in the range? To deal with this we define that the start of the range is included in the range and the ending bound is not. This can be modified with the keywords INCLUSIVE and EXCLUSIVE, which change this property on a rule-by-rule basis.

It is common that these partitions follow a pattern, such as one partition every week, month or year. So, we support the following specification:

... PARTITION BY RANGE(order_date)
(
  START (date '2005-12-01') end (date '2007-12-01')
  EVERY(interval '2 months')
);

If we like, we can mix the specification a little:

... PARTITION BY RANGE(order_date)
( PARTITION Q1_2005 end (date '2005-04-01'),
  PARTITION Q2_2005 end (date '2005-07-01'),
  PARTITION Q3_2005 end (date '2005-10-10'),
  PARTITION Q4_2005 end (date '2006-01-01'),
  START (date '2006-02-01') end (date '2008-04-01')
  EVERY (interval '2 weeks')
);

An interesting result of the flexibility of the grammar we've come up with is that you can do something like this:

... PARTITION BY RANGE(order_date)
( PARTITION minny end (date '2004-12-01'),
  end (date '2006-12-01'),
  PARTITION maxny start (date '2006-12-01')
);

Here, when order_date is less than 2004-12-01, we put the data in minny; when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed partition; and after this we put it in maxny.

Tablespaces
-----------

We allow inline tablespace specification, such as:

...
PARTITION BY RANGE(order_date)
(
  PARTITION minny TABLESPACE compress,
  start (date '2004-12-01') end (date '2006-12-01') TABLESPACE hot,
  PARTITION maxny TABLESPACE compress
);

I've used the term compress here intentionally. A number of operating systems now ship file systems which can compress partitions. Users with issues with the amount of data they want to keep online can delay the time until they need new storage by compressing less regularly used data with this technique, at a performance cost. Data being used heavily can live on an uncompressed file system, unaffected.

Multi-column support
--------------------

We can have multi-column partitions.

... PARTITION BY LIST (state, deptno)
(
  VALUES ('OR', 1, 'WA', 1),
  VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
  VALUES ('OR', 2, 'WA', 2),
  VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
  PARTITION region_null VALUES (NULL, NULL),
  PARTITION region_other
);

Looking at this syntax now, I think I prefer:

VALUES ('OR', 1), ('WA', 1)

to specify keys for the same partition. Thoughts?

Composite partition support
---------------------------

Given that we're talking about systems with potentially very large amounts of data, power users may want to combine range partitioning with hash or list partitioning. For example, your analysis might always be on a date range but also be broken down by sales office. So, this would combine range and list partitioning (if the sales offices were known) or hash partitioning (if they weren't known).

To do this, we introduce the SUBPARTITION clause:

... PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id)
SUBPARTITIONS 8
(
  start (date '2005-12-01') end (date '2007-12-01')
  every (interval '3 months'),
  start (date '2007-12-01') end (date '2008-12-01')
  every (interval '1 month')
);

The first partition specification covers 8 partitions, the second 12, for 20 partitions in total. Once we add the subpartitioning we have 160 partitions in total (20 * 8).
Subpartitioning by list can look like this (see templates below):

... PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id)
(
  partition minny (subpartition c1 values (1), subpartition c2 values (2)),
  start (date '2004-12-01') end (date '2006-12-01')
    (subpartition c1 values (1), subpartition c2 values (2)),
  partition maxy (values (1), values (2))
);

So, the list parameters of each subpartition look like arguments to the primary partition. Again, see templates below if you think this looks cumbersome.

We do not preclude subpartitions of subpartitions. So, the following is valid:

... PARTITION BY HASH(b) PARTITIONS 2
SUBPARTITION BY HASH(d) SUBPARTITIONS 2,
SUBPARTITION BY HASH(e) SUBPARTITIONS 2,
SUBPARTITION BY HASH(f) SUBPARTITIONS 2,
SUBPARTITION BY HASH(g) SUBPARTITIONS 2,
SUBPARTITION BY HASH(h) SUBPARTITIONS 2;

Subpartition templates
----------------------

There are times we want subpartitions to be laid out in a specific way for all partitions. To do this, we use templates:

... PARTITION BY RANGE (order_date)
SUBPARTITION BY LIST (state)
SUBPARTITION TEMPLATE
(
  SUBPARTITION northwest VALUES ('OR', 'WA'),
  SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  SUBPARTITION northeast VALUES ('NY', 'VT', 'NJ'),
  SUBPARTITION southeast VALUES ('FL', 'GA'),
  SUBPARTITION northcentral VALUES ('SD', 'WI'),
  SUBPARTITION southcentral VALUES ('OK', 'TX')
)
(start (date '2001-01-01') end (date '2010-01-01')
 every (interval '3 months')
);

Each of the 36 partitions we create here is subpartitioned into these geographical areas.

Data management with ALTER
--------------------------

These are all arguments to ALTER TABLE. All of these require validation against the existing specification.

ADD
---

For range and list partitioning, it's important to be able to add partitions for data not covered by the existing specification. So, we propose:

...
ADD PARTITION q1_2008 end (date '2008-04-01')

COALESCE (maybe)
----------------

For hash partitions, remove a partition from the number of hash partitions and distribute its data to the remaining partitions.

... COALESCE PARTITION [name];

I'm not sure if this is really used but other systems we looked at have it. Thoughts?

DROP
----

For list and range partitions, drop a specified partition from the set of partitions.

... DROP PARTITION minny;

This drops a named partition. Often, it will be difficult for users to know partition names, and partitions might be unnamed. So, we allow this syntax:

... DROP PARTITION FOR(date '2007-01-01');

for range partitions; and:

... DROP PARTITION FOR(VALUES('CA'));

for list partitions.

We've also discussed something like:

... DROP PARTITION FOR(POSITION(1));

so that users can easily drop a specific partition in an array of range partitions. It seems to me, though, that the use case is generally to drop the oldest partition, so perhaps we should have a more explicit syntax. Thoughts?

EXCHANGE
--------

This sub-clause allows us to make a table a partition in a set of partitions, or take a partition out of a set but keep it as a table. IBM uses ATTACH and DETACH, which is explicit, but Oracle uses EXCHANGE. I'll explain the latter:

... EXCHANGE <partition identifier> WITH TABLE <table name>

partition identifier is one of PARTITION <name> or PARTITION FOR(...). The partition in the partition set 'becomes' the table <table name> and vice-versa. Essentially, we'd swap the relfilenodes. This means that we have to first ADD PARTITION then swap the table and the partition. Thoughts?

MERGE
-----

You can merge any two list partitions and any two range partitions:

... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>]

For range partitions:

... MERGE PARTITION FOR(date '2006-01-01'), PARTITION FOR(date '2007-01-01');

For list partitions:

...
MERGE PARTITION FOR(VALUES('CA', 'MA'));

This begs the question of why we have COALESCE for hash partitioning. I don't know, it just seems like the right thing since you can't merge two hash partitions together (well, you shouldn't want to).

RENAME
------

Rename a partition. We can use the partition name or a FOR clause.

SPLIT
-----

Split is used to divide a partition in two. It is designed for list and range partitioning, but I guess we could/should support hash. I need to think about that. For RANGE partitions:

... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>, PARTITION <partition name2>)];

The AT clause specifies the point at which the partition is split in two:

... SPLIT PARTITION FOR(2000) AT 1000 INTO (PARTITION part1000, PARTITION part2000)

We might want ways to do this with unnamed partitions, it seems to me. Thoughts?

For list:

... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') )
INTO
(
  PARTITION region_east_1,
  PARTITION region_east_2
);

In this case, values from region_east specified in the AT() list are put in region_east_1 and the rest are put in region_east_2.

I think a better way of supporting split with hash is via ADD. I'm sure some people will think that ugly, so I'd like feedback.

TRUNCATE
--------

Truncate a specified partition:

... TRUNCATE PARTITION FOR ('2005-01-01')

We could specify a name too. This will use TRUNCATE internally.
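[To make the range semantics above concrete, here is a small illustrative sketch of how a START/END/EVERY specification could expand into partitions and how a key would be routed, with the proposed inclusive-start/exclusive-end default. This is purely my own illustration in Python, not part of the proposal or patch.]

```python
from datetime import date

def add_months(d, n):
    """Advance a date by n calendar months (day-of-month preserved;
    assumes that day exists in the target month, true for these examples)."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, d.day)

def expand_every(start, end, months):
    """Expand START ... END ... EVERY into [lo, hi) ranges:
    start inclusive, end exclusive, per the proposal's default."""
    parts, lo = [], start
    while lo < end:
        hi = min(add_months(lo, months), end)
        parts.append((lo, hi))
        lo = hi
    return parts

def route(key, parts):
    """Return the index of the partition whose range holds key, or None."""
    for i, (lo, hi) in enumerate(parts):
        if lo <= key < hi:  # INCLUSIVE start, EXCLUSIVE end
            return i
    return None

# The EVERY example above: 2005-12-01 .. 2007-12-01, every 2 months
parts = expand_every(date(2005, 12, 1), date(2007, 12, 1), 2)
```

[The 24-month span splits into 12 two-month partitions; a key equal to an END bound falls through to the next partition, or to no partition at all at the very end, which is exactly the ambiguity INCLUSIVE/EXCLUSIVE is meant to control.]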
Gavin Sherry wrote:
> CREATE TABLE is modified to accept a PARTITION BY clause. This clause
> contains one or more partition declarations. The syntax is as follows:
>
> PARTITION BY {partition_type} (column_name[, column_name...])
> [PARTITIONS number]
> (
>   partition_declaration[, partition_declaration...]
> )
>
> The partition type can be one of HASH, RANGE or LIST.

What would be the drawbacks of

CREATE TABLE tablename(...)
PARTITION BY function_taking_row_returning_partition_name

instead of the explicit types? It seems that with my own function I could pretty easily emulate the HASH, RANGE or LIST types.

It seems a function returning a partition name would largely avoid the need for the sub-partition stuff too -- at least for the cases when the only reason you wanted sub-partitions was for composite partition support.

I'm not sure if a function would give more flexibility, but it sure seems it'd be easier for me to remember than the various

PARTITION BY LIST (a)
  (VALUES ('L') SUBPARTITION BY RANGE (b) (VALUES('x'), VALUES('y')),
   VALUES ('M') SUBPARTITION BY RANGE (b) (VALUES('u'), VALUES('t')))

or however it'd look.
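[As a thought experiment, the function-based scheme suggested here is easy to sketch. The Python below is illustrative only; the lookup table and partition names are hypothetical, not proposed syntax.]

```python
import zlib

# Hypothetical user function emulating LIST partitioning: map a row's
# state to a partition name via an ordinary lookup, with a catch-all.
STATE_TO_PARTITION = {
    'OR': 'q1_northwest', 'WA': 'q1_northwest',
    'AZ': 'q1_southwest', 'UT': 'q1_southwest', 'NM': 'q1_southwest',
}

def list_partition_fn(row):
    return STATE_TO_PARTITION.get(row['state'], 'q1_other')

def hash_partition_fn(row, nparts=5):
    """Emulate HASH partitioning; crc32 keeps the mapping stable across runs."""
    return 'p%d' % (zlib.crc32(str(row['order_date']).encode()) % nparts)
```

[The catch, as the replies below this note point out, is that nothing about such an opaque function tells the planner which partitions a WHERE clause can skip, nor lets the system prove a row maps to exactly one partition.]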
Pardon my ignorance as I've never actually used partitioning before but plan to in the near future, but couldn't the grammar resemble a common WHERE clause more closely?

> ... PARTITION BY RANGE(order_date)
> (
>   START (date '2005-12-01') end (date '2007-12-01')
>   EVERY(interval '2 months')
> );

PARTITION BY RANGE(order_date)
(
  WHERE order_date >= '2005-12-01' AND order_date < '2007-12-01'
  EVERY interval '2 months'
)

OR

PARTITION BY RANGE(order_date)
(
  WHERE order_date BETWEEN '2005-12-01' AND '2007-12-01'
)

Of course using '>', '>=', '<', '<=' instead of start/end eliminates any ambiguity along with the need for INCLUSIVE/EXCLUSIVE.

> ... PARTITION BY LIST (state, deptno)
> (
>   VALUES ('OR', 1, 'WA', 1),
>   VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
>   VALUES ('OR', 2, 'WA', 2),
>   VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
>   PARTITION region_null VALUES (NULL, NULL),
>   PARTITION region_other
> );

PARTITION BY LIST (state, deptno)
(
  PARTITION one WHERE state IN ('OR', 'WA') AND deptno = 1,
  PARTITION two WHERE state IN ('AZ', 'UT') AND deptno IN (1, 2),
  PARTITION region_null WHERE state IS NULL OR deptno IS NULL,
  PARTITION region_other
);

Do you even need to list the columns in the PARTITION BY part?

PARTITION BY LIST
(
  PARTITION one WHERE state IN ('OR', 'WA') AND deptno = 1,
  PARTITION two WHERE state IN ('AZ', 'UT') AND deptno IN (1, 2),
  PARTITION region_null WHERE state IS NULL OR deptno IS NULL,
  PARTITION region_other
);

Is there really a reason not to have a named partition as well? Sure it saves a few keystrokes, but it makes trying to do anything with them at a later date that much more difficult.

Your originally suggested grammar might be shorter to type, but using WHERE clause syntax we are all familiar with seems a lot more intuitive to me, on the surface at least. Why not try to reuse grammar that already exists as much as possible?

On Sat, 2008-01-12 at 00:19 +0100, Gavin Sherry wrote:
> CREATE TABLE is modified to accept a PARTITION BY clause.
--
Mike <ipso@snappymail.ca>
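[Mike's WHERE-style idea can be sketched in a few lines of illustrative Python (mine, not proposed syntax): each partition carries a predicate and a row goes to the first matching one. The sketch also hints at the difficulty raised elsewhere in the thread -- with arbitrary predicates, statically proving that no two partitions both match a row is much harder than with LIST/RANGE bound lists.]

```python
# Hypothetical predicate-per-partition routing, first match wins.
PARTITIONS = [
    ('one', lambda r: r['state'] in ('OR', 'WA') and r['deptno'] == 1),
    ('two', lambda r: r['state'] in ('AZ', 'UT') and r['deptno'] in (1, 2)),
    ('region_null', lambda r: r['state'] is None or r['deptno'] is None),
]

def route_where(row, default='region_other'):
    """Return the name of the first partition whose predicate accepts row."""
    for name, pred in PARTITIONS:
        if pred(row):
            return name
    return default
```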
On Jan 11, 2008 3:42 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> What would be the drawbacks of
> CREATE TABLE tablename(...)
> PARTITION BY function_taking_row_returning_partition_name
> instead of the explicit types?

Would that still allow the optimizer to work as well as it could? It seems that an arbitrary map like that can't be optimized very well as it might be too general.

wt
On Jan 11, 2008, at 4:16 PM, Warren Turkal wrote:
> On Jan 11, 2008 3:42 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
>> What would be the drawbacks of
>> CREATE TABLE tablename(...)
>> PARTITION BY function_taking_row_returning_partition_name
>> instead of the explicit types?
>
> Would that still allow the optimizer to work as well as it could? It
> seems that an arbitrary map like that can't be optimized very well as
> it might be too general.

We did look at allowing general functions for partitioning and this was one concern. The other is that we want to enforce that a row only gets inserted into a single partition, so we wanted a declarative syntax where it was relatively easy to check that range and list specifications don't overlap.

kind regards,
Jeff Cohen
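[The non-overlap check described here is cheap for declarative range specs; a minimal sketch (illustrative Python, mine, not from any patch):]

```python
def ranges_overlap(specs):
    """specs: (start, end) pairs, start inclusive, end exclusive.
    With declarative bounds, sorting and comparing neighbours suffices;
    no comparable static check exists for an arbitrary partition function."""
    specs = sorted(specs)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(specs, specs[1:]))
```

[Note that two partitions sharing a bound do not overlap, precisely because the end is exclusive by default.]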
On Jan 11, 2008 6:19 PM, Gavin Sherry <swm@alcove.com.au> wrote:
> Hi all,
>
> Many of you will have read the dynamic partitioning thread here:
>
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
>
> I've proposed an alternative approach, which we've called declarative
> partitioning which is grammar based. This grammar was developed by Jeff
> Cohen at Greenplum with some assistance from myself. It is to be
> completely open source.

I like the syntax, but what I really want to know is how well this is going to work with the query planner. The current 8.x table partitioning mechanism has a lot of issues... it causes more problems than it solves unless you are willing to strictly control how you query the tables... I usually end up rolling my own.

Are you confident that what you propose will integrate well with the planner and produce (as much as possible) fully optimized queries?

merlin
On Fri, Jan 11, 2008 at 07:46:36PM -0500, Merlin Moncure wrote:
> On Jan 11, 2008 6:19 PM, Gavin Sherry <swm@alcove.com.au> wrote:
> > Hi all,
> >
> > Many of you will have read the dynamic partitioning thread here:
> >
> > http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
> >
> > I've proposed an alternative approach, which we've called declarative
> > partitioning which is grammar based. This grammar was developed by Jeff
> > Cohen at Greenplum with some assistance from myself. It is to be
> > completely open source.
>
> I like the syntax, but what I really want to know is how well this is
> going to work with the query planner. The current 8.x table
> partitioning mechanism has a lot of issues... it causes more problems
> than it solves unless you are willing to strictly control how you
> query the tables... I usually end up rolling my own.

The syntax is half the problem; performance is the other. I will bring the performance issues up in another thread. Yes, we are confident that we can address the performance issues that rule out the existing partitioning for many applications. We need it for our own stuff! :P

Thanks,

Gavin
On Jan 11, 2008, at 4:03 PM, Mike wrote:
> Pardon my ignorance as I've never actually used partitioning before but
> plan to in the near future, but couldn't the grammar resemble a common
> WHERE clause more closely?

Hi Mike,

Thanks for your suggestions. The current syntax we chose is similar to syntax used by IBM, Oracle, and mysql, so it is familiar to folks who have used partitioning with other databases. A WHERE clause would of course be understandable by everyone, but it makes error checking more difficult, since we want to ensure that partition specifications don't overlap. In order to make such error checking feasible, we would have to restrict the set of predicates you can use in the WHERE clause, so it wouldn't be completely general anyway.

> Is there really a reason to not have a named partition as well? Sure it
> saves a few keystrokes, but it makes trying to do anything with them at
> a later date that much more difficult.

For the case of partition by HASH, you can just specify the number of buckets, so it might not be meaningful to name the partitions. For range partitions, many users perform "rolling upgrades" on a regular basis, where they drop the oldest data and add a new partition with the latest data, so they might just refer to partitions by "position" (either an ordinal number or a keyword like FIRST/LAST).

kind regards,
Jeff
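[The rolling maintenance pattern with positional addressing described here can be sketched like so (illustrative Python; the partition names and helper names are hypothetical):]

```python
def partition_for_position(parts, pos):
    """FOR(POSITION(n)): 1-based position over range-ordered partitions."""
    return parts[pos - 1]

def roll_window(parts, newest):
    """DROP the oldest (FIRST) partition and ADD a new one at the end (LAST)."""
    return parts[1:] + [newest]

# A yearly range-partitioned table, rolled forward one year:
parts = ['y2005', 'y2006', 'y2007']
```

[With positional addressing, the periodic maintenance job never needs to know partition names at all.]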
Hi,
EXCHANGE
--------
This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:
... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
Regards,
I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.
<..>
FWIW, I had done some very initial work on declarative partitioning (nowhere near as exhaustive as this proposal) and submitted a wip patch here:
Kindly take a look at the patch, to see if it would be useful to you folks in any way.
<..>
Range
-----
Range has the most expressive grammar. I'll introduce it in steps:
... PARTITION BY RANGE (b)
(
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);
It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:
... PARTITION BY RANGE(order_date)
(
START (date '2005-12-01') end (date '2007-12-01')
EVERY(interval '2 months')
);
<..>
It will be interesting to see how this start, end, interval usage accommodates data types other than dates. I hope this specification is not influenced overly by date-like partitions.
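[The point above can be made concrete: START/END/EVERY only really requires a type with '+' and '<', so plain numerics work too. An illustrative Python sketch (mine, not from the proposal):]

```python
def expand_numeric(start, end, every):
    """START/END/EVERY over numbers instead of dates. Note the final
    partition is clipped short when EVERY does not divide the range
    evenly -- one of the questions the grammar must answer for any type."""
    parts, lo = [], start
    while lo < end:
        parts.append((lo, min(lo + every, end)))
        lo += every
    return parts
```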
<..>
ADD
---
For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:
... ADD PARTITION q1_2008 end (date '2008-04-01')
<..>
What about data that does not match any existing partition specification? It might make sense to have a dummy partition which handles all these cases.
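[The catch-all partition suggested above, sketched in illustrative Python (mine; the 'overflow' name is hypothetical):]

```python
def route_list(value, spec, default='overflow'):
    """Route a LIST key to its partition, falling back to a catch-all
    partition for values no explicit partition claims."""
    for name, values in spec.items():
        if value in values:
            return name
    return default

spec = {'west': ('OR', 'WA'), 'south': ('OK', 'TX')}
```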
<..>
DROP
----
For list and range partitions, drop a specified partition from the set
of partitions.
... DROP PARTITION minny;
This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:
... DROP PARTITION FOR(date '2007-01-01');
for range partitions; and:
... DROP PARTITION FOR(VALUES('CA'));
for list partitions.
We've also discussed something like:
... DROP PARTITION FOR(POSITION(1));
so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?
<..>
Surely, the partitions will get (default, parent-inferred) names when they get created? Do we expect users to remember FOR() specifications like the ones mentioned above? It might make sense to have, e.g., "\d" in psql present a parent with all its named partitions along with their partition clauses, to facilitate dropping partitions by name.
<..>
EXCHANGE
--------
This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:
... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
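As a sketch in the proposed syntax, rolling a pre-loaded standalone table into the partition set would then be a two-step operation (table and partition names here are illustrative):

```sql
-- 1. add an empty partition covering the new range
ALTER TABLE sales ADD PARTITION q1_2008 end (date '2008-04-01');

-- 2. swap the pre-loaded staging table into place (relfilenode swap)
ALTER TABLE sales EXCHANGE PARTITION q1_2008 WITH TABLE sales_q1_2008_staging;
```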
<..>
Surely this won't be instantaneous?
<..>
Regards,
Hi,
We did look at allowing general functions for partitioning and this
was one concern. The other is that we want to enforce that a row
only gets inserted into a single partition, so we wanted a
declarative syntax where it was relatively easy to check that range
and list specifications don't overlap.
Detection of mutually exclusive ranges might not turn out to be so easy after all. I think there is some code in the constraint_exclusion area which might help out with this.
Regards,
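As a sketch of the overlap check the declarative syntax makes possible: assuming partition bounds were exposed in a catalog-like table (the table and column names below are hypothetical), detecting overlapping half-open ranges is a simple self-join:

```sql
-- hypothetical catalog of range-partition bounds, [range_start, range_end)
SELECT p1.partition_name, p2.partition_name
FROM pg_partition_range p1
JOIN pg_partition_range p2
  ON p1.parent_oid = p2.parent_oid
 AND p1.partition_name < p2.partition_name   -- each pair once
WHERE p1.range_start < p2.range_end
  AND p2.range_start < p1.range_end;         -- any row returned is an overlap
```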
Hi,
On Jan 12, 2008 6:29 AM, Gavin Sherry <swm@alcove.com.au> wrote:
> The syntax is half the problem, performance is the other. I will
> bring the performance issues up in another thread. Yes, we are
> confident that we can address the performance issues that rule out
> the existing partitioning for many applications. We need it for our
> own stuff! :P
Agreed, syntax is just the sugar.
Also, other than performance, how does this proposal handle updates of partition-key columns that cause the resulting tuple to belong in a different partition?
Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
Jeff Cohen wrote:
> In order to make such error checking feasible, we would have to
> restrict the set of predicates you can use in the WHERE clause, so
> it wouldn't be completely general anyway.

Well, with an extensible system such as PostgreSQL you will need to have a partitioning scheme that can deal with extensions. Perhaps people want to partition by XML, GIS, text-search data, or whatever someone might come up with in the future.

One possible way to achieve that might be to redefine your concepts of hash, list, and range in terms of operator classes (or operator families or other operator structures?). Those have well-defined properties as to how the operators behave relative to each other, so checking the partition definitions for mutual exclusivity and other properties would be possible.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
On Sat, 2008-01-12 at 01:59 +0100, Gavin Sherry wrote:
> The syntax is half the problem, performance is the other.

The syntax looks great to me, but I think it is about 5% of the problem, maybe less. I don't really have any questions about the syntax, but I may have thoughts when the implementation details emerge.

I'm not sure you'll be able to use PARTITION BY since it's part of the SQL Standard for windowed grouping, which we do hope to implement one day. It will be confusing to have two completely separate meanings for the one phrase in our grammar.

The burning questions from my perspective are: What is a partition? How will the syntax be implemented within the backend?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Sat, Jan 12, 2008 at 05:47:30PM +0000, Simon Riggs wrote:
> On Sat, 2008-01-12 at 01:59 +0100, Gavin Sherry wrote:
> > The syntax is half the problem, performance is the other.
>
> The syntax looks great to me, but I think it is about 5% of the
> problem, maybe less. I don't really have any questions about the
> syntax, but I may have thoughts when the implementation details
> emerge.

Yes, that's for another thread. Since the discussion was about using grammar to control partitions, I wanted to get some grammar out. More details on other stuff soon.

> I'm not sure you'll be able to use PARTITION BY since it's part of
> the SQL Standard for windowed grouping, which we do hope to
> implement one day. It will be confusing to have two completely
> separate meanings for the one phrase in our grammar.

I think it's fine. It doesn't cause conflicts in the grammar (in fact, the Greenplum grammar implements both meanings right now with no confusion).

Thanks, Gavin
Hi,

Jeff Cohen wrote:
> We did look at allowing general functions for partitioning and this
> was one concern. The other is that we want to enforce that a row
> only gets inserted into a single partition, so we wanted a
> declarative syntax where it was relatively easy to check that range
> and list specifications don't overlap.

Why do you need to define a split point so ambiguously at all? Why not just give the DBA exactly *one* place to define the split point?

I don't think the separation into list, hash and range partitioning is adequate. What is the system supposed to do if you try to insert a row which doesn't fit any of the values in your list or doesn't fit any of the ranges you defined?

I prefer a partitioning grammar which doesn't interfere with constraints. We all know how to define constraints. Please don't introduce a new, ambiguous way. A partitioning definition should be able to tell the target partition for *every* row which satisfies the constraints (the real ones, not ambiguous ones).

IMO, a single DDL command should only touch a single split point, i.e. split a table into two partitions, move the split point or remove the split point (joining the partitions again). Those are the only basic commands you need to be able to handle partitioning.

Sorry, but for my taste, the proposed grammar is too long per command, not flexible enough and instead ambiguous for split points as well as for constraints. To me it looks like repeating the mistakes of others.

Regards Markus
On Sat, Jan 12, 2008 at 04:01:19PM +0530, NikhilS wrote:
> Hi,
>
> > We did look at allowing general functions for partitioning and
> > this was one concern. The other is that we want to enforce that a
> > row only gets inserted into a single partition, so we wanted a
> > declarative syntax where it was relatively easy to check that
> > range and list specifications don't overlap.
>
> Detection of mutually exclusive ranges might not turn out to be so
> easy after all. I think there is some code in the
> constraint_exclusion area which might help out in this.

In some prototyping code it didn't seem too difficult, but if we've made a mistake we might have to look at the CE code.

Thanks, Gavin
On Jan 14, 2008, at 1:49 AM, Markus Schiltknecht wrote:
> I don't think the separation into list, hash and range partitioning
> is adequate. What is the system supposed to do, if you try to
> insert a row which doesn't fit any of the values in your list or
> doesn't fit any of the ranges you defined?

Hi Markus,

If you don't define a "default" partition to handle outliers, the insert should fail with an error.

> I prefer a partitioning grammar which doesn't interfere with
> constraints. We all know how to define constraints. Please don't
> introduce a new, ambiguous way. A partitioning definition should be
> able to tell the target partition for *every* row which satisfies
> the constraints (the real ones, not ambiguous ones).
>
> IMO, a single DDL command should only touch a single split point,
> i.e. split a table into two partitions, move the split point or
> remove the split point (joining the partitions again). Those are
> the only basic commands you need to be able to handle partitioning.

I can certainly appreciate the simplicity of this approach. It lets us use a generic check constraint to perform partitioning, so it is more general than partitioning using hash, list, and range. However, it achieves this generality at the expense of usability for typical customer cases.

For example, let's look at the case of a table of 1 year of sales data, where we want to create 12 partitions -- one for each month. With the generic approach, you start with a single table, and start by splitting it into two six-month partitions:

ALTER TABLE sales
SPLIT where sales_date > date '2007-06-01'
INTO
(
  PARTITION first_half
  PARTITION second_half
);

We could implement this approach using check constraints and table inheritance: the partition second_half is a child table where sales_date > date '2007-06-01', and the partition first_half has the complementary constraint NOT (sales_date > date '2007-06-01').
Next, you split each partition:

ALTER TABLE sales
SPLIT PARTITION first_half where sales_date > date '2007-03-01'
INTO
(
  PARTITION first_quarter
  PARTITION second_quarter
);

So now the child table for first_half itself has two children. As you continue this process you construct a binary tree of table inheritance using 12 ALTER statements.

In the "long" grammar you can create and partition the table in one statement:

CREATE TABLE sales
...
PARTITION BY sales_date
(
  start (date '2007-01-01') end (date '2008-01-01')
  every (interval '1 month')
);

> Sorry, but for my taste, the proposed grammar is too long per
> command, not flexible enough and instead ambiguous for split points
> as well as for constraints. To me it looks like repeating the
> mistakes of others.

Thanks for your feedback. Partitioning the table using a series of splits is a clever solution for situations where the partitioning operation cannot be described using simple equality (like list and hash) or ordered comparison (range). But for many common business cases, the "long" grammar is easier to specify.

kind regards, Jeff
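The inheritance-plus-check-constraint emulation described above can be written in stock PostgreSQL today, roughly as follows (table and column names are illustrative):

```sql
CREATE TABLE sales (sale_id int, sales_date date, amount numeric);

-- child table holding the second half; the CHECK constraint is what
-- constraint_exclusion uses to skip the partition at plan time
CREATE TABLE sales_second_half (
  CHECK (sales_date > date '2007-06-01')
) INHERITS (sales);

-- complementary constraint for the first half
CREATE TABLE sales_first_half (
  CHECK (NOT (sales_date > date '2007-06-01'))
) INHERITS (sales);
```

Routing inserts to the right child (via rules or triggers) and keeping the constraints non-overlapping is then entirely the DBA's responsibility, which is exactly the bookkeeping the declarative grammar aims to absorb.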
On Jan 12, 2008, at 9:34 AM, Peter Eisentraut wrote:
> Well, with an extensible system such as PostgreSQL you will need to
> have a partitioning scheme that can deal with extensions. Perhaps
> people want to partition by XML, GIS, text-search data, or whatever
> someone might come up with in the future.

Hi Peter,

In the proposed solution, hash and list partitions work for all types that support an equality operator, and range partitions work for all types that support fully-ordered comparison.

kind regards, Jeff
Jeff Cohen <jcohen@greenplum.com> writes:
> In the proposed solution, hash and list partitions work for all
> types that support an equality operator, and range partitions work
> for all types that support fully-ordered comparison.

Surely a hashing method would require a *hashable* equality operator, ie a hash opclass; likewise range partitions would demand a matching btree opclass. You could do list partitions with an equality operator of either kind.

Essentially all of the system's current knowledge about the properties of specific operators is encoded as operator classes for one of these two built-in index types. If you want to make assumptions about the behavior of an operator, it really needs to be founded on these types of opclasses --- or else you're buying into inventing a comparable amount of infrastructure for some other organizational concept.

I think Peter's point was that you might want to think about generalizing your concepts so that other kinds of operator classes could someday serve as the foundations for other kinds of partitioning rules.

regards, tom lane
Hi Jeff,

Jeff Cohen wrote:
> If you don't define a "default" partition to handle outliers, the
> insert should fail with an error.

IMO, you should always have a "default" partition, then, so as not to violate the constraints (by rejecting tuples which are correct according to the constraints).

> With the generic approach, you start with a single table, and start
> by splitting it into two six-month partitions:
>
> ALTER TABLE sales
> SPLIT where sales_date > date '2007-06-01'
> INTO
> (
>   PARTITION first_half
>   PARTITION second_half
> );
>
> We could implement this approach using check constraints and table
> inheritance: the partition second_half is a child table where
> sales_date > date '2007-06-01', and the partition first_half has the
> complementary constraint NOT (sales_date > date '2007-06-01').
>
> Next, you split each partition:
>
> ALTER TABLE sales
> SPLIT PARTITION first_half where sales_date > date '2007-03-01'
> INTO
> (
>   PARTITION first_quarter
>   PARTITION second_quarter
> );
>
> So now the child table for first_half itself has two children. As
> you continue this process you construct a binary tree of table
> inheritance using 12 ALTER statements.

<nitpicking>There are just 11 splits between 12 months, otherwise correct, yes.</nitpicking>

> In the "long" grammar you can create and partition the table in one
> statement:
>
> CREATE TABLE sales
> ...
> PARTITION BY sales_date
> (
>   start (date '2007-01-01') end (date '2008-01-01')
>   every (interval '1 month')
> );

To be fair, you should add the 12 partition names here as well. I can certainly see merit in letting the database system handle the binary tree.

> Thanks for your feedback. Partitioning the table using a series of
> splits is a clever solution for situations where the partitioning
> operation cannot be described using simple equality (like list and
> hash) or ordered comparison (range). But for many common business
> cases, the "long" grammar is easier to specify.
Easier to specify initially, maybe, yes. But how about managing it afterwards?

Having seen all the different options for merging, splitting, exchanging, coalescing and adding, all of them with small little differences for hash, range and list partitioning - let alone sub-partitioning - with all of that, the proposed grammar doesn't look particularly easy to me.

Let's at least drop the differences for list, hash and range partitioning; those are pretty unneeded, IMO.

Regards Markus
On Mon, 2008-01-14 at 10:49, Markus Schiltknecht wrote:
> Hi,
>
> Jeff Cohen wrote:
> > We did look at allowing general functions for partitioning and
> > this was one concern. The other is that we want to enforce that a
> > row only gets inserted into a single partition, so we wanted a
> > declarative syntax where it was relatively easy to check that
> > range and list specifications don't overlap.
>
> Why do you need to define a split point so ambiguously at all? Why
> not just give the DBA exactly *one* place to define the split point?
>
> I don't think the separation into list, hash and range partitioning
> is adequate. What is the system supposed to do, if you try to insert
> a row which doesn't fit any of the values in your list or doesn't
> fit any of the ranges you defined?

I guess it would go to some "default" partition?

...

> IMO, a single DDL command should only touch a single split point,
> i.e. split a table into two partitions, move the split point or
> remove the split point (joining the partitions again). Those are the
> only basic commands you need to be able to handle partitioning.

Sure, but this can become really tedious for 1024 partitions, not to mention hard for the optimiser.

> Sorry, but for my taste, the proposed grammar is too long per
> command, not flexible enough and instead ambiguous for split points
> as well as for constraints. To me it looks like repeating the
> mistakes of others.

What mistakes?

-----------------
Hannu
Hi,

Hannu Krosing wrote:
> I guess it would go to some "default" partition?

Which doesn't have a name so far, which prevents addressing that partition. Nor is it particularly well defined; it's just a rest.

> sure, but this can become really tedious for 1024 partitions,

Well, managing 1024 partitions manually is a tedious job, no matter what grammar you take: you'll have to deal with 1024 different partition names. What do you need so many partitions for?

> not to mention hard for optimiser.

It's pretty much the same for the optimizer: a binary tree. Granted, that binary tree should better be balanced by the RDBMS.

Regards Markus
> > sure, but this can become really tedious for 1024 partitions,
>
> Well, managing 1024 partitions manually is a tedious job, no matter
> what grammar you take: you'll have to deal with 1024 different
> partition names. What do you need so many partitions for?
imagine a structure which is partitioned by day. if you keep data for 3 years (which is perfectly reasonable) you already have 1000 partitions.
some applications produce so much data that splitting it into days is perfectly reasonable.
if you get 30 GB a day, making monthly tables is not too funny anymore ...
just think of CREATE INDEX or VACUUM ...
having so many tables is not funny but it can be the only reasonable choice.
best regards,
hans
--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
Hi,

Hans-Juergen Schoenig wrote:
> > What do you need so many partitions for?
>
> having so many tables is not funny but it can be the only reasonable
> choice.

Well, what do you do with all those partitions? Most of them will end up on the same storage subsystem. So, if you don't partition to spread your data across storage with different characteristics, why do you need partitioning at all? Isn't an index better in most cases? Or are you using it as a form of CLUSTERing, where you expect to reduce time for sequential scans over a range?

Simon's Segment Exclusion proposal looks like a much better fit for that purpose, IMO. It would prevent you from having to handle all those partitions manually.

Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes:
> Jeff Cohen wrote:
> > If you don't define a "default" partition to handle outliers, the
> > insert should fail with an error.
>
> IMO, you should always have a "default" partition, then, so as not
> to violate the constraints (by rejecting tuples which are correct
> according to the constraints).

I don't agree with that at all. I can imagine plenty of situations where a tuple falling outside the range of available partitions *should* be treated as an error. For instance, consider timestamped observations --- data in the future is certainly bogus, and data further back than you want to deal with must be an entry error as well.

I agree that there needs to be a way to have a "default" partition, but there needs to be a way to not have one, too.

regards, tom lane
Hi,

Tom Lane wrote:
> I don't agree with that at all. I can imagine plenty of situations
> where a tuple falling outside the range of available partitions
> *should* be treated as an error. For instance, consider timestamped
> observations --- data in the future is certainly bogus, and data
> further back than you want to deal with must be an entry error as
> well.

Isn't it better to have these constraints as table constraints, instead of burying them in the partitioning definition? Mixing those two concepts seems very weird to me. Or am I missing some benefit of mixing them?

Regards Markus
"Markus Schiltknecht" <markus@bluegap.ch> writes:
> Hans-Juergen Schoenig wrote:
> >> What do you need so many partitions for?
> >
> > having so many tables is not funny but it can be the only
> > reasonable choice.
>
> Well, what do you do with all those partitions?

In a previous life I had a database which had daily partitions. I assure you it was unquestionably the right decision. Each day's data wasn't just distinguished by the timestamp but actually was entirely separate from the previous day's data. Both the archiving strategy and the many reports which were run all depended specifically on the day the data was collected.

Much like partial indexes, the partition key effectively gives you a free index key element which can be combined with any other index without extra i/o. Often this lets you avoid a sort, letting you perform some queries as OLTP queries with no startup cost which would otherwise have DSS-style plans with sorts. Better yet, it can be used even in a sequential scan, which performs better than even a perfectly clustered index.

Daily partitioning let us go from a weekly batch job which was threatening to overrun the maintenance window to a daily batch job which completed in minutes. As a result, data warehouse reports could be delivered the same night instead of being hours, and in some cases days, delayed.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
On Tue, Jan 15, 2008 at 10:36:17AM -0500, Tom Lane wrote:
> Markus Schiltknecht <markus@bluegap.ch> writes:
> > Jeff Cohen wrote:
> >> If you don't define a "default" partition to handle outliers, the
> >> insert should fail with an error.
>
> > IMO, you should always have a "default" partition, then, so as not
> > to violate the constraints (by rejecting tuples which are correct
> > according to the constraints).
>
> I don't agree with that at all. I can imagine plenty of situations
> where a tuple falling outside the range of available partitions
> *should* be treated as an error. For instance, consider timestamped
> observations --- data in the future is certainly bogus, and data
> further back than you want to deal with must be an entry error as
> well.
>
> I agree that there needs to be a way to have a "default" partition,
> but there needs to be a way to not have one, too.

Jeff and I discussed this and we came to the same conclusion. We will propose grammar for handling it. Many users we talk to would fall into the class of people who would want an error if the data fell outside the defined partitions.

Thanks, Gavin
Markus Schiltknecht <markus@bluegap.ch> writes:
> Tom Lane wrote:
> > I don't agree with that at all. I can imagine plenty of situations
> > where a tuple falling outside the range of available partitions
> > *should* be treated as an error.
>
> Isn't it better to have these constraints as table constraints,
> instead of burying them in the partitioning definition? Mixing those
> two concepts seems very weird to me.

DBAs tend to be belt *and* suspenders guys, no? I'd think a lot of them would want a table constraint, plus a partitioning rule that rejects anything outside the intended partitions.

regards, tom lane
Hi,

Gregory Stark wrote:
> In a previous life I had a database which had daily partitions. I
> assure you it was unquestionably the right decision. Each day's data
> wasn't just distinguished by the timestamp but actually was entirely
> separate from the previous day's data. Both the archiving strategy
> and the many reports which were run all depended specifically on the
> day the data was collected.

Wouldn't Segment Exclusion (maybe together with a specialized form of CLUSTERing) handle that case much better than partitioning? Without the need to name all those thousands of partitions and manage them manually.

What I would want in such a case is exactly *not* manual management of partitions, but rather a performance optimization for scanning a range of rows, which is something in between indexes (for very few rows) and a seq scan (for almost all rows of a table).

I know, this now sounds like I've turned sides to Simon's proposal. And yes, in a way, that's true. I certainly see merit for Segment Exclusion, more and more. OTOH, I'm still skeptical about it replacing declarative partitioning entirely. But declarative partitioning only really makes sense if you partition into different storage subsystems, IMO. Everything happening on the same storage subsystem shouldn't need manual partitioning, but should be optimized pretty automatically. As Simon proposed, that's well possible in many cases.

Regards Markus
> > I don't agree with that at all. I can imagine plenty of situations
> > where a tuple falling outside the range of available partitions
> > *should* be treated as an error. For instance, consider
> > timestamped observations --- data in the future is certainly
> > bogus, and data further back than you want to deal with must be an
> > entry error as well.
>
> Isn't it better to have these constraints as table constraints,
> instead of burying them in the partitioning definition? Mixing those
> two concepts seems very weird to me.

Yes, but the problem with timestamp-partitioned tables is that the window is sliding. Thus you would need two ALTER TABLEs for each new period: one that changes the constraint plus one that creates the new partition. So it seems natural to join the two concepts for such a partitioning syntax.

Personally I find the automatic partition idea intriguing, where you only have to choose an expression that equates to one value (value group) per partition (and possibly a way to derive a partition name). Then a partition is automatically created when a new row arrives for a new value. That does not, however, address Tom's concern of rejecting data that is outside the acceptable window, but maybe that is better dealt with in the application anyway.

Andreas
Hi,

Tom Lane wrote:
> DBAs tend to be belt *and* suspenders guys, no?

I rather know those admins with stupid-looking faces who are wondering why their transactions fail. Often enough, that can have a lot of different reasons. Extending the set of possible traps doesn't seem like a clever idea for those admins.

> I'd think a lot of them would want a table constraint, plus a
> partitioning rule that rejects anything outside the intended
> partitions.

I'm rather a fan of the DRY principle (don't repeat yourself), because having to maintain redundant constraints smells suspiciously like a maintenance nightmare. And where's the real use of making the database system check twice? Want to protect against memory corruption in between the two checks, eh? :-)

Regards Markus
Hi,

Zeugswetter Andreas ADI SD wrote:
> Yes, but the problem with the timestamp partitioned tables is, that
> the window is sliding. Thus you would need two alter tables for each
> new period. One that changes the constraint + one that creates the
> new partition. So it seems natural to join the two concepts for such
> a partitioning syntax.

If you think in terms of split points, having to alter two tables is not true. It's better

> Personally I find the automatic partition idea intriguing, where you
> only have to choose an expression that equates to one value (value
> group) per partition (and possibly a way to derive a partition
> name). Then a partition is automatically created when a new row
> arrives for a new value. That does not however address Tom's concern
> of rejecting data that is outside the acceptable window, but maybe
> that is better dealt with in the application anyway.
>
> Andreas
Hi,

(Sorry for the previous one, if delivered; it went off too early...)

Zeugswetter Andreas ADI SD wrote:
> Yes, but the problem with the timestamp partitioned tables is, that
> the window is sliding. Thus you would need two alter tables for each
> new period. One that changes the constraint + one that creates the
> new partition. So it seems natural to join the two concepts for such
> a partitioning syntax.

If you think in terms of split points, having to alter two partitions isn't true; you just add a split point. Of course, that also alters the "constraints" of the partitions, but I think we all agree that the system should maintain those constraints automatically anyway. As such, they don't even have to be visible to the DBA.

> Personally I find the automatic partition idea intriguing, where you
> only have to choose an expression that equates to one value (value
> group) per partition (and possibly a way to derive a partition
> name).

IMO, better go right to a fully automated approach. Or why would you need partition names in such a case?

Regards Markus
"Markus Schiltknecht" <markus@bluegap.ch> writes:
> Gregory Stark wrote:
> > In a previous life I had a database which had daily partitions. I
> > assure you it was unquestionably the right decision. Each day's
> > data wasn't just distinguished by the timestamp but actually was
> > entirely separate from the previous day's data. Both the archiving
> > strategy and the many reports which were run all depended
> > specifically on the day the data was collected.
>
> Wouldn't Segment Exclusion (maybe together with a specialized form
> of CLUSTERing) handle that case much better than partitioning?
> Without the need to name all those thousands of partitions and
> manage them manually.

Firstly, we have a philosophical difference here. I think crossing my fingers and hoping for a performance boost from some low-level infrastructure that I can't measure or monitor is not better than managing things so I know precisely what's going on and can see it for myself in the plan.

Secondly, no, it wouldn't handle it better. Consider a use case:

select * from data where user_id = ? and data_set = ?

where I partition by data_set once a day. Now consider a user which has a dozen entries every day for the past 10 years. That's 40k records or so. In the segment exclusion case I have to descend a huge btree index which manages every record for every user for the past 10 years (never mind trying to manage such an index when it comes time to archive data). Take the 40k records I pull from it, and for each one look up the physical location in some kind of segment endpoint data structure, presumably using a binary search. Once I've done the binary search for each of the 40k index pointers, I can compare the data_set parameter with the low and high end points of the partition. They'll all be rejected except the desired partition, and I'll then have to go look up the records for the dozen or so index pointers in that partition.

The alternative is a plan which says to do a normal index lookup on a small, manageable index for a single partition. The index will contain 12 index pointers which are precisely the records I'm interested in. I look up those 12 records and I'm done.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
> > Personally I find the automatic partition idea intriguing, where
> > you only have to choose an expression that equates to one value
> > (value group) per partition (and possibly a way to derive a
> > partition name).
>
> IMO, better go right to a fully automated approach. Or why would you
> need partition names in such a case?

Yes, I tend to agree on that, with the exception that I think the partition borders should be declarative expressions (equal in syntax to a GROUP BY clause). Names are only for a short and crisp way to identify the partition for the following operations:

- drop/detach data in a partition (as opposed to a normal delete)
- move to/create in another tablespace (I don't value that one, but others have)
- reorg, create index

The drop can probably be done straight from a normal delete (a problem is RI and triggers, though). The normal delete would need to have the smarts that a delete covering a whole partition marks the whole partition dead instead of its individual rows. A detach would need some extra syntax, but could also be based on a WHERE clause that specifies which partitions are to be detached. I am not so sure about how to usefully do the reorg part with WHERE clauses.

Andreas
On Mon, Jan 14, 2008 at 10:45:28PM -0500, Tom Lane wrote:
> Jeff Cohen <jcohen@greenplum.com> writes:
> > In the proposed solution, hash and list partitions work for all types
> > that support an equality operator, and range partitions work for all
> > types that support fully-ordered comparison.
>
> Surely a hashing method would require a *hashable* equality operator,
> ie a hash opclass; likewise range partitions would demand a matching
> btree opclass. You could do list partitions with an equality operator
> of either kind.

Right.

> Essentially all of the system's current knowledge about the properties
> of specific operators is encoded as operator classes for one of these
> two built-in index types. If you want to make assumptions about the
> behavior of an operator, it really needs to be founded on these types of
> opclasses --- or else you're buying into inventing a comparable amount
> of infrastructure for some other organizational concept.

Right, we obviously don't want to do that.

> I think Peter's point was that you might want to think about
> generalizing your concepts so that other kinds of operator classes could
> someday serve as the foundations for other kinds of partitioning rules.

Let me see if I've understood: certain operator classes either describe or allow certain kinds of partitioning. Hash is obvious, btree allows equality- and range-based approaches, and GiST allows users a whole range of possibilities. So a truly extensible system would define the partitioning type in the catalog? That's an interesting idea. It presents problems, I think, for the grammar I've proposed, because some grammatical constructs are tied to range, some to hash, some to list. Any insights into how we could achieve both?

Thanks,

Gavin

PS: Heading off into the French countryside for a little while; responses may be a little slow.
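[One way to read the generalization being discussed is that the partitioning method becomes a catalog object resolved through an operator class, rather than being hard-wired into the grammar. The following is purely speculative syntax, invented here to make the idea concrete; nothing like it exists in the proposal or in PostgreSQL.]

```sql
-- Speculative sketch only: syntax invented for illustration.
-- The partitioning strategy is resolved via an operator class,
-- so the grammar need not hard-code hash/range/list.
CREATE TABLE orders (order_date date, amount numeric)
PARTITION BY HASH (order_date) USING date_hash_ops
PARTITIONS 5;

-- Range partitioning would resolve through a btree opclass; an
-- extension supplying some other kind of opclass could, in
-- principle, register a partitioning method of its own in the
-- catalog without grammar changes.
```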
On Tue, 2008-01-15 at 17:12 +0000, Gregory Stark wrote:
> > Wouldn't Segment Exclusion (maybe together with a specialized form of
> > CLUSTERing) handle that case much better than partitioning? Without the
> > need to name all those thousands of partitions and manage them manually.
>
> Firstly we have a philosophical difference here. I think crossing my
> fingers and hoping for a performance boost from some low level
> infrastructure that I can't measure or monitor is not better than
> managing things so I know precisely what's going on and can see it for
> myself in the plan.

The arguments put forward here apply equally to visibility maps of any kind, so they should equally be applied to both Heikki's VM proposal and Itagaki's DSM approach. Both of those proposals rely on the idea that writes to tables are either low in volume or isolated to particular areas.

If we do anything that relies upon dynamic state information then the results are, well... dynamic, i.e. they give a variable performance boost. I'm not sure why you think it could not be measured or monitored. "Crossing fingers and hoping" isn't really a fair comment, unless your workload varied dramatically from massively random one day to isolated the next. Most workloads don't, including your example case.

Sure, these techniques won't help in all cases, but as I've said elsewhere, we can construct dysfunctional test cases that will defeat other tuning measures such as HOT, TOAST and many other things. If we want to apply further tuning to Postgres then we should accept that not all tuning will apply to all cases.

I've accepted that we should have declarative partitioning to cover many of the cases people have cited, but that doesn't mean a dynamic partitioning approach is worthless; they aren't mutually exclusive techniques.

I think we need to see a design for how the internals of partitioning might work, if it differs from the existing table-level model. Only then can we see the downsides of both approaches.
-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com