Re: On partitioning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: On partitioning
Date
Msg-id 007b01d015c6$373c21e0$a5b465a0$@lab.ntt.co.jp
Whole thread Raw
In response to Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
Re: On partitioning  (José Luis Tallón <jltallon@adv-solutions.net>)
List pgsql-hackers
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> On Thu, Dec 11, 2014 at 12:00 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > Yeah either this way or what Josh has suggested upthread, the main
> > point was that if at all we want to support multi-column list partitioning
> > then we need to have slightly different syntax, however I feel that we
> > can leave multi-column list partitioning for first version.
>
> Yeah, possibly.
>
> I think we could stand to have a lot more discussion about the syntax
> here.  So far the idea seems to be to copy what Oracle has, but it's
> not clear if we're going to have exactly what Oracle has or something
> subtly different.  I personally don't find the Oracle syntax very
> PostgreSQL-ish.  Stuff like "VALUES LESS THAN 500" doesn't sit
> especially well with me - less than according to which opclass?  Are
> we going to insist that partitioning must use the default btree
> opclass so that we can use that syntax?  That seems kind of lame.
>

Syntax like VALUES LESS THAN 500 also means, we then have to go figure out what's that partition's lower bound based on
upperbound of the previous one. Forget holes in the range if they matter. I expressed that concern elsewhere in favour
ofhaving available both a range's lower and upper bounds. 

> There are lots of interesting things we could do here, e.g.:
>
> CREATE TABLE parent_name PARTITION ON (column [ USING opclass ] [, ... ]);

So, no PARTITION BY [RANGE | LIST] clause huh?

What we are calling pg_partitioned_rel would obtain following bits of information from such a definition of a
partitionedrelation: 
* column(s) to partition on and respective opclass(es)* the level this partitioned relation lies in the partitioning
hierarchy   (determining its relkind and storage qualification) 

By the way, I am not sure how we define a partitioning key on a partition (in other words, a subpartitioning key on the
correspondingpartitioned relation). Perhaps (only) via ALTER TABLE on a partition relation? 

> CREATE TABLE child_name PARTITION OF parent_name
>    FOR { (value, ...) [ TO (value, ...) ] } [, ...];
>

So it's still a CREATE "TABLE" but the part 'PARTITION OF' turns this "table" into something having characteristics of
apartition relation getting all kinds of new treatments at various places. It appears there is a redistribution of
table-characteristicsbetween a partitioned relation and its partition. We take away storage from the former and instead
giveit to the latter. On the other hand, the latter's data is only accessible through the former perhaps with escape
routesfor direct access via some special syntax attached to various access commands. We also stand to lose certain
abilitieswith a partitioned relation such as not able to define a unique constraint (other than what partition key
couldpotentially help ensure) or use it as target of foreign key constraint (just reiterating). 

What we call pg_partition_def obtains following bits of information from such a definition of a partition relation:
* parent relation (partitioned relation this is partition of)* partition kind (do we even want to keep carrying this
around as a separate field in catalog?)* values this partition holds 

The last part being the most important.

In case of what we would have called a 'LIST' partition, this could look like

... FOR VALUES (val1, val2, val3, ...)

Assuming we only support partition key to contain only one column in such a case.

In case of what we would have called a 'RANGE' partition, this could look like

... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)

How about BETWEEN ... AND ... ?

Here we allow a partition key to contain more than one column.

> So instead of making a hard distinction between range and list
> partitioning, you can say:
>
> CREATE TABLE child_name PARTITION OF parent_name FOR (3), (5), (7);
> CREATE TABLE child2_name PARTITION OF parent_name FOR (8) TO (12);
> CREATE TABLE child2_name PARTITION OF parent_name FOR (20) TO (30),
> (120) TO (130);
>

I would include the noise keyword VALUES just for readability if anything.

> Now that might be a crappy idea for various reasons, but the point is
> there are a lot of details to be hammered out with the syntax, and
> there are several ways we can go wrong.  If we choose an
> overly-limiting syntax, we're needlessly restricting what can be done.
> If we choose an overly-permissive syntax, we'll restrict the
> optimization opportunities.
>

I am not sure but perhaps RANGE and LIST as partitioning kinds may as well just be noise keywords. We can parse those
valuesinto a parse node such that we don’t have to care about whether they describe partition as being one kind or the
other.Say a List of something like, 

typedef struct PartitionColumnValue
{   NodeTag    type,   Oid        *partitionid,   char       *partcolname,   Node       *partrangelower,   Node
*partrangeupper,  List       *partlistvalues 
};

Or we could still add a (char) partkind just to say which of the fields matter.

We don't need any defining values here for hash partitions if and when we add support for the same. We would either be
usinga system-wide common hash function or we could add something with partitioning key definition. 

Thanks,
Amit





pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Compression of full-page-writes
Next
From: Amit Kapila
Date:
Subject: Re: On partitioning