Partitioning WIP patch (was: Partitioning: issues/ideas) - Mailing list pgsql-hackers

From Amit Langote
Subject Partitioning WIP patch (was: Partitioning: issues/ideas)
Date
Msg-id 54EC32B6.9070605@lab.ntt.co.jp
Whole thread Raw
Responses Re: Partitioning WIP patch (was: Partitioning: issues/ideas)  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On 21-01-2015 PM 07:26, Amit Langote wrote:
>
> Ok, I will limit myself to focusing on following things at the moment:
>
> * Provide syntax in CREATE TABLE to declare partition key
> * Provide syntax in CREATE TABLE to declare a table as partition of a
> partitioned table and values it contains
> * Arrange to have partition key and values stored in appropriate
> catalogs (existing or new)
> * Arrange to cache partitioning info of partitioned tables in relcache
>

Here is an experimental patch that attempts to implement this.

It implements the following syntax:

* Syntax for defining partition key:
CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec);

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

* Syntax for declaring a table as partition of a partitioned table:
CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause;

where values_clause can be:

IN (list_of_values), or
BETWEEN (range_lower_bounds) AND (range_upper_bounds);

The semantics for a range is [range_lower_bounds,range_upper_bounds),
that is, lower inclusive, upper exclusive. (this might later change
subject to choice regarding preferred/desired syntax)

Additionally, a partition can itself be further partitioned (though I
have not worked on the implementation details of multilevel partitioning
yet):

CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
{RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

There are two system catalogs pg_partitioned_rel and pg_partition which
store partition key spec and partition value spec, respectively.
(remember to initdb if interested in trying)

Please note that the above syntax and/or catalog may not be very
appealing nor that they won't change/disappear. I am posting the patch
more for examining the approach of internal representation of the
metadata for partitioning and get some general comments.

The approach I have implemented in this patch consists of loading
the partition key info and a list of partitions into the relation
descriptor for a partitioned table. In case of range partitioning, this
list is sorted on the range max bound. To see if that works any good, I
hacked ExecInsert() to make it find a partition for a tuple by
adding a ExecFindPartition() just before heap_insert(). It accepts
resultRelInfo of the parent and a tuple slot. It binary-searches for and
returns the descriptor of the chosen partition which ExecInsert() then
uses to perform heap_insert() and inserting index tuples. If no
partition is found, parent relation itself is used. heap_insert() and
ExecInsertIndexTuples() are the only things for which partition relation
is used. All of this is just experimental and most probably wrong in
details; but is done just to see what kind of performance to expect from
the chosen internal representation. Another thing is the approach that
tuple-routing (& partition-pruning) is a function of partitioned
relation and the tuple (or restrict quals). It will be more significant
when we'll get to implementing a partition-pruning function.

See below an example to show that having an extra ExecFindPartition()
does not degrade the performance of inserting a tuple much:

-- a plain table
CREATE TABLE parent_monthly(year int, month int, day int);

-- a partitioned table
-- xxxxx: number of partitions
CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF
parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3);

CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF
parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4);

<snip>

CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF
parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm);

-- insert 1 tuple into the plain table
INSERT INTO parent_monthly VALUES (2013, 12, 01);
INSERT 0 1
Time: 3.303 ms


-- insert 1 tuple into the partitioned table
-- #part: number of partitions
-- case 1: find no valid partition
-- case 2: find a valid partition

 #parts      case 1      case 2
========    ========    ========
10        3.248 ms    3.509 ms
100        3.546 ms    3.269 ms
500        3.497 ms    3.048 ms
1000        3.364 ms    5.379 ms
10000        4.943 ms    5.076 ms

Thoughts?

Thanks,
Amit

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: improving speed of make check-world
Next
From: Amit Langote
Date:
Subject: Re: Partitioning WIP patch (was: Partitioning: issues/ideas)