Partitioning WIP patch (was: Partitioning: issues/ideas) - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Partitioning WIP patch (was: Partitioning: issues/ideas) |
Date | |
Msg-id | 54EC32B6.9070605@lab.ntt.co.jp Whole thread Raw |
Responses |
Re: Partitioning WIP patch (was: Partitioning: issues/ideas)
Re: Partitioning WIP patch (was: Partitioning: issues/ideas) Re: Partitioning WIP patch (was: Partitioning: issues/ideas) Re: Partitioning WIP patch (was: Partitioning: issues/ideas) |
List | pgsql-hackers |
On 21-01-2015 PM 07:26, Amit Langote wrote: > > Ok, I will limit myself to focusing on following things at the moment: > > * Provide syntax in CREATE TABLE to declare partition key > * Provide syntax in CREATE TABLE to declare a table as partition of a > partitioned table and values it contains > * Arrange to have partition key and values stored in appropriate > catalogs (existing or new) > * Arrange to cache partitioning info of partitioned tables in relcache > Here is an experimental patch that attempts to implement this. It implements the following syntax: * Syntax for defining partition key: CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec); where key_spec consists of partition key column names and optional operator class per column. Currently, there are restrictions on the key_spec such as allowing only column names (not arbitrary expressions of them), only one column for list strategy, etc. * Syntax for declaring a table as partition of a partitioned table: CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause; where values_clause can be: IN (list_of_values), or BETWEEN (range_lower_bounds) AND (range_upper_bounds); The semantics for a range is [range_lower_bounds,range_upper_bounds), that is, lower inclusive, upper exclusive. (this might later change subject to choice regarding preferred/desired syntax) Additionally, a partition can itself be further partitioned (though I have not worked on the implementation details of multilevel partitioning yet): CREATE TABLE table_name PARTITION OF parent_name PARTITION BY {RANGE|LIST} ON(key_columns) FOR VALUES values_clause; There are two system catalogs pg_partitioned_rel and pg_partition which store partition key spec and partition value spec, respectively. (remember to initdb if interested in trying) Please note that the above syntax and/or catalog may not be very appealing nor that they won't change/disappear. I am posting the patch more for examining the approach of internal representation of the metadata for partitioning and get some general comments. The approach I have implemented in this patch consists of loading the partition key info and a list of partitions into the relation descriptor for a partitioned table. In case of range partitioning, this list is sorted on the range max bound. To see if that works any good, I hacked ExecInsert() to make it find a partition for a tuple by adding a ExecFindPartition() just before heap_insert(). It accepts resultRelInfo of the parent and a tuple slot. It binary-searches for and returns the descriptor of the chosen partition which ExecInsert() then uses to perform heap_insert() and inserting index tuples. If no partition is found, parent relation itself is used. heap_insert() and ExecInsertIndexTuples() are the only things for which partition relation is used. All of this is just experimental and most probably wrong in details; but is done just to see what kind of performance to expect from the chosen internal representation. Another thing is the approach that tuple-routing (& partition-pruning) is a function of partitioned relation and the tuple (or restrict quals). It will be more significant when we'll get to implementing a partition-pruning function. See below an example to show that having an extra ExecFindPartition() does not degrade the performance of inserting a tuple much: -- a plain table CREATE TABLE parent_monthly(year int, month int, day int); -- a partitioned table -- xxxxx: number of partitions CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY RANGE ON(year, month); -- partitions CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2); CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3); CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4); <snip> CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm); -- insert 1 tuple into the plain table INSERT INTO parent_monthly VALUES (2013, 12, 01); INSERT 0 1 Time: 3.303 ms -- insert 1 tuple into the partitioned table -- #part: number of partitions -- case 1: find no valid partition -- case 2: find a valid partition #parts case 1 case 2 ======== ======== ======== 10 3.248 ms 3.509 ms 100 3.546 ms 3.269 ms 500 3.497 ms 3.048 ms 1000 3.364 ms 5.379 ms 10000 4.943 ms 5.076 ms Thoughts? Thanks, Amit
Attachment
pgsql-hackers by date: