Thread: Declarative partitioning grammar
Hi all,

Many of you will have read the dynamic partitioning thread here:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php

I've proposed an alternative approach, which we've called declarative partitioning, which is grammar based. This grammar was developed by Jeff Cohen at Greenplum with some assistance from myself. It is to be completely open source. The grammar will need some refinement and we'd appreciate comments, suggestions, etc. The grammar is designed to address the range of use cases we have out there.

Basics
------

CREATE TABLE is modified to accept a PARTITION BY clause. This clause contains one or more partition declarations. The syntax is as follows:

PARTITION BY {partition_type} (column_name[, column_name...])
[PARTITIONS number]
(
  partition_declaration[, partition_declaration...]
)

The partition type can be one of HASH, RANGE or LIST. The column names are the partitioning key. The PARTITIONS subclause instructs the system on the number of partitions to create if we're doing HASH or, in the case of LIST or RANGE, can act as a safeguard for users who want to ensure that they do not generate more than a certain number of partitions.

We have discussed adding the partition type REFERENCE, which is akin to the LIKE clause in CREATE TABLE. That is, it duplicates the partition configuration of the specified table. Input would be appreciated.

Partition declarations
----------------------

Hash
----

... PARTITION BY HASH(order_date) PARTITIONS 5;

This will create 5 partitions on the column order_date. Inserts will be distributed roughly evenly across the 5 partitions.

List
----

... PARTITION BY LIST (state)
(PARTITION q1_northwest VALUES ('OR', 'WA'),
 PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'),
 PARTITION q1_northeast VALUES ('NY', 'VT', 'NJ'),
 PARTITION q1_southeast VALUES ('FL', 'GA'),
 PARTITION q1_northcentral VALUES ('SD', 'WI'),
 PARTITION q1_southcentral VALUES ('OK', 'TX'));

Here, we produce 6 different partitions.
The first partition groups states in the North West of the USA. We introduce here the named partition concept for clarity.

Range
-----

Range has the most expressive grammar. I'll introduce it in steps:

... PARTITION BY RANGE (b)
(
  PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
  PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);

Here, we create 2 partitions: aa and bb. Partition aa has the range 2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to 2009-01-01. Intervals always have this problem: are the bounds included in the range? To deal with this we define that the start of the range is included in the range and the ending bound is not. This can be modified with the keywords INCLUSIVE and EXCLUSIVE, which change this property on a rule-by-rule basis.

It is common that these partitions follow a pattern, such as one partition every week, month or year. So, we support the following specification:

... PARTITION BY RANGE(order_date)
(
  START (date '2005-12-01') end (date '2007-12-01')
  EVERY(interval '2 months')
);

If we like, we can mix the specification a little:

... PARTITION BY RANGE(order_date)
( PARTITION Q1_2005 end (date '2005-04-01'),
  PARTITION Q2_2005 end (date '2005-07-01'),
  PARTITION Q3_2005 end (date '2005-10-10'),
  PARTITION Q4_2005 end (date '2006-01-01'),
  START (date '2006-02-01') end (date '2008-04-01')
  EVERY (interval '2 weeks')
);

An interesting result of the flexibility of the grammar we've come up with is that you can do something like this:

... PARTITION BY RANGE(order_date)
( PARTITION minny end (date '2004-12-01'),
  end (date '2006-12-01'),
  PARTITION maxny start (date '2006-12-01')
);

Here, when order_date is less than 2004-12-01, we put the data in minny; when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed partition; and after this we put it in maxny.

Tablespaces
-----------

We allow inline tablespace specification, such as:

...
PARTITION BY RANGE(order_date)
(
  PARTITION minny TABLESPACE compress,
  start (date '2004-12-01') end (date '2006-12-01') TABLESPACE hot,
  PARTITION maxny TABLESPACE compress
);

I've used the term compress here intentionally. A number of operating systems now ship file systems which can compress partitions. Users with issues with the amount of data they want to keep online can delay the time until they need new storage by compressing less regularly used data with this technique, at a performance cost. Data being used heavily can live on an uncompressed file system, unaffected.

Multi-column support
--------------------

We can have multi-column partitions.

... PARTITION BY LIST (state, deptno)
(
  VALUES ('OR', 1, 'WA', 1),
  VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
  VALUES ('OR', 2, 'WA', 2),
  VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
  PARTITION region_null VALUES (NULL, NULL),
  PARTITION region_other
);

Looking at this syntax now, I think I prefer:

VALUES ('OR', 1), ('WA', 1)

to specify keys for the same partition. Thoughts?

Composite partition support
---------------------------

Given that we're talking about systems with potentially very large amounts of data, power users may want to combine range partitioning with hash or list partitioning. For example, your analysis might always be on a date range but also be broken down by sales office. So, this would combine range and list partitioning (if the sales offices were known) or hash partitioning (if they weren't known).

To do this, we introduce the SUBPARTITION clause:

... PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id)
SUBPARTITIONS 8
(
  start (date '2005-12-01') end (date '2007-12-01')
  every (interval '3 months'),
  start (date '2007-12-01') end (date '2008-12-01')
  every (interval '1 month')
);

The first partition specification covers 8 partitions, the second 12, for 20 partitions in total. Once we add the subpartitioning we have 160 partitions in total (20 * 8).
Subpartitioning by list can look like this (see templates below):

... PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id)
(
  partition minny (subpartition c1 values (1), subpartition c2 values (2)),
  start (date '2004-12-01') end (date '2006-12-01')
    (subpartition c1 values (1), subpartition c2 values (2)),
  partition maxy (values (1), values (2))
);

So, the list parameters of each subpartition look like arguments to the primary partition. Again, see templates below if you think this looks cumbersome.

We do not preclude subpartitions of subpartitions. So, the following is valid:

... PARTITION BY HASH(b) PARTITIONS 2
SUBPARTITION BY HASH(d) SUBPARTITIONS 2,
SUBPARTITION BY HASH(e) SUBPARTITIONS 2,
SUBPARTITION BY HASH(f) SUBPARTITIONS 2,
SUBPARTITION BY HASH(g) SUBPARTITIONS 2,
SUBPARTITION BY HASH(h) SUBPARTITIONS 2;

Subpartition templates
----------------------

There are times we want subpartitions to be laid out in a specific way for all partitions. To do this, we use templates:

... PARTITION BY RANGE (order_date)
SUBPARTITION BY LIST (state)
SUBPARTITION TEMPLATE
(
  SUBPARTITION northwest VALUES ('OR', 'WA'),
  SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  SUBPARTITION northeast VALUES ('NY', 'VT', 'NJ'),
  SUBPARTITION southeast VALUES ('FL', 'GA'),
  SUBPARTITION northcentral VALUES ('SD', 'WI'),
  SUBPARTITION southcentral VALUES ('OK', 'TX')
)
(start (date '2001-01-01') end (date '2010-01-01')
 every (interval '3 months')
);

Each of the 36 partitions we create here is subpartitioned into these geographical areas.

Data management with ALTER
--------------------------

These are all arguments to ALTER TABLE. All of these require validation against the existing specification.

ADD
---

For range and list partitioning, it's important to be able to add partitions for data not covered by the existing specification. So, we propose:

...
ADD PARTITION q1_2008 end (date '2008-04-01')

COALESCE (maybe)
----------------

For hash partitions, remove a partition from the number of hash partitions and distribute its data to the remaining partitions.

... COALESCE PARTITION [name];

I'm not sure if this is really used but other systems we looked at have it. Thoughts?

DROP
----

For list and range partitions, drop a specified partition from the set of partitions.

... DROP PARTITION minny;

This drops a named partition. Often, it will be difficult for users to know partition names, and partitions might be unnamed. So, we allow this syntax:

... DROP PARTITION FOR(date '2007-01-01');

for range partitions; and:

... DROP PARTITION FOR(VALUES('CA'));

for list partitions.

We've also discussed something like:

... DROP PARTITION FOR(POSITION(1));

so that users can easily drop a specific partition in an array of range partitions. It seems to me, though, that the use case is generally to drop the oldest partition, so perhaps we should have a more explicit syntax. Thoughts?

EXCHANGE
--------

This sub-clause allows us to make a table a partition in a set of partitions, or take a partition out of a set but keep it as a table. IBM uses ATTACH and DETACH, which is explicit, but Oracle uses EXCHANGE. I'll explain the latter:

... EXCHANGE <partition identifier> WITH TABLE <table name>

partition identifier is one of PARTITION <name> or PARTITION FOR(...). The partition in the partition set 'becomes' the table <table name> and vice-versa. Essentially, we'd swap the relfilenodes. This means that we have to first ADD PARTITION then swap the table and the partition. Thoughts?

MERGE
-----

You can merge any two list partitions and any two range partitions:

... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>]

For range partitions:

... MERGE PARTITION FOR(date '2006-01-01'), PARTITION FOR(date '2007-01-01');

For list partitions:

...
MERGE PARTITION FOR(VALUES('CA', 'MA'));

This begs the question of why we have COALESCE for hash partitioning. I don't know, it just seems like the right thing since you can't merge two hash partitions together (well, you shouldn't want to).

RENAME
------

Rename a partition. We can use the partition name or a FOR clause.

SPLIT
-----

Split is used to divide a partition in two. It is designed for list and range partitioning, but I guess we could/should support hash. I need to think about that. For RANGE partitions:

... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>, PARTITION <partition name2>)];

The AT clause specifies the point at which the partition is split in two:

... SPLIT PARTITION FOR(2000) AT 1000 INTO (PARTITION part1000, PARTITION part2000)

We might want ways to do this with unnamed partitions, it seems to me. Thoughts?

For list:

... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') )
INTO
(
  PARTITION region_east_1,
  PARTITION region_east_2
);

In this case, values from region_east specified in the AT() list are put in region_east_1 and the rest are put in region_east_2.

I think a better way of supporting split with hash is via ADD. I'm sure some people will think that ugly, so I'd like feedback.

TRUNCATE
--------

Truncate a specified partition:

... TRUNCATE PARTITION FOR ('2005-01-01')

We could specify a name too. This will use TRUNCATE internally.
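[To make the range semantics above concrete, here is a small illustrative sketch of how a START/END/EVERY specification could expand into partitions and how a key would be routed, with the proposed inclusive-start/exclusive-end default. This is purely my own illustration in Python, not part of the proposal or patch.]

```python
from datetime import date

def add_months(d, n):
    """Advance a date by n calendar months (day-of-month preserved;
    assumes that day exists in the target month, true for these examples)."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, d.day)

def expand_every(start, end, months):
    """Expand START ... END ... EVERY into [lo, hi) ranges:
    start inclusive, end exclusive, per the proposal's default."""
    parts, lo = [], start
    while lo < end:
        hi = min(add_months(lo, months), end)
        parts.append((lo, hi))
        lo = hi
    return parts

def route(key, parts):
    """Return the index of the partition whose range holds key, or None."""
    for i, (lo, hi) in enumerate(parts):
        if lo <= key < hi:  # INCLUSIVE start, EXCLUSIVE end
            return i
    return None

# The EVERY example above: 2005-12-01 .. 2007-12-01, every 2 months
parts = expand_every(date(2005, 12, 1), date(2007, 12, 1), 2)
```

[The 24-month span splits into 12 two-month partitions; a key equal to an END bound falls through to the next partition, or to no partition at all at the very end, which is exactly the ambiguity INCLUSIVE/EXCLUSIVE is meant to control.]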
Gavin Sherry wrote:
> CREATE TABLE is modified to accept a PARTITION BY clause. This clause
> contains one or more partition declarations. The syntax is as follows:
>
> PARTITION BY {partition_type} (column_name[, column_name...])
> [PARTITIONS number]
> (
>   partition_declaration[, partition_declaration...]
> )
>
> The partition type can be one of HASH, RANGE or LIST.

What would be the drawbacks of

CREATE TABLE tablename(...)
PARTITION BY function_taking_row_returning_partition_name

instead of the explicit types? It seems that with my own function I could pretty easily emulate the HASH, RANGE or LIST types.

It seems a function returning a partition name would largely avoid the need for the sub-partition stuff too -- at least for the cases when the only reason you wanted sub-partitions was for composite partition support.

I'm not sure if a function would give more flexibility, but it sure seems it'd be easier for me to remember than the various

PARTITION BY LIST (a)
  (VALUES ('L') SUBPARTITION BY RANGE (b) (VALUES('x'), VALUES('y')),
   VALUES ('M') SUBPARTITION BY RANGE (b) (VALUES('u'), VALUES('t')))

or however it'd look.
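[As a thought experiment, the function-based scheme suggested here is easy to sketch. The Python below is illustrative only; the lookup table and partition names are hypothetical, not proposed syntax.]

```python
import zlib

# Hypothetical user function emulating LIST partitioning: map a row's
# state to a partition name via an ordinary lookup, with a catch-all.
STATE_TO_PARTITION = {
    'OR': 'q1_northwest', 'WA': 'q1_northwest',
    'AZ': 'q1_southwest', 'UT': 'q1_southwest', 'NM': 'q1_southwest',
}

def list_partition_fn(row):
    return STATE_TO_PARTITION.get(row['state'], 'q1_other')

def hash_partition_fn(row, nparts=5):
    """Emulate HASH partitioning; crc32 keeps the mapping stable across runs."""
    return 'p%d' % (zlib.crc32(str(row['order_date']).encode()) % nparts)
```

[The catch, as the replies below this note point out, is that nothing about such an opaque function tells the planner which partitions a WHERE clause can skip, nor lets the system prove a row maps to exactly one partition.]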
Pardon my ignorance as I've never actually used partitioning before but plan to in the near future, but couldn't the grammar resemble a common WHERE clause more closely?

> ... PARTITION BY RANGE(order_date)
> (
>   START (date '2005-12-01') end (date '2007-12-01')
>   EVERY(interval '2 months')
> );

PARTITION BY RANGE(order_date)
(
  WHERE order_date >= '2005-12-01' AND order_date < '2007-12-01'
  EVERY interval '2 months'
)

OR

PARTITION BY RANGE(order_date)
(
  WHERE order_date BETWEEN '2005-12-01' AND '2007-12-01'
)

Of course using '>', '>=', '<', '<=' instead of start/end eliminates any ambiguity along with the need for INCLUSIVE/EXCLUSIVE.

> ... PARTITION BY LIST (state, deptno)
> (
>   VALUES ('OR', 1, 'WA', 1),
>   VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
>   VALUES ('OR', 2, 'WA', 2),
>   VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
>   PARTITION region_null VALUES (NULL, NULL),
>   PARTITION region_other
> );

PARTITION BY LIST (state, deptno)
(
  PARTITION one WHERE state IN ('OR', 'WA') AND deptno = 1,
  PARTITION two WHERE state IN ('AZ', 'UT') AND deptno IN (1, 2),
  PARTITION region_null WHERE state IS NULL OR deptno IS NULL,
  PARTITION region_other
);

Do you even need to list the columns in the PARTITION BY part?

PARTITION BY LIST
(
  PARTITION one WHERE state IN ('OR', 'WA') AND deptno = 1,
  PARTITION two WHERE state IN ('AZ', 'UT') AND deptno IN (1, 2),
  PARTITION region_null WHERE state IS NULL OR deptno IS NULL,
  PARTITION region_other
);

Is there really a reason not to have a named partition as well? Sure it saves a few keystrokes, but it makes trying to do anything with them at a later date that much more difficult.

Your originally suggested grammar might be shorter to type, but using WHERE clause syntax we are all familiar with seems a lot more intuitive to me, on the surface at least. Why not try to reuse grammar that already exists as much as possible?

On Sat, 2008-01-12 at 00:19 +0100, Gavin Sherry wrote:
> CREATE TABLE is modified to accept a PARTITION BY clause.
--
Mike <ipso@snappymail.ca>
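[Mike's WHERE-style idea can be sketched in a few lines of illustrative Python (mine, not proposed syntax): each partition carries a predicate and a row goes to the first matching one. The sketch also hints at the difficulty raised elsewhere in the thread -- with arbitrary predicates, statically proving that no two partitions both match a row is much harder than with LIST/RANGE bound lists.]

```python
# Hypothetical predicate-per-partition routing, first match wins.
PARTITIONS = [
    ('one', lambda r: r['state'] in ('OR', 'WA') and r['deptno'] == 1),
    ('two', lambda r: r['state'] in ('AZ', 'UT') and r['deptno'] in (1, 2)),
    ('region_null', lambda r: r['state'] is None or r['deptno'] is None),
]

def route_where(row, default='region_other'):
    """Return the name of the first partition whose predicate accepts row."""
    for name, pred in PARTITIONS:
        if pred(row):
            return name
    return default
```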
On Jan 11, 2008 3:42 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> What would be the drawbacks of
> CREATE TABLE tablename(...)
> PARTITION BY function_taking_row_returning_partition_name
> instead of the explicit types?

Would that still allow the optimizer to work as well as it could? It seems that an arbitrary map like that can't be optimized very well as it might be too general.

wt
On Jan 11, 2008, at 4:16 PM, Warren Turkal wrote:
> On Jan 11, 2008 3:42 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
>> What would be the drawbacks of
>> CREATE TABLE tablename(...)
>> PARTITION BY function_taking_row_returning_partition_name
>> instead of the explicit types?
>
> Would that still allow the optimizer to work as well as it could? It
> seems that an arbitrary map like that can't be optimized very well as
> it might be too general.

We did look at allowing general functions for partitioning and this was one concern. The other is that we want to enforce that a row only gets inserted into a single partition, so we wanted a declarative syntax where it was relatively easy to check that range and list specifications don't overlap.

kind regards,
Jeff Cohen
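[The non-overlap check described here is cheap for declarative range specs; a minimal sketch (illustrative Python, mine, not from any patch):]

```python
def ranges_overlap(specs):
    """specs: (start, end) pairs, start inclusive, end exclusive.
    With declarative bounds, sorting and comparing neighbours suffices;
    no comparable static check exists for an arbitrary partition function."""
    specs = sorted(specs)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(specs, specs[1:]))
```

[Note that two partitions sharing a bound do not overlap, precisely because the end is exclusive by default.]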
On Jan 11, 2008 6:19 PM, Gavin Sherry <swm@alcove.com.au> wrote:
> Hi all,
>
> Many of you will have read the dynamic partitioning thread here:
>
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
>
> I've proposed an alternative approach, which we've called declarative
> partitioning which is grammar based. This grammar was developed by Jeff
> Cohen at Greenplum with some assistance from myself. It is to be
> completely open source.

I like the syntax, but what I really want to know is how well this is going to work with the query planner. The current 8.x table partitioning mechanism has a lot of issues... it causes more problems than it solves unless you are willing to strictly control how you query the tables... I usually end up rolling my own.

Are you confident that what you propose will integrate well with the planner and produce (as much as possible) fully optimized queries?

merlin
On Fri, Jan 11, 2008 at 07:46:36PM -0500, Merlin Moncure wrote:
> On Jan 11, 2008 6:19 PM, Gavin Sherry <swm@alcove.com.au> wrote:
> > Hi all,
> >
> > Many of you will have read the dynamic partitioning thread here:
> >
> > http://archives.postgresql.org/pgsql-hackers/2008-01/msg00028.php
> >
> > I've proposed an alternative approach, which we've called declarative
> > partitioning which is grammar based. This grammar was developed by Jeff
> > Cohen at Greenplum with some assistance from myself. It is to be
> > completely open source.
>
> I like the syntax, but what I really want to know is how well this is
> going to work with the query planner. The current 8.x table
> partitioning mechanism has a lot of issues... it causes more problems
> than it solves unless you are willing to strictly control how you
> query the tables... I usually end up rolling my own.

The syntax is half the problem; performance is the other. I will bring the performance issues up in another thread. Yes, we are confident that we can address the performance issues that rule out the existing partitioning for many applications. We need it for our own stuff! :P

Thanks,

Gavin
On Jan 11, 2008, at 4:03 PM, Mike wrote:
> Pardon my ignorance as I've never actually used partitioning before but
> plan to in the near future, but couldn't the grammar resemble a common
> WHERE clause more closely?

Hi Mike,

Thanks for your suggestions. The current syntax we chose is similar to syntax used by IBM, Oracle, and mysql, so it is familiar to folks who have used partitioning with other databases. A WHERE clause would of course be understandable by everyone, but it makes error checking more difficult, since we want to ensure that partition specifications don't overlap. In order to make such error checking feasible, we would have to restrict the set of predicates you can use in the WHERE clause, so it wouldn't be completely general anyway.

> Is there really a reason to not have a named partition as well? Sure it
> saves a few keystrokes, but it makes trying to do anything with them at
> a later date that much more difficult.

For the case of partition by HASH, you can just specify the number of buckets, so it might not be meaningful to name the partitions. For range partitions, many users perform "rolling upgrades" on a regular basis, where they drop the oldest data and add a new partition with the latest data, so they might just refer to partitions by "position" (either an ordinal number or a keyword like FIRST/LAST).

kind regards,
Jeff
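[The rolling maintenance pattern with positional addressing described here can be sketched like so (illustrative Python; the partition names and helper names are hypothetical):]

```python
def partition_for_position(parts, pos):
    """FOR(POSITION(n)): 1-based position over range-ordered partitions."""
    return parts[pos - 1]

def roll_window(parts, newest):
    """DROP the oldest (FIRST) partition and ADD a new one at the end (LAST)."""
    return parts[1:] + [newest]

# A yearly range-partitioned table, rolled forward one year:
parts = ['y2005', 'y2006', 'y2007']
```

[With positional addressing, the periodic maintenance job never needs to know partition names at all.]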
Hi,
EXCHANGE
--------
This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:
... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
Regards,
I've proposed an alternative approach, which we've called declarative
partitioning which is grammar based. This grammar was developed by Jeff
Cohen at Greenplum with some assistance from myself. It is to be
completely open source.
<..>
FWIW, I had done some very initial work on declarative partitioning (nowhere near as exhaustive as this proposal) and submitted a wip patch here:
Kindly take a look at the patch, to see if it would be useful to you folks in any way.
<..>
Range
-----
Range has the most expressive grammar. I'll introduce it in steps:
... PARTITION BY RANGE (b)
(
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);
It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:
... PARTITION BY RANGE(order_date)
(
START (date '2005-12-01') end (date '2007-12-01')
EVERY(interval '2 months')
);
<..>
It will be interesting to see how this start, end, interval usage accommodates data types other than dates. I hope this specification is not influenced overly by date-like partitions.
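[The point above can be made concrete: START/END/EVERY only really requires a type with '+' and '<', so plain numerics work too. An illustrative Python sketch (mine, not from the proposal):]

```python
def expand_numeric(start, end, every):
    """START/END/EVERY over numbers instead of dates. Note the final
    partition is clipped short when EVERY does not divide the range
    evenly -- one of the questions the grammar must answer for any type."""
    parts, lo = [], start
    while lo < end:
        parts.append((lo, min(lo + every, end)))
        lo += every
    return parts
```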
<..>
ADD
---
For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:
... ADD PARTITION q1_2008 end (date '2008-04-01')
<..>
What about data that does not match any existing partition specification? It might make sense to have a dummy partition which handles all these cases.
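[The catch-all partition suggested above, sketched in illustrative Python (mine; the 'overflow' name is hypothetical):]

```python
def route_list(value, spec, default='overflow'):
    """Route a LIST key to its partition, falling back to a catch-all
    partition for values no explicit partition claims."""
    for name, values in spec.items():
        if value in values:
            return name
    return default

spec = {'west': ('OR', 'WA'), 'south': ('OK', 'TX')}
```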
<..>
DROP
----
For list and range partitions, drop a specified partition from the set
of partitions.
... DROP PARTITION minny;
This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:
... DROP PARTITION FOR(date '2007-01-01');
for range partitions; and:
... DROP PARTITION FOR(VALUES('CA'));
for list partitions.
We've also discussed something like:
... DROP PARTITION FOR(POSITION(1));
so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?
<..>
Surely, the partitions will get (default, parent-inferred) names when they get created? Do we expect users to remember FOR() specifications like the ones mentioned above? It might make sense to have, e.g., "\d" in psql present a parent with all its named partitions along with their partition clauses, to facilitate dropping partitions by name.
<..>
EXCHANGE
--------
This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:
... EXCHANGE <partition identifier> WITH TABLE <table name>
partition identifier is one of PARTITION <name> or PARTITION FOR(...).
The partition in the partition set 'becomes' the table <table name> and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
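As a sketch in the proposed syntax, rolling a pre-loaded standalone table into the partition set would then be a two-step operation (table and partition names here are illustrative):

```sql
-- 1. add an empty partition covering the new range
ALTER TABLE sales ADD PARTITION q1_2008 end (date '2008-04-01');

-- 2. swap the pre-loaded staging table into place (relfilenode swap)
ALTER TABLE sales EXCHANGE PARTITION q1_2008 WITH TABLE sales_q1_2008_staging;
```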
<..>
Surely this won't be instantaneous?
<..>
Regards,
Hi,
We did look at allowing general functions for partitioning and this
was one concern. The other is that we want to enforce that a row
only gets inserted into a single partition, so we wanted a
declarative syntax where it was relatively easy to check that range
and list specifications don't overlap.
Detection of mutually exclusive ranges might not turn out to be so easy after all. I think there is some code in the constraint_exclusion area which might help out with this.
Regards,
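As a sketch of the overlap check the declarative syntax makes possible: assuming partition bounds were exposed in a catalog-like table (the table and column names below are hypothetical), detecting overlapping half-open ranges is a simple self-join:

```sql
-- hypothetical catalog of range-partition bounds, [range_start, range_end)
SELECT p1.partition_name, p2.partition_name
FROM pg_partition_range p1
JOIN pg_partition_range p2
  ON p1.parent_oid = p2.parent_oid
 AND p1.partition_name < p2.partition_name   -- each pair once
WHERE p1.range_start < p2.range_end
  AND p2.range_start < p1.range_end;         -- any row returned is an overlap
```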
Hi,
On Jan 12, 2008 6:29 AM, Gavin Sherry <swm@alcove.com.au> wrote:
> The syntax is half the problem, performance is the other. I will
> bring the performance issues up in another thread. Yes, we are
> confident that we can address the performance issues that rule out
> the existing partitioning for many applications. We need it for our
> own stuff! :P
Agreed, syntax is just the sugar.
Also, other than performance, how does this proposal handle updates of partition-key columns that cause the resulting tuple to belong in a different partition?
Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
Jeff Cohen wrote:
> In order to make such error checking feasible, we would have to
> restrict the set of predicates you can use in the WHERE clause, so
> it wouldn't be completely general anyway.

Well, with an extensible system such as PostgreSQL you will need to have a partitioning scheme that can deal with extensions. Perhaps people want to partition by XML, GIS, text-search data, or whatever someone might come up with in the future.

One possible way to achieve that might be to redefine your concepts of hash, list, and range in terms of operator classes (or operator families or other operator structures?). Those have well-defined properties as to how the operators behave relative to each other, so checking the partition definitions for mutual exclusivity and other properties would be possible.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
On Sat, 2008-01-12 at 01:59 +0100, Gavin Sherry wrote:
> The syntax is half the problem, performance is the other.

The syntax looks great to me, but I think it is about 5% of the problem, maybe less. I don't really have any questions about the syntax, but I may have thoughts when the implementation details emerge.

I'm not sure you'll be able to use PARTITION BY since it's part of the SQL Standard for windowed grouping, which we do hope to implement one day. It will be confusing to have two completely separate meanings for the one phrase in our grammar.

The burning questions from my perspective are: What is a partition? How will the syntax be implemented within the backend?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Sat, Jan 12, 2008 at 05:47:30PM +0000, Simon Riggs wrote:
> On Sat, 2008-01-12 at 01:59 +0100, Gavin Sherry wrote:
> > The syntax is half the problem, performance is the other.
>
> The syntax looks great to me, but I think it is about 5% of the
> problem, maybe less. I don't really have any questions about the
> syntax, but I may have thoughts when the implementation details
> emerge.

Yes, that's for another thread. Since the discussion was about using grammar to control partitions, I wanted to get some grammar out. More details on other stuff soon.

> I'm not sure you'll be able to use PARTITION BY since it's part of
> the SQL Standard for windowed grouping, which we do hope to
> implement one day. It will be confusing to have two completely
> separate meanings for the one phrase in our grammar.

I think it's fine. It doesn't cause conflicts in the grammar (in fact, the Greenplum grammar implements both meanings right now with no confusion).

Thanks, Gavin
Hi,

Jeff Cohen wrote:
> We did look at allowing general functions for partitioning and this
> was one concern. The other is that we want to enforce that a row
> only gets inserted into a single partition, so we wanted a
> declarative syntax where it was relatively easy to check that range
> and list specifications don't overlap.

Why do you need to define a split point so ambiguously at all? Why not just give the DBA exactly *one* place to define the split point?

I don't think the separation into list, hash and range partitioning is adequate. What is the system supposed to do if you try to insert a row which doesn't fit any of the values in your list or doesn't fit any of the ranges you defined?

I prefer a partitioning grammar which doesn't interfere with constraints. We all know how to define constraints. Please don't introduce a new, ambiguous way. A partitioning definition should be able to tell the target partition for *every* row which satisfies the constraints (the real ones, not ambiguous ones).

IMO, a single DDL command should only touch a single split point, i.e. split a table into two partitions, move the split point or remove the split point (joining the partitions again). Those are the only basic commands you need to be able to handle partitioning.

Sorry, but for my taste, the proposed grammar is too long per command, not flexible enough and instead ambiguous for split points as well as for constraints. To me it looks like repeating the mistakes of others.

Regards Markus
On Sat, Jan 12, 2008 at 04:01:19PM +0530, NikhilS wrote:
> Hi,
>
> > We did look at allowing general functions for partitioning and
> > this was one concern. The other is that we want to enforce that a
> > row only gets inserted into a single partition, so we wanted a
> > declarative syntax where it was relatively easy to check that
> > range and list specifications don't overlap.
>
> Detection of mutually exclusive ranges might not turn out to be so
> easy after all. I think there is some code in the
> constraint_exclusion area which might help out in this.

In some prototyping code it didn't seem too difficult, but if we've made a mistake we might have to look at the CE code.

Thanks, Gavin
On Jan 14, 2008, at 1:49 AM, Markus Schiltknecht wrote:
> I don't think the separation into list, hash and range partitioning
> is adequate. What is the system supposed to do, if you try to
> insert a row which doesn't fit any of the values in your list or
> doesn't fit any of the ranges you defined?

Hi Markus,

If you don't define a "default" partition to handle outliers, the insert should fail with an error.

> I prefer a partitioning grammar which doesn't interfere with
> constraints. We all know how to define constraints. Please don't
> introduce a new, ambiguous way. A partitioning definition should be
> able to tell the target partition for *every* row which satisfies
> the constraints (the real ones, not ambiguous ones).
>
> IMO, a single DDL command should only touch a single split point,
> i.e. split a table into two partitions, move the split point or
> remove the split point (joining the partitions again). Those are
> the only basic commands you need to be able to handle partitioning.

I can certainly appreciate the simplicity of this approach. It lets us use a generic check constraint to perform partitioning, so it is more general than partitioning using hash, list, and range. However, it achieves this generality at the expense of usability for typical customer cases.

For example, let's look at the case of a table of 1 year of sales data, where we want to create 12 partitions -- one for each month. With the generic approach, you start with a single table, and start by splitting it into two six-month partitions:

ALTER TABLE sales
SPLIT where sales_date > date '2007-06-01'
INTO
(
  PARTITION first_half
  PARTITION second_half
);

We could implement this approach using check constraints and table inheritance: the partition second_half is a child table where sales_date > date '2007-06-01', and the partition first_half has the complementary constraint NOT (sales_date > date '2007-06-01').
Next, you split each partition:

ALTER TABLE sales
SPLIT PARTITION first_half where sales_date > date '2007-03-01'
INTO
(
  PARTITION first_quarter
  PARTITION second_quarter
);

So now the child table for first_half itself has two children. As you continue this process you construct a binary tree of table inheritance using 12 ALTER statements.

In the "long" grammar you can create and partition the table in one statement:

CREATE TABLE sales
...
PARTITION BY sales_date
(
  start (date '2007-01-01') end (date '2008-01-01')
  every (interval '1 month')
);

> Sorry, but for my taste, the proposed grammar is too long per
> command, not flexible enough and instead ambiguous for split points
> as well as for constraints. To me it looks like repeating the
> mistakes of others.

Thanks for your feedback. Partitioning the table using a series of splits is a clever solution for situations where the partitioning operation cannot be described using simple equality (like list and hash) or ordered comparison (range). But for many common business cases, the "long" grammar is easier to specify.

kind regards, Jeff
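The inheritance-plus-check-constraint emulation described above can be written in stock PostgreSQL today, roughly as follows (table and column names are illustrative):

```sql
CREATE TABLE sales (sale_id int, sales_date date, amount numeric);

-- child table holding the second half; the CHECK constraint is what
-- constraint_exclusion uses to skip the partition at plan time
CREATE TABLE sales_second_half (
  CHECK (sales_date > date '2007-06-01')
) INHERITS (sales);

-- complementary constraint for the first half
CREATE TABLE sales_first_half (
  CHECK (NOT (sales_date > date '2007-06-01'))
) INHERITS (sales);
```

Routing inserts to the right child (via rules or triggers) and keeping the constraints non-overlapping is then entirely the DBA's responsibility, which is exactly the bookkeeping the declarative grammar aims to absorb.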
On Jan 12, 2008, at 9:34 AM, Peter Eisentraut wrote:
> Well, with an extensible system such as PostgreSQL you will need to
> have a partitioning scheme that can deal with extensions. Perhaps
> people want to partition by XML, GIS, text-search data, or whatever
> someone might come up with in the future.

Hi Peter,

In the proposed solution, hash and list partitions work for all types that support an equality operator, and range partitions work for all types that support fully-ordered comparison.

kind regards, Jeff
Jeff Cohen <jcohen@greenplum.com> writes:
> In the proposed solution, hash and list partitions work for all
> types that support an equality operator, and range partitions work
> for all types that support fully-ordered comparison.

Surely a hashing method would require a *hashable* equality operator, ie a hash opclass; likewise range partitions would demand a matching btree opclass. You could do list partitions with an equality operator of either kind.

Essentially all of the system's current knowledge about the properties of specific operators is encoded as operator classes for one of these two built-in index types. If you want to make assumptions about the behavior of an operator, it really needs to be founded on these types of opclasses --- or else you're buying into inventing a comparable amount of infrastructure for some other organizational concept.

I think Peter's point was that you might want to think about generalizing your concepts so that other kinds of operator classes could someday serve as the foundations for other kinds of partitioning rules.

regards, tom lane
Hi Jeff,

Jeff Cohen wrote:
> If you don't define a "default" partition to handle outliers, the
> insert should fail with an error.

IMO, you should always have a "default" partition, then, so as not to violate the constraints (by rejecting tuples which are correct according to the constraints).

> With the generic approach, you start with a single table, and start
> by splitting it into two six-month partitions:
>
> ALTER TABLE sales
> SPLIT where sales_date > date '2007-06-01'
> INTO
> (
>   PARTITION first_half
>   PARTITION second_half
> );
>
> We could implement this approach using check constraints and table
> inheritance: the partition second_half is a child table where
> sales_date > date '2007-06-01', and the partition first_half has the
> complementary constraint NOT (sales_date > date '2007-06-01').
>
> Next, you split each partition:
>
> ALTER TABLE sales
> SPLIT PARTITION first_half where sales_date > date '2007-03-01'
> INTO
> (
>   PARTITION first_quarter
>   PARTITION second_quarter
> );
>
> So now the child table for first_half itself has two children. As
> you continue this process you construct a binary tree of table
> inheritance using 12 ALTER statements.

<nitpicking>There are just 11 splits between 12 months, otherwise correct, yes.</nitpicking>

> In the "long" grammar you can create and partition the table in one
> statement:
>
> CREATE TABLE sales
> ...
> PARTITION BY sales_date
> (
>   start (date '2007-01-01') end (date '2008-01-01')
>   every (interval '1 month')
> );

To be fair, you should add the 12 partition names here as well. I can certainly see merit in letting the database system handle the binary tree.

> Thanks for your feedback. Partitioning the table using a series of
> splits is a clever solution for situations where the partitioning
> operation cannot be described using simple equality (like list and
> hash) or ordered comparison (range). But for many common business
> cases, the "long" grammar is easier to specify.
Easier to specify initially, maybe, yes. But how about managing it afterwards?

Having seen all the different options for merging, splitting, exchanging, coalescing and adding, all of them with small little differences for hash, range and list partitioning - let alone sub-partitioning - with all of that, the proposed grammar doesn't look particularly easy to me.

Let's at least drop the differences for list, hash and range partitioning; those are pretty unneeded, IMO.

Regards Markus
On Mon, 2008-01-14 at 10:49, Markus Schiltknecht wrote:
> Hi,
>
> Jeff Cohen wrote:
> > We did look at allowing general functions for partitioning and
> > this was one concern. The other is that we want to enforce that a
> > row only gets inserted into a single partition, so we wanted a
> > declarative syntax where it was relatively easy to check that
> > range and list specifications don't overlap.
>
> Why do you need to define a split point so ambiguously at all? Why
> not just give the DBA exactly *one* place to define the split point?
>
> I don't think the separation into list, hash and range partitioning
> is adequate. What is the system supposed to do, if you try to insert
> a row which doesn't fit any of the values in your list or doesn't
> fit any of the ranges you defined?

I guess it would go to some "default" partition?

...

> IMO, a single DDL command should only touch a single split point,
> i.e. split a table into two partitions, move the split point or
> remove the split point (joining the partitions again). Those are the
> only basic commands you need to be able to handle partitioning.

Sure, but this can become really tedious for 1024 partitions, not to mention hard for the optimiser.

> Sorry, but for my taste, the proposed grammar is too long per
> command, not flexible enough and instead ambiguous for split points
> as well as for constraints. To me it looks like repeating the
> mistakes of others.

What mistakes?

-----------------
Hannu
Hi,

Hannu Krosing wrote:
> I guess it would go to some "default" partition?

Which doesn't have a name so far, which prevents addressing that partition. Nor is it particularly well defined; it's just a rest.

> sure, but this can become really tedious for 1024 partitions,

Well, managing 1024 partitions manually is a tedious job, no matter what grammar you take: you'll have to deal with 1024 different partition names. What do you need so many partitions for?

> not to mention hard for optimiser.

It's pretty much the same for the optimizer: a binary tree. Granted, that binary tree should better be balanced by the RDBMS.

Regards Markus
> > sure, but this can become really tedious for 1024 partitions,
>
> Well, managing 1024 partitions manually is a tedious job, no matter
> what grammar you take: you'll have to deal with 1024 different
> partition names. What do you need so many partitions for?
imagine a structure which is partitioned by day. if you keep data for 3 years (which is perfectly reasonable) you already have 1000 partitions.
some applications produce so much data that splitting it into days is perfectly reasonable.
if you get 30 GB a day, making monthly tables is not too funny anymore ...
just think of CREATE INDEX or VACUUM ...
having so many tables is not funny but it can be the only reasonable choice.
best regards,
hans
--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
Hi,

Hans-Juergen Schoenig wrote:
> > What do you need so many partitions for?
>
> having so many tables is not funny but it can be the only reasonable
> choice.

Well, what do you do with all those partitions? Most of them will end up on the same storage subsystem. So, if you don't partition to spread your data across storage with different characteristics, why do you need partitioning at all? Isn't an index better in most cases? Or are you using it as a form of CLUSTERing, where you expect to reduce time for sequential scans over a range?

Simon's Segment Exclusion proposal looks like a much better fit for that purpose, IMO. It would prevent you from having to handle all those partitions manually.

Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes:
> Jeff Cohen wrote:
> > If you don't define a "default" partition to handle outliers, the
> > insert should fail with an error.
>
> IMO, you should always have a "default" partition, then, so as not
> to violate the constraints (by rejecting tuples which are correct
> according to the constraints).

I don't agree with that at all. I can imagine plenty of situations where a tuple falling outside the range of available partitions *should* be treated as an error. For instance, consider timestamped observations --- data in the future is certainly bogus, and data further back than you want to deal with must be an entry error as well.

I agree that there needs to be a way to have a "default" partition, but there needs to be a way to not have one, too.

regards, tom lane
Hi,

Tom Lane wrote:
> I don't agree with that at all. I can imagine plenty of situations
> where a tuple falling outside the range of available partitions
> *should* be treated as an error. For instance, consider timestamped
> observations --- data in the future is certainly bogus, and data
> further back than you want to deal with must be an entry error as
> well.

Isn't it better to have these constraints as table constraints, instead of burying them in the partitioning definition? Mixing those two concepts seems very weird to me. Or am I missing some benefit of mixing them?

Regards Markus
"Markus Schiltknecht" <markus@bluegap.ch> writes:
> Hans-Juergen Schoenig wrote:
> >> What do you need so many partitions for?
> >
> > having so many tables is not funny but it can be the only
> > reasonable choice.
>
> Well, what do you do with all those partitions?

In a previous life I had a database which had daily partitions. I assure you it was unquestionably the right decision. Each day's data wasn't just distinguished by the timestamp but actually was entirely separate from the previous day's data. Both the archiving strategy and the many reports which were run all depended specifically on the day the data was collected.

Much like partial indexes, the partition key effectively gives you a free index key element which can be combined with any other index without extra i/o. Often this lets you avoid a sort, letting you perform some queries as OLTP queries with no startup cost which would otherwise have DSS-style plans with sorts. Better yet, it can be used even in a sequential scan, which performs better than even a perfectly clustered index.

Daily partitioning let us go from a weekly batch job which was threatening to overrun the maintenance window to a daily batch job which completed in minutes. As a result, data warehouse reports could be delivered the same night instead of being hours, and in some cases days, delayed.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
On Tue, Jan 15, 2008 at 10:36:17AM -0500, Tom Lane wrote:
> Markus Schiltknecht <markus@bluegap.ch> writes:
> > Jeff Cohen wrote:
> >> If you don't define a "default" partition to handle outliers, the
> >> insert should fail with an error.
>
> > IMO, you should always have a "default" partition, then, so as not
> > to violate the constraints (by rejecting tuples which are correct
> > according to the constraints).
>
> I don't agree with that at all. I can imagine plenty of situations
> where a tuple falling outside the range of available partitions
> *should* be treated as an error. For instance, consider timestamped
> observations --- data in the future is certainly bogus, and data
> further back than you want to deal with must be an entry error as
> well.
>
> I agree that there needs to be a way to have a "default" partition,
> but there needs to be a way to not have one, too.

Jeff and I discussed this and we came to the same conclusion. We will propose grammar for handling it. Many users we talk to would fall into the class of people who would want an error if the data fell outside the defined partitions.

Thanks, Gavin
Markus Schiltknecht <markus@bluegap.ch> writes:
> Tom Lane wrote:
> > I don't agree with that at all. I can imagine plenty of situations
> > where a tuple falling outside the range of available partitions
> > *should* be treated as an error.
>
> Isn't it better to have these constraints as table constraints,
> instead of burying them in the partitioning definition? Mixing those
> two concepts seems very weird to me.

DBAs tend to be belt *and* suspenders guys, no? I'd think a lot of them would want a table constraint, plus a partitioning rule that rejects anything outside the intended partitions.

regards, tom lane
Hi,

Gregory Stark wrote:
> In a previous life I had a database which had daily partitions. I
> assure you it was unquestionably the right decision. Each day's data
> wasn't just distinguished by the timestamp but actually was entirely
> separate from the previous day's data. Both the archiving strategy
> and the many reports which were run all depended specifically on the
> day the data was collected.

Wouldn't Segment Exclusion (maybe together with a specialized form of CLUSTERing) handle that case much better than partitioning? Without the need to name all those thousands of partitions and manage them manually.

What I would want in such a case is exactly *not* manual management of partitions, but rather a performance optimization for scanning a range of rows, which is something in between indexes (for very few rows) and a seq scan (for almost all rows of a table).

I know, this now sounds like I've turned sides to Simon's proposal. And yes, in a way, that's true. I certainly see merit for Segment Exclusion, more and more. OTOH, I'm still skeptical about it replacing declarative partitioning entirely. But declarative partitioning only really makes sense if you partition into different storage subsystems, IMO. Everything happening on the same storage subsystem shouldn't need manual partitioning, but should be optimized pretty automatically. As Simon proposed, that's well possible in many cases.

Regards Markus
> > I don't agree with that at all. I can imagine plenty of situations
> > where a tuple falling outside the range of available partitions
> > *should* be treated as an error. For instance, consider
> > timestamped observations --- data in the future is certainly
> > bogus, and data further back than you want to deal with must be an
> > entry error as well.
>
> Isn't it better to have these constraints as table constraints,
> instead of burying them in the partitioning definition? Mixing those
> two concepts seems very weird to me.

Yes, but the problem with timestamp-partitioned tables is that the window is sliding. Thus you would need two ALTER TABLEs for each new period: one that changes the constraint plus one that creates the new partition. So it seems natural to join the two concepts for such a partitioning syntax.

Personally I find the automatic partition idea intriguing, where you only have to choose an expression that equates to one value (value group) per partition (and possibly a way to derive a partition name). Then a partition is automatically created when a new row arrives for a new value. That does not, however, address Tom's concern of rejecting data that is outside the acceptable window, but maybe that is better dealt with in the application anyway.

Andreas
Hi,

Tom Lane wrote:
> DBAs tend to be belt *and* suspenders guys, no?

I rather know those admins with stupid-looking faces who are wondering why their transactions fail. Often enough, that can have a lot of different reasons. Extending the set of possible traps doesn't seem like a clever idea for those admins.

> I'd think a lot of them would want a table constraint, plus a
> partitioning rule that rejects anything outside the intended
> partitions.

I'm rather a fan of the DRY principle (don't repeat yourself), because having to maintain redundant constraints smells suspiciously like a maintenance nightmare. And where's the real use of making the database system check twice? Want to protect against memory corruption in between the two checks, eh? :-)

Regards Markus
Hi,

Zeugswetter Andreas ADI SD wrote:
> Yes, but the problem with the timestamp partitioned tables is, that
> the window is sliding. Thus you would need two alter tables for each
> new period. One that changes the constraint + one that creates the
> new partition. So it seems natural to join the two concepts for such
> a partitioning syntax.

If you think in terms of split points, having to alter two tables is not true. It's better

> Personally I find the automatic partition idea intriguing, where you
> only have to choose an expression that equates to one value (value
> group) per partition (and possibly a way to derive a partition
> name). Then a partition is automatically created when a new row
> arrives for a new value. That does not however address Tom's concern
> of rejecting data that is outside the acceptable window, but maybe
> that is better dealt with in the application anyway.
>
> Andreas
Hi,

(Sorry for the previous one, if delivered; it went off too early...)

Zeugswetter Andreas ADI SD wrote:
> Yes, but the problem with the timestamp partitioned tables is, that
> the window is sliding. Thus you would need two alter tables for each
> new period. One that changes the constraint + one that creates the
> new partition. So it seems natural to join the two concepts for such
> a partitioning syntax.

If you think in terms of split points, having to alter two partitions isn't true; you just add a split point. Of course, that also alters the "constraints" of the partitions, but I think we all agree that the system should maintain those constraints automatically anyway. As such, they don't even have to be visible to the DBA.

> Personally I find the automatic partition idea intriguing, where you
> only have to choose an expression that equates to one value (value
> group) per partition (and possibly a way to derive a partition
> name).

IMO, better go right to a fully automated approach. Or why would you need partition names in such a case?

Regards Markus
"Markus Schiltknecht" <markus@bluegap.ch> writes:
> Gregory Stark wrote:
> > In a previous life I had a database which had daily partitions. I
> > assure you it was unquestionably the right decision. Each day's
> > data wasn't just distinguished by the timestamp but actually was
> > entirely separate from the previous day's data. Both the archiving
> > strategy and the many reports which were run all depended
> > specifically on the day the data was collected.
>
> Wouldn't Segment Exclusion (maybe together with a specialized form
> of CLUSTERing) handle that case much better than partitioning?
> Without the need to name all those thousands of partitions and
> manage them manually.

Firstly, we have a philosophical difference here. I think crossing my fingers and hoping for a performance boost from some low-level infrastructure that I can't measure or monitor is not better than managing things so I know precisely what's going on and can see it for myself in the plan.

Secondly, no, it wouldn't handle it better. Consider a use case:

select * from data where user_id = ? and data_set = ?

where I partition by data_set once a day. Now consider a user which has a dozen entries every day for the past 10 years. That's 40k records or so. In the segment exclusion case I have to descend a huge btree index which manages every record for every user for the past 10 years (never mind trying to manage such an index when it comes time to archive data). Take the 40k records I pull from it, and for each one look up the physical location in some kind of segment endpoint data structure, presumably using a binary search. Once I've done the binary search for each of the 40k index pointers, I can compare the data_set parameter with the low and high end points of the partition. They'll all be rejected except the desired partition, and I'll then have to go look up the records for the dozen or so index pointers in that partition.

The alternative is a plan which says to do a normal index lookup on a small, manageable index for a single partition. The index will contain 12 index pointers which are precisely the records I'm interested in. I look up those 12 records and I'm done.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
> > Personally I find the automatic partition idea intriguing, where
> > you only have to choose an expression that equates to one value
> > (value group) per partition (and possibly a way to derive a
> > partition name).
>
> IMO, better go right to a fully automated approach. Or why would you
> need partition names in such a case?

Yes, I tend to agree on that, with the exception that I think the partition borders should be declarative expressions (equal in syntax to a GROUP BY clause). Names are only for a short and crisp way to identify the partition for the following operations:

- drop/detach data in a partition (as opposed to a normal delete)
- move to/create in another tablespace (I don't value that one, but others have)
- reorg, create index

The drop can probably be done straight from a normal delete (a problem is RI and triggers, though). The normal delete would need to have the smarts that a delete covering a whole partition marks the whole partition dead instead of its individual rows. A detach would need some extra syntax, but could also be based on a WHERE clause that specifies which partitions are to be detached. I am not so sure about how to usefully do the reorg part with WHERE clauses.

Andreas
On Mon, Jan 14, 2008 at 10:45:28PM -0500, Tom Lane wrote:
> Jeff Cohen <jcohen@greenplum.com> writes:
> > In the proposed solution, hash and list partitions work for all types
> > that support an equality operator, and range partitions work for all
> > types that support fully-ordered comparison.
>
> Surely a hashing method would require a *hashable* equality operator,
> ie a hash opclass; likewise range partitions would demand a matching
> btree opclass. You could do list partitions with an equality operator
> of either kind.

Right.

> Essentially all of the system's current knowledge about the properties
> of specific operators is encoded as operator classes for one of these
> two built-in index types. If you want to make assumptions about the
> behavior of an operator, it really needs to be founded on these types of
> opclasses --- or else you're buying into inventing a comparable amount
> of infrastructure for some other organizational concept.

Right, we obviously don't want to do that.

> I think Peter's point was that you might want to think about
> generalizing your concepts so that other kinds of operator classes could
> someday serve as the foundations for other kinds of partitioning rules.

Let me see if I've understood: certain operator classes either describe or allow certain kinds of partitioning. Hash is obvious, btree allows equality- and range-based approaches, and GiST allows users a whole range of possibilities. So a truly extensible system would define the partitioning type in the catalog? That's an interesting idea. It presents problems, I think, for the grammar I've proposed, because some grammatical constructs are tied to range, some to hash, some to list. Any insights into how we could achieve both?

Thanks,

Gavin

PS: Heading off into the French countryside for a little while; responses may be a little slow.
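[One way to read the generalization being discussed is that the partitioning method becomes a catalog object resolved through an operator class, rather than being hard-wired into the grammar. The following is purely speculative syntax, invented here to make the idea concrete; nothing like it exists in the proposal or in PostgreSQL.]

```sql
-- Speculative sketch only: syntax invented for illustration.
-- The partitioning strategy is resolved via an operator class,
-- so the grammar need not hard-code hash/range/list.
CREATE TABLE orders (order_date date, amount numeric)
PARTITION BY HASH (order_date) USING date_hash_ops
PARTITIONS 5;

-- Range partitioning would resolve through a btree opclass; an
-- extension supplying some other kind of opclass could, in
-- principle, register a partitioning method of its own in the
-- catalog without grammar changes.
```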
On Tue, 2008-01-15 at 17:12 +0000, Gregory Stark wrote:
> > Wouldn't Segment Exclusion (maybe together with a specialized form of
> > CLUSTERing) handle that case much better than partitioning? Without the
> > need to name all those thousands of partitions and manage them manually.
>
> Firstly we have a philosophical difference here. I think crossing my
> fingers and hoping for a performance boost from some low level
> infrastructure that I can't measure or monitor is not better than
> managing things so I know precisely what's going on and can see it for
> myself in the plan.

The arguments put forward here apply equally to visibility maps of any kind, so they should equally be applied to both Heikki's VM proposal and Itagaki's DSM approach. Both of those proposals rely on the idea that writes to tables are either low in volume or isolated to particular areas.

If we do anything that relies upon dynamic state information then the results are, well... dynamic, i.e. they give a variable performance boost. I'm not sure why you think it could not be measured or monitored. "Crossing fingers and hoping" isn't really a fair comment, unless your workload varied dramatically from massively random one day to isolated the next. Most workloads don't, including your example case.

Sure, these techniques won't help in all cases, but as I've said elsewhere, we can construct dysfunctional test cases that will defeat other tuning measures such as HOT, TOAST and many other things. If we want to apply further tuning to Postgres then we should accept that not all tuning will apply to all cases.

I've accepted that we should have declarative partitioning to cover many of the cases people have cited, but that doesn't mean a dynamic partitioning approach is worthless; they aren't mutually exclusive techniques.

I think we need to see a design for how the internals of partitioning might work, if it differs from the existing table-level model. Only then can we see the downsides of both approaches.
-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com