Thread: Auto creation of Partitions
Hi,

This is to get feedback to meet the following TODO:

  * Simplify ability to create partitioned tables

    This would allow creation of partitioned tables without requiring
    creation of rules for INSERT/UPDATE/DELETE, and constraints for rapid
    partition selection. Options could include range and hash partition
    selection.

There was some discussion on the pgsql mailing lists, which led to the
above TODO:

http://archives.postgresql.org/pgsql-hackers/2006-09/msg00189.php
http://archives.postgresql.org/pgsql-hackers/2006-08/msg01874.php

We can have the following syntax to support auto creation of partitions in
PostgreSQL:

  CREATE TABLE tabname (
    ...
    ...
  ) PARTITION BY
    HASH(expr)
    | RANGE(expr)
    | LIST(expr)
  [PARTITIONS num_partitions] /* will apply to HASH only for now */
  [PARTITION partition_name CHECK(...),
   PARTITION partition_name CHECK(...)
   ...
  ];

Here "expr" will be one of the column names as specified for the master
table. Once we finalize the syntax, the above statement would end up
carrying out the following activities (disclaimer: we might add or remove
some activities based on the discussion here).

  i)   Create the master table.
  ii)  Create child tables based on the number of partitions specified, and
       make them inherit from the master table.
  iii) Auto-generate rules (or triggers?) using the checks mentioned for
       the partitions, to handle INSERTs/DELETEs/UPDATEs and route them to
       the appropriate child. Note that checks specified directly on the
       master table will get inherited automatically.
  iv)  Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass
       it on to the child tables.
  v)   If possible, add CHECK (false) to the master table to avoid any
       activity on it.

Some questions remain:

1) Whether we should use triggers or rules for step (iii) above. Maybe
   rules are the way to go.
2) What other attributes of the master (access permissions, e.g.), along
   with the ones specified in (iv), should be passed on to the children.
3) Some implementation-specific issues, e.g. whether SPI_execute would be
   a good way of creating these rules.

Comments appreciated,

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
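For concreteness, the do-it-yourself setup that steps (i)-(iii) would
automate looks roughly like the following in 8.2-era SQL. This is a minimal
sketch only; the table, columns and boundary values are illustrative, and
it shows the rule-based variant from question (1):

    -- Step i: the master table
    CREATE TABLE tabname (a int, b text);

    -- Step ii: children with non-overlapping CHECKs, inheriting the master
    CREATE TABLE tabname_p0 (CHECK (a < 100))  INHERITS (tabname);
    CREATE TABLE tabname_p1 (CHECK (a >= 100)) INHERITS (tabname);

    -- Step iii: one rule per partition redirects INSERTs to the right child
    CREATE RULE tabname_ins_p0 AS ON INSERT TO tabname
        WHERE NEW.a < 100
        DO INSTEAD INSERT INTO tabname_p0 VALUES (NEW.*);
    CREATE RULE tabname_ins_p1 AS ON INSERT TO tabname
        WHERE NEW.a >= 100
        DO INSTEAD INSERT INTO tabname_p1 VALUES (NEW.*);

The proposed PARTITION BY clause would generate all of the above from the
single CREATE TABLE statement.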
NikhilS wrote:
> iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> pass it on to the children tables.

How will you maintain a primary key in such a table, considering that
indexes can't span multiple tables?

> 1) Whether we should use triggers/rules for step number (iii) above.
> Maybe rules is the way to go.

Since this would basically be a case of the updatable rules problem, you
should review those discussions in the past to check whether the issues
mentioned there don't interfere with that plan.

> 2) What other attributes (access permissions e.g.) of the master
> along with the ones specified in (iv) should be passed on to the
> children.

Moreover, how are later changes of those attributes propagated?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Hi,

On 3/6/07, Peter Eisentraut <peter_e@gmx.net> wrote:
> NikhilS wrote:
> > iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> > pass it on to the children tables.
>
> How will you maintain a primary key in such a table, considering that
> indexes can't span multiple tables?
We will not (I know it's a hard thing to do :)). The intention is to use
this information from the parent and make it a property of each child
table. This avoids the user having to manually issue CREATE INDEX and the
like on all the child tables one by one.
> > 1) Whether we should use triggers/rules for step number (iii) above.
> > Maybe rules is the way to go.
>
> Since this would basically be a case of the updatable rules problem, you
> should review those discussions in the past to check whether the issues
> mentioned there don't interfere with that plan.
The rules mentioned here will specify that all inserts/updates/deletes
should go into the proper child tables instead of the parent. I do not see
the updatable rules problem with regard to this, but I will check the
archives for the discussion on this related to partitioning.
> > 2) What other attributes (access permissions e.g.) of the master
> > along with the ones specified in (iv) should be passed on to the
> > children.
>
> Moreover, how are later changes of those attributes propagated?
Once created, this will be a normal inheritance relationship between the tables and all the existing commands will apply to both the parent and the child.
The basic idea here is to automate as many things as possible at partition creation time. The user is free to make additional changes to the involved tables later too.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
"NikhilS" <nikkhils@gmail.com> writes: >the intention is to use this information from the parent and make it a >property of the child table. This will avoid the step for the user having to >manually specify CREATE INDEX and the likes on all the children tables >one-by-one. Missed the start of this thread. A while back I had intended to add WITH INDEXES to CREATE TABLE LIKE. That would let you create a tale LIKE parent WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table ready for adding to the inheritance structure. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Hi,

On 3/6/07, Gregory Stark <stark@enterprisedb.com> wrote:
> "NikhilS" <nikkhils@gmail.com> writes:
> > the intention is to use this information from the parent and make it a
> > property of the child table. This will avoid the step for the user
> > having to manually specify CREATE INDEX and the likes on all the
> > children tables one-by-one.
>
> Missed the start of this thread. A while back I had intended to add WITH
> INDEXES to CREATE TABLE LIKE. That would let you create a table LIKE
> parent WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table
> ready for adding to the inheritance structure.
Yeah, this one aims to do pretty much the above as part of the auto creation of the inheritance-based partitions.
And to add, maybe if there is consensus/demand for the WITH INDEXES idea mentioned above too, I could work on it as well.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
NikhilS wrote:
> On 3/6/07, Peter Eisentraut <peter_e@gmx.net> wrote:
>> How will you maintain a primary key in such a table, considering that
>> indexes can't span multiple tables?
>
> We will not (I know it's a hard thing to do :)). The intention is to use
> this information from the parent and make it a property of each child
> table. This avoids the user having to manually issue CREATE INDEX and
> the like on all the child tables one by one.

I think a way can be devised to maintain the primary key and unique
constraints.

If a search is done on the parent table, the planner knows to rewrite the
query as a union (or union all) of all child tables that relate to the
WHERE clause, or of all child tables if the WHERE clause is not on the
column(s) used to partition. This concept should carry over to indexes as
well: when a primary or unique index on a child table is inserted into,
each of the related child indexes is consulted to ensure uniqueness. This
would only apply if the partitioning was not done by the primary or unique
column.

>> Since this would basically be a case of the updatable rules problem,
>> you should review those discussions in the past to check whether the
>> issues mentioned there don't interfere with that plan.
>
> The rules mentioned here will specify that all inserts/updates/deletes
> should go into the proper child tables instead of the parent.

I would think that a trigger would be a better choice, as I see the need
(or at least the possibility) for more than just a rewrite. When a value is
inserted that is outside of a condition currently covered by an existing
child table, then a new child will need to be spawned to contain the new
data. So say we partition by year and month of a date column? As new dates
are added, new child tables would be created each month. Or is this beyond
the current plan and left to manual creation?

Will ALTER TABLE be extended to handle partitions? This will allow
partitioning existing tables (maybe without blocking access?) and allow
things like

  ALTER TABLE mytable ADD PARTITION (mydate within 200703)

and

  ALTER TABLE mytable DROP PARTITION (mydate within 199912)

or would dropping be covered by

  DELETE FROM mytable WHERE mydate <= 199912 ?

Could such a syntax be devised for date columns? (month of mydate) or
similar, to auto-create partitions based on the year and month of a date
column? Or will we just do CHECK (mydatecol >= '2007-03-01' AND mydatecol
<= '2007-03-31') for each month of data? Also (day of mydatecol) to
partition based on year and day of year.

Another syntax possibility - range(myserialcol of 500000), where new child
tables are created every 500000 rows?

Maybe I'm looking at auto-maintenance which is beyond any current planning?

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
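To make the rules-vs-triggers question concrete, a trigger-based INSERT
router for one of Shane's monthly CHECK examples could be sketched as below
(a sketch only: names and the single hardcoded month are illustrative, it
assumes PL/pgSQL is installed, and a real implementation would generate one
branch per partition):

    CREATE TABLE mytable (id serial, mydate date);

    CREATE TABLE mytable_2007_03 (
        CHECK (mydate >= DATE '2007-03-01' AND mydate < DATE '2007-04-01')
    ) INHERITS (mytable);

    CREATE FUNCTION mytable_insert_trig() RETURNS trigger AS $$
    BEGIN
        IF NEW.mydate >= DATE '2007-03-01' AND NEW.mydate < DATE '2007-04-01'
        THEN
            INSERT INTO mytable_2007_03 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for date %', NEW.mydate;
        END IF;
        RETURN NULL;  -- row was routed to a child; skip the parent insert
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_insert BEFORE INSERT ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_insert_trig();

Spawning a brand-new child from inside such a trigger (Shane's
auto-maintenance case) would mean running DDL from a DML statement, which
is where the objections later in the thread come in.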
peter_e@gmx.net (Peter Eisentraut) writes:
> NikhilS wrote:
>> iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
>> pass it on to the children tables.
>
> How will you maintain a primary key in such a table, considering that
> indexes can't span multiple tables?

On the one hand, I seem to recall seeing "multiple-table-spanning indices"
being on the TODO list. On the other, it may be that this sort of
partitioning is only usable for scenarios where it is only needful to
maintain uniqueness on a partition-by-partition basis.

>> 1) Whether we should use triggers/rules for step number (iii) above.
>> Maybe rules is the way to go.
>
> Since this would basically be a case of the updatable rules problem, you
> should review those discussions in the past to check whether the issues
> mentioned there don't interfere with that plan.
>
>> 2) What other attributes (access permissions e.g.) of the master
>> along with the ones specified in (iv) should be passed on to the
>> children.
>
> Moreover, how are later changes of those attributes propagated?

I hear rumour of there being a more comprehensive proposal on this in the
works...

--
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://linuxfinances.info/info/finances.html
Rules of the Evil Overlord #189. "I will never tell the hero 'Yes I was the
one who did it, but you'll never be able to prove it to that incompetent
old fool.' Chances are, that incompetent old fool is standing behind the
curtain." <http://www.eviloverlord.com/>
On Wed, Mar 07, 2007 at 02:43:39AM +1030, Shane Ambler wrote:
> I think a way can be devised to maintain the primary key and unique
> constraints.
> If a search is done on the parent table, the planner knows to rewrite
> the query as a union (or union all) of all child tables that relate to
> the WHERE clause, or of all child tables if the WHERE clause is not on
> the column(s) used to partition. This concept should carry over to
> indexes as well: when a primary or unique index on a child table is
> inserted into, each of the related child indexes is consulted to ensure
> uniqueness.

But that's where it breaks down: you not only need to check that the row
you're inserting is unique, you need to make sure that other people trying
to insert the same value see it. After all, a unique index can contain
multiple instances of any particular value; it's just that the DB has
verified that none of them are visible to the same transaction. If there is
a collision, a lock is taken that waits until the colliding insert commits
or aborts.

With b-tree it's easy: every value can only appear in one place, so it's a
very simple lock. The reason why GiST doesn't support unique indexes is
that any particular value could appear in many places, so to stop
concurrent inserts you would need to lock *every* page in the tree where
another backend might want to insert the record. I fear this will also be
a problem for indexes over multiple tables: you will need to lock the page
in every index that some other backend might want to insert the value
into, all while avoiding deadlocks.

I think the problems are related. If you can solve it for multiple tables,
you can solve it for GiST indexes also.

> This would only apply if the partitioning was not done by the primary or
> unique column.

That's the easy case, no problem there.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
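The race Martijn describes can be sketched as two concurrent sessions.
Assume two partitions part_a and part_b of the same parent (illustrative
names), each with its own unique index on id, and the key 42 in neither of
them yet:

    -- Session 1                        -- Session 2
    BEGIN;                              BEGIN;
    INSERT INTO part_a VALUES (42);
    -- a lookup in part_b's index
    -- finds nothing
                                        INSERT INTO part_b VALUES (42);
                                        -- a lookup in part_a's index sees
                                        -- only an uncommitted, invisible row
    COMMIT;                             COMMIT;
    -- both succeed: id 42 now exists in two partitions

Without a lock that makes each in-flight insert visible to the other's
check, consulting the sibling indexes is not sufficient.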
Martijn van Oosterhout wrote:
> But that's where it breaks down: you not only need to check that the
> row you're inserting is unique, you need to make sure that other people
> trying to insert the same value see it.

This sounds like what is really needed is a way to lock a certain
condition, namely the existence or non-existence of a record with certain
values in certain fields. This would not only help this case, it would
also help RI triggers, because those wouldn't have to acquire a share lock
on the referenced rows anymore. As you pointed out, this would also make
unique GiST indices possible.

No real idea how to do this, though :-(

greetings, Florian Pflug
NikhilS wrote:
> We will not (I know it's a hard thing to do :)). The intention is to
> use this information from the parent and make it a property of each
> child table. This avoids the user having to manually issue CREATE
> INDEX and the like on all the child tables one by one.

But when I say

  CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ...

then I expect that the primary key will be enforced across all partitions.
We currently sidestep that issue by not offering seemingly transparent
partitioning. But if you are planning to offer that, the unique index
issue needs to be solved, and I see nothing in your plan about that.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Florian,

> This sounds like what is really needed is a way to lock a certain
> condition, namely the existence or non-existence of a record with
> certain values in certain fields. This would not only help this case,
> it would also help RI triggers, because those wouldn't have to acquire
> a share lock on the referenced rows anymore.

That's called "predicate locking" and it's very, very hard to do.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Peter Eisentraut <peter_e@gmx.net> writes:
> But when I say
> CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ...
> then I expect that the primary key will be enforced across all
> partitions. We currently sidestep that issue by not offering seemingly
> transparent partitioning. But if you are planning to offer that, the
> unique index issue needs to be solved, and I see nothing in your plan
> about that.

Agreed, it needs to Just Work. I think it'd still be useful though if we
only support auto-partitioning on the primary key, and that restriction
avoids the indexing problem.

			regards, tom lane
> Agreed, it needs to Just Work. I think it'd still be useful though
> if we only support auto-partitioning on the primary key, and that
> restriction avoids the indexing problem.

+1

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> But when I say
>> CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ...
>> then I expect that the primary key will be enforced across all
>> partitions. We currently sidestep that issue by not offering seemingly
>> transparent partitioning. But if you are planning to offer that, the
>> unique index issue needs to be solved, and I see nothing in your plan
>> about that.
>
> Agreed, it needs to Just Work. I think it'd still be useful though
> if we only support auto-partitioning on the primary key, and that
> restriction avoids the indexing problem.

Maybe. The most obvious use for automatic partitioning that I can think of
would be based on the value of a timestamptz field rather than any PK. Of
course I tend to work more in the OLTP field than in DW type apps, where
other considerations might apply.

cheers

andrew
Josh Berkus wrote:
> Florian,
>
>> This sounds like what is really needed is a way to lock a certain
>> condition, namely the existence or non-existence of a record with
>> certain values in certain fields. This would not only help this case,
>> it would also help RI triggers, because those wouldn't have to acquire
>> a share lock on the referenced rows anymore.
>
> That's called "predicate locking" and it's very, very hard to do.

That's definitely not needed. Rather, something good enough that we can
finally enforce RI ourselves in the general case. This is currently not
possible to do easily, except in C code. It means we need to look at all
the rows that exist but are normally invisible to our view of the database.

Still, I'm not sure about all cases, as the MVCC model is quite tricky and
I'm not sure whether my ideas about it are valid. The basic idea is that
you need to guarantee the constraint for the 'single underlying model'
(with everything visible) and for your view (under your visibility rules).
I believe, but am not certain, that under these conditions any (valid)
snapshot will obey the desired constraints.

- Joris Dobbelsteen
Hi Shane,

> Maybe I'm looking at auto-maintenance which is beyond any current
> planning?

Many of your suggestions are useful, but auto-maintenance will be beyond
the current plan.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,
On 3/7/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > But when I say
> > CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ...
> > then I expect that the primary key will be enforced across all
> > partitions. We currently sidestep that issue by not offering seemingly
> > transparent partitioning. But if you are planning to offer that, the
> > unique index issue needs to be solved, and I see nothing in your plan
> > about that.
>
> Agreed, it needs to Just Work. I think it'd still be useful though
> if we only support auto-partitioning on the primary key, and that
> restriction avoids the indexing problem.
>
> 			regards, tom lane
Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain
uniqueness on a partition-by-partition basis too?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Andrew Dunstan wrote:
> Maybe. The most obvious use for automatic partitioning that I can think
> of would be based on the value of a timestamptz field rather than any
> PK. Of course I tend to work more in the OLTP field than in DW type
> apps, where other considerations might apply.

I second that - partitioning on some kind of timestamp field is a common
usecase here too ...

Stefan
>> Maybe. The most obvious use for automatic partitioning that I can
>> think of would be based on the value of a timestamptz field rather
>> than any PK. Of course I tend to work more in the OLTP field than in
>> DW type apps, where other considerations might apply.
>
> I second that - partitioning on some kind of timestamp field is a
> common usecase here too ...

Partitioning, period, needs to work. It doesn't matter what the user
chooses as their partition key. Timestamp is an obvious choice, but there
are others, such as serial, where you just partition every million rows
(for example) to keep things manageable.

Joshua D. Drake
On Wednesday, 7 March 2007 07:44, NikhilS wrote:
> Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain
> uniqueness on a partition-by-partition basis too?

Many things might be useful, but the aim of the "table partitioning"
venture is believed to be the provision of a transparent interface to the
existing do-it-yourself partitioning facilities. Therefore, the logical
definition of a table must continue to work unchanged (or alternatively,
the use of the feature must be prohibited if that cannot be guaranteed in
a particular case).

Other features such as uniqueness on a partition basis might also be
useful, but you can do that today and you don't need partitioning for that.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Tue, 2007-03-06 at 18:31 +0530, NikhilS wrote:
> On 3/6/07, Peter Eisentraut <peter_e@gmx.net> wrote:
>> How will you maintain a primary key in such a table, considering that
>> indexes can't span multiple tables?
>
> We will not (I know it's a hard thing to do :)). The intention is to
> use this information from the parent and make it a property of each
> child table. This avoids the user having to manually issue CREATE
> INDEX and the like on all the child tables one by one.

If you know that the constraints on each of the tables are distinct, then
building a UNIQUE index on each of the partitions is sufficient to prove
that all rows in the combined partitioned table are distinct also.

The hard part there is checking that the partition constraints are
distinct. If the partition constraints are added one at a time, you can
use the predicate testing logic to compare the to-be-added partition's
constraint against each of the already added constraints. That becomes an
O(N) problem.

What is really needed is a data structure that allows range partitions to
be accessed more efficiently. This could make adding partitions and
deciding in which partition a specific value goes an O(logN) operation.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
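Simon's first point can be illustrated directly (illustrative names): the
two CHECKs below are provably disjoint, so the per-partition unique indexes
together make id unique across the whole partitioned table:

    CREATE TABLE meas (id int, val text);

    CREATE TABLE meas_p1 (CHECK (id >= 0       AND id < 1000000)) INHERITS (meas);
    CREATE TABLE meas_p2 (CHECK (id >= 1000000 AND id < 2000000)) INHERITS (meas);

    -- A given id can legally live in only one partition, so these two
    -- local indexes compose into a table-wide uniqueness guarantee.
    CREATE UNIQUE INDEX meas_p1_id ON meas_p1 (id);
    CREATE UNIQUE INDEX meas_p2_id ON meas_p2 (id);

Proving that disjointness automatically for each newly added partition is
the O(N) predicate-testing step Simon refers to.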
Hi,

On 3/7/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> If you know that the constraints on each of the tables are distinct,
> then building a UNIQUE index on each of the partitions is sufficient to
> prove that all rows in the combined partitioned table are distinct also.
>
> The hard part there is checking that the partition constraints are
> distinct. If the partition constraints are added one at a time, you can
> use the predicate testing logic to compare the to-be-added partition's
> constraint against each of the already added constraints. That becomes
> an O(N) problem.
>
> What is really needed is a data structure that allows range partitions
> to be accessed more efficiently. This could make adding partitions and
> deciding in which partition a specific value goes an O(logN) operation.
If the unique constraint is supposed to be on a column which is NOT being used for the partitioning, then all the above becomes much more difficult.
While partitioning, the additional onus on the user is to specify non-conflicting CHECKs for the range/list partitions.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
> > iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> > pass it on to the children tables.
>
> How will you maintain a primary key in such a table,
> considering that indexes can't span multiple tables?

Many partitioning schemes have (or can be made to have) only one possible
target partition for the primary key. Thus if you create separate unique
indexes on each partition the problem is solved.

For a first version I suggest that it is sufficient to disallow creation
of a unique index on the master when the constraints (on columns of this
index) do not nail down a specific partition for each row (e.g. a hash or
a range on one of the index columns that does not overlap).

Later, global indexes, or indexes with separate partitioning rules, can be
implemented to cover the other cases.

Andreas
Hi,
On 3/7/07, Zeugswetter Andreas ADI SD <ZeugswetterA@spardat.at> wrote:
> > > iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> > > pass it on to the children tables.
> >
> > How will you maintain a primary key in such a table,
> > considering that indexes can't span multiple tables?
>
> Many partitioning schemes have (or can be made to have) only one
> possible target partition for the primary key. Thus if you create
> separate unique indexes on each partition the problem is solved.
>
> For a first version I suggest that it is sufficient to disallow creation
> of a unique index on the master when the constraints (on columns of this
> index) do not nail down a specific partition for each row (e.g. a hash
> or a range on one of the index columns that does not overlap).
>
> Later, global indexes, or indexes with separate partitioning rules, can
> be implemented to cover the other cases.
>
> Andreas
Yes, I agree. For version 1, UNIQUE/PRIMARY KEY indexes will cascade down
to the child tables only if the indexed column is part of the partitioning
rule.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
I am wondering if we can implement unique indexes across several tables
(inheritance hierarchy) not by using a single, big index covering all the
tables, but rather by inserting a dummy entry into each partition's unique
index. This dummy entry would have an expanded CTID which would include
the tableoid, so it's possible to check it (albeit there is a problem in
that we may require the opening of another heap to do the actual
checking). These dummy entries could be removed by bulkcleanup as soon as
the inserting transaction is no longer running, to avoid bloating the
index too much. All said dummy index entries would be located at either
the rightmost or the leftmost leaf, or close to it, so another idea is to
have future inserters reuse the entry for a different key.

The obvious problem with this is, naturally, the excess I/O that extra
index traversing causes. The not so obvious ones are locking, deadlocking,
and the opening of other heaps and indexes while you do the insertion,
which may be too expensive. On the other hand, maybe this idea is easier
to implement than full-fledged cross-table indexes, so we could have
richer partitioning earlier than when somebody finally bites the bullet
and implements cross-table indexes.

Or maybe this is just a dumb idea, but I had to let it out anyway :-)

--
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Simon,

On 3/7/07 5:26 AM, "Simon Riggs" <simon@2ndquadrant.com> wrote:
> What is really needed is a data structure that allows range partitions
> to be accessed more efficiently. This could make adding partitions and
> deciding in which partition a specific value goes an O(logN) operation.

I think we need to re-evaluate the inheritance mechanism for partitioning
and implement something much closer to the storage layer, similar to
Oracle. Having the constraint exclusion occur in the planner is not
flexible enough to allow more advanced solutions.

- Luke
On Wed, 2007-03-07 at 10:23 -0500, Luke Lonergan wrote:
> I think we need to re-evaluate the inheritance mechanism for
> partitioning and implement something much closer to the storage layer,
> similar to Oracle.

Oracle's implementation is fairly close to ours, actually. Each partition
is a segment. SQLServer's is fairly low level.

> Having the constraint exclusion occur in the planner is not flexible
> enough to allow more advanced solutions.

It depends what those advanced solutions are. I definitely want to be able
to do run-time exclusion, push down merge joins and parallelism, but I
also think that being able to index only the first 3 partitions is a
useful feature too.

ISTM that making the Append node responsible for exclusion might be a way
to go with this, but various ways are possible, I'm sure. The trick is to
find one that does everything you need, and that will take some deeper
analysis.

However you do it, you'll still need a way of deciding which partitions to
include/exclude that doesn't involve a sequential scan of all partition
constraints. So my comments about a different index structure are still
going to be relevant, wherever that lives/executes.

I'm not doing anything in this area personally for 8.3 though.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
> > What is really needed is a data structure that allows range partitions
> > to be accessed more efficiently. This could make adding partitions and
> > deciding in which partition a specific value goes an O(logN) operation.
>
> I think we need to re-evaluate the inheritance mechanism for
> partitioning and implement something much closer to the
> storage layer, similar to Oracle.
>
> Having the constraint exclusion occur in the planner is not
> flexible enough to allow more advanced solutions.

Whoa, do you have anything to back that up? You would need to elaborate
what you actually mean, but I think it is moot. Sure, the constraint
technique can be further extended (e.g. during runtime), but imho the
approach is very good.

Andreas
Alvaro Herrera wrote:
> I am wondering if we can implement unique indexes across several tables
> (inheritance hierarchy) not by using a single, big index covering all
> the tables, but rather by inserting a dummy entry into each partition's
> unique index.

I think that one big index is much better in this case. You are already
replicating the data and need a lot of work when updating the partition
structure (e.g. removing one).

[snip]

Rather: if we have the partition relations (aka partitions), then we can
define a unique index on each. This guarantees that there is at most one
tuple with the same value (for the specified columns) in every individual
partition.

Now for the inserts. We already guarantee that a value is unique within
the partition it lives in, so we must insert it first. Next we check the
other partitions (in order!) for the existence of a row with a similar
value. Of course we require full visibility of the data. We do the insert
first, as this ensures others will find it. The order is also important;
otherwise there can be situations where we can't guarantee the constraint.
Updates are exactly the same. Deletes are trivial.

Perhaps you can do it without an index, but this is nothing different than
for a normal table.

- Joris Dobbelsteen
Andreas,

On 3/7/07 11:45 AM, "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at>
wrote:
> Whoa, do you have anything to back that up?

Sure - when we start to consider designs that implement advanced data
management features, we run into problems with the architecture of
"tables->tables->tables...". Here are some examples:

1 - People think of partitions as a logical building block for tables;
they would like to move partitions around underneath a table without the
table definition being involved. In the current implementation, there are
explicit linkages between the table definition and the child tables -
imagine an "ALTER TABLE foo_parent ADD COLUMN" and how it would need to
cascade to 1,000 child tables and you get the beginning of it - this
connection should not exist.

2 - INSERT/UPDATE/DELETE processing through the SQL rewrite layer (rules)
is terribly slow and gets slower as you add more partitions. If done
closer to the storage layer, this can be done in ways that use access
methods shared with other storage entities, e.g. indices, and the code
path would flow more naturally.

3 - Parallel query can be accomplished more easily by separating scans
across relations split among tablespaces. This is more natural than trying
to parallelize APPEND nodes within existing plans.

> You would need to elaborate what you actually mean, but I think it is
> moot. Sure, the constraint technique can be further extended (e.g.
> during runtime), but imho the approach is very good.

Well, it's being used and that's good, but it needs to be better IMO, and
I think that before we go too far down the current path we should consider
the alternatives more carefully.

- Luke
On Mar 6, 2007, at 4:57 AM, NikhilS wrote:
> iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> pass it on to the children tables.

Since we want to eventually support 'global' indexes, I think we need to
be really careful here. There are actually 3 possible cases:

1) The index* should be global (spanning multiple tables)
2) The index* should be inherited by all partitions as they're created
3) The index* should exist only on the parent table

* Note that there's really no reason this has to be limited to indexes; it
could certainly apply to constraints, or even triggers.

IIRC, Oracle has a special syntax for global indexes; any other index
defined on a partitioned table is picked up *by newly created partitions*.
If you want to add indexes to existing partitions, you have to explicitly
add them to each partition.

I'd like to eventually see us supporting all 3 options. I'm not sure how
much we want to clutter the grammar, though.

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
On Mar 6, 2007, at 9:13 AM, Shane Ambler wrote:
> I would think that a trigger would be a better choice as I see the
> need (or at least the possibility) for more than just a rewrite.
> When a value is inserted that is outside of a condition currently
> covered by an existing child table then a new child will need to be
> spawned to contain the new data.

There's no reason a new child has to be spawned, and I don't think that
having a DML statement 'automagically' generating DDL is such a hot idea,
either. Also, there's nothing inherently wrong with having an 'overflow
partition' (to use Oracle syntax) that holds values that don't fall in the
range of any other tables. The obvious place to do that with our
partitioning is in the parent table.

There are 2 other reasons to favor triggers, though:

1) People (Josh Drake comes to mind) have found that if you get over a
tiny number of partitions, the performance of rules is abysmal.

2) I believe it should be possible to construct an update trigger that
allows you to perform updates that will place the row in question into a
new partition. While I can see cases for simply disallowing updates to the
partitioning key, I think there are also times when being able to do that
would be very useful.

> Will ALTER TABLE be extended to handle partitions? This will allow
> partitioning existing tables (maybe without blocking access?) and
> allow things like ALTER TABLE mytable ADD PARTITION (mydate within
> 200703) and ALTER TABLE mytable DROP PARTITION (mydate within 199912)
> or would dropping be covered by DELETE FROM mytable where mydate <=
> 199912?

I think it'd be great to make adding and removing partitions as simple as
ALTER TABLE. I don't think that DELETE should be the mechanism to drop a
partition, though. Again, DML statements shouldn't be performing DDL.

> Maybe I'm looking at auto-maintenance which is beyond any current
> planning?

I don't think it's worth it to burden the database with auto-creating time
partitions; it's easy enough to set up a cron job to handle it. It might
be more useful to have the database handle things like partitioning on a
SERIAL column, though I agree with Nikhils that this should wait.

Does any other database support 'automagic' partition creation? I know
Oracle 9i doesn't... not sure about 10g or DB2 or MSSQL...
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
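Jim's 'overflow partition' point needs only a one-line change to a routing
trigger such as the sketch earlier in the thread: let unmatched rows fall
through to the parent instead of raising an error (illustrative names,
single hardcoded partition, assuming the mytable tables from that sketch):

    CREATE FUNCTION mytable_route() RETURNS trigger AS $$
    BEGIN
        IF NEW.mydate >= DATE '2007-03-01' AND NEW.mydate < DATE '2007-04-01'
        THEN
            INSERT INTO mytable_2007_03 VALUES (NEW.*);
            RETURN NULL;  -- a child took the row; suppress the parent insert
        END IF;
        RETURN NEW;       -- no child matches: the row stays in the parent,
                          -- which thus serves as the overflow partition
    END;
    $$ LANGUAGE plpgsql;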
On Mar 7, 2007, at 3:26 AM, Simon Riggs wrote:
> If you know that the constraints on each of the tables are distinct,
> then building a UNIQUE index on each of the partitions is sufficient to
> prove that all rows in the combined partitioned table are distinct also.
>
> The hard part there is checking that the partition constraints are
> distinct. If the partition constraints are added one at a time, you can
> use the predicate testing logic to compare the to-be-added partition's
> constraint against each of the already added constraints. That becomes
> an O(N) problem.
>
> What is really needed is a data structure that allows range partitions
> to be accessed more efficiently. This could make adding partitions and
> deciding in which partition a specific value goes an O(logN) operation.

Directing data to child tables with triggers pretty much necessitates
having some way to codify what partition a particular row belongs in. IE:
for partitioning by month, you'll see things like naming the partition
tables "parent_table_name_$YEAR_$MONTH", so the 'partitioning function'
takes a date or timestamp and then returns what partition it belongs to.
Perhaps there is some way to use that mapping to drive the selection of
what partitions could contain a given value?

One possibility would be to require 3 functions for a partitioned table:
one accepts the partitioning key and tells you what partition it's in, one
that tells you what the minimum partitioning key for a partition would be,
and one that tells you what the maximum would be. If the user supplied
those 3 functions, I think it would be possible to automatically generate
code for the triggers and check constraints. The min/max partition key
functions might allow you to more efficiently do partition elimination,
too.

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
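A sketch in SQL of the three functions Jim describes, for by-month
partitioning on a timestamptz key (the naming scheme and function names
are purely illustrative, not a proposed catalog API):

    -- Which partition does a key belong to?
    CREATE FUNCTION part_for(ts timestamptz) RETURNS text AS $$
        SELECT 'parent_' || to_char($1, 'YYYY_MM');
    $$ LANGUAGE sql STABLE;

    -- Minimum key a given partition may hold
    CREATE FUNCTION part_min(part text) RETURNS timestamptz AS $$
        SELECT to_timestamp(substr($1, 8), 'YYYY_MM');
    $$ LANGUAGE sql STABLE;

    -- Maximum (exclusive) key a given partition may hold
    CREATE FUNCTION part_max(part text) RETURNS timestamptz AS $$
        SELECT to_timestamp(substr($1, 8), 'YYYY_MM') + interval '1 month';
    $$ LANGUAGE sql STABLE;

Given part_min and part_max, the system could mechanically emit both each
partition's CHECK constraint and the corresponding branch of the insert
trigger.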
Hi,
Maybe the dummy entry could be extended to contain the bounds (max/min)
for each of the other involved partitions, and they could be updated each
time DML happens across the partitions. That way, an update to a
particular partition needs to lock out the others, examine the dummy
entries in its own index, and follow up with dummy-entry updates in the
other partitions if need be.

Of course, as you have mentioned, all of this needs careful thought about
the locking/deadlocking etc. issues.
Regards,
Nikhils
On 3/7/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I am wondering if we can implement unique indexes across several tables
> (inheritance hierarchy) not by using a single, big index covering all
> the tables, but rather by inserting a dummy entry into each partition's
> unique index. This dummy entry would have an expanded CTID which would
> include the tableoid, so it's possible to check it (albeit there is a
> problem in that we may require the opening of another heap to do the
> actual checking). These dummy entries could be removed by bulkcleanup
> as soon as the inserting transaction is no longer running, to avoid
> bloating the index too much. All said dummy index entries would be
> located at either the rightmost or the leftmost leaf, or close to it,
> so another idea is to have future inserters reuse the entry for a
> different key.
>
> The obvious problem with this is, naturally, the excess I/O that extra
> index traversing causes. The not so obvious ones are locking,
> deadlocking and the opening of other heaps and indexes while you do the
> insertion, which may be too expensive. On the other hand, maybe this
> idea is easier to implement than full-fledged cross-table indexes, so
> we could have richer partitioning earlier than when somebody finally
> bites the bullet and implements cross-table indexes.
>
> Or maybe this is just a dumb idea, but I had to let it out anyway :-)
>
> --
> Alvaro Herrera                              http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
--
EnterpriseDB http://www.enterprisedb.com
Hi,

Jim Nasby wrote:
> There are 2 other reasons to favor triggers, though:
>
> 1) People (Josh Drake comes to mind) have found that if you get over
> a tiny number of partitions, the performance of rules is abysmal.
>
> 2) I believe it should be possible to construct an update trigger
> that allows you to perform updates that will place the row in
> question into a new partition. While I can see cases for simply
> disallowing updates to the partitioning key, I think there are also
> times when being able to do that would be very useful.
The consensus seems to be veering towards triggers.
> I think it'd be great to make adding and removing partitions as
> simple as ALTER TABLE. I don't think that DELETE should be the
> mechanism to drop a partition, though. Again, DML statements
> shouldn't be performing DDL.
Since partitioning is inheritance-based, a simple DROP or "NO INHERIT"
will do the job to deal with the partition. Do we want to reinvent
additional syntax when these are around and documented?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
On Wed, 2007-03-07 at 22:32 -0500, Luke Lonergan wrote:
> Sure - when we start to consider designs that implement advanced data
> management features, we run into problems with the architecture of
> "tables->tables->tables...". Here are some examples:
>
> 1 - People think of partitions as a logical building block for tables;
> they would like to move partitions around underneath a table without
> the table definition being involved. In the current implementation,
> there are explicit linkages between the table definition and the child
> tables - imagine an "ALTER TABLE foo_parent ADD COLUMN" and how it
> would need to cascade to 1,000 child tables and you get the beginning
> of it - this connection should not exist.

The inheritance can work at multiple levels, so it's up to you how you
manage things. If you want to add a column to only the newest tables, you
just add a new mid-level table, add the new column only to that, and then
make all new partitions inherit from that table rather than the main
table. So I don't see the objection here; I see a benefit.

> 2 - INSERT/UPDATE/DELETE processing through the SQL rewrite layer
> (rules) is terribly slow and gets slower as you add more partitions.
> If done closer to the storage layer, this can be done in ways that use
> access methods shared with other storage entities, e.g. indices, and
> the code path would flow more naturally.

That bit is just syntactic sugar and unnecessary for most applications, in
my experience - especially because it doesn't work with COPY. People do
seem to want it, so as I said upthread, we need a way of speeding up the
selection of the appropriate partition, so we can get this to work for
individual INSERTs. This needs substantial improvement, but I don't see
this challenging the partition == table assumption.

> 3 - Parallel query can be accomplished more easily by separating scans
> across relations split among tablespaces. This is more natural than
> trying to parallelize APPEND nodes within existing plans.

Tables -> Partitions -> Tablespaces

Changing Partitions from Tables to SomethingElse won't increase the
current capability to define the disk layout for concurrent I/O. Parallel
Query needs to be planner-aware, so if we change partitions from being
tables, then we'd need to reintroduce them to the planner. APPEND is
clearly not the only thing that's needed for parallel query.

Note to Nikhil: make sure the new syntax doesn't prevent partitions from
being placed upon multiple tablespaces in some manner, at CREATE TABLE
time.

> Well, it's being used and that's good, but it needs to be better IMO,
> and I think that before we go too far down the current path we should
> consider the alternatives more carefully.

I'm happy to reconsider things, but we need to do that with some clear
analysis of what doesn't work yet and how best to implement that. None of
what's been mentioned requires us to reconsider the Partition == Table
assumption. There were other ways considered, but they didn't win out in
the analysis, for documented reasons.

If there are benefits to having partitions act like tables, then maybe we
can make them behave differently in just those circumstances. Pack animals
behave differently in a group, so why not tables, if need be? But when is
that exactly?

There is a ton of work to make partitioning the elegant beast we'd like it
to be, but that seems like extension only, not rip and replace.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
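Simon's mid-level table trick can be sketched as follows (illustrative
names). Only partitions created after the intermediate table pick up the
new column, so nothing cascades to the existing children:

    CREATE TABLE parent (id int, mydate date);
    CREATE TABLE parent_2007_02 (
        CHECK (mydate < DATE '2007-03-01')
    ) INHERITS (parent);                     -- existing partition, untouched

    -- Mid-level table carrying the new column
    CREATE TABLE parent_v2 (new_col int) INHERITS (parent);

    -- New partitions inherit from the mid-level table instead
    CREATE TABLE parent_2007_03 (
        CHECK (mydate >= DATE '2007-03-01' AND mydate < DATE '2007-04-01')
    ) INHERITS (parent_v2);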
On Wed, 2007-03-07 at 21:27 -0700, Jim Nasby wrote:
> Directing data to child tables with triggers pretty much necessitates
> having some way to codify what partition a particular row belongs in.
> IE: for partitioning by month, you'll see things like naming the
> partition tables "parent_table_name_$YEAR_$MONTH", so the 'partitioning
> function' takes a date or timestamp and then returns what partition it
> belongs to. Perhaps there is some way to use that mapping to drive the
> selection of what partitions could contain a given value?
>
> One possibility would be to require 3 functions for a partitioned
> table: one accepts the partitioning key and tells you what partition
> it's in, one that tells you what the minimum partitioning key for a
> partition would be, and one that tells you what the maximum would be.
> If the user supplied those 3 functions, I think it would be possible
> to automatically generate code for the triggers and check constraints.
> The min/max partition key functions might allow you to more efficiently
> do partition elimination, too.

ISTM this is a good idea. SQLServer uses partitioning functions and I like
that approach. It makes it much easier to do partition-wise joins between
tables that share partitioning functions.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
"Jim Nasby" <decibel@decibel.org> writes: > One possibility would be to require 3 functions for a partitioned table: one > accepts the partitioning key and tells you what partition it's in, one that > tells you what the minimum partitioning key for a partition would be, and one > that tells you what the maximum would be. If the user supplied those 3 > functions, I think it would be possibly to automatically generate code for the > triggers and check constraints. The min/max partition key functions might > allow you to more efficiently do partition elimination, too. But then it would be harder to tell whether a clause implied a given partition. That is, if you have a partition constraint of "col OP const" then we can test whether a query clause of "col OP2 const2" implies that constraint when planning (or actually whether it implies it's false to exclude the partition). If you have a constraint like "P1(const)" it'll be pretty hard to do much with that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> > The hard part there is checking that the partition constraints are
> > distinct. If the partition constraints are added one at a time, you
> > can use the predicate testing logic to compare the to-be-added
> > partition's constraint against each of the already added constraints.
> > That becomes an O(N) problem.

Yes, we could pre-evaluate that check (per index) in the DDL phase and
keep the info in a flag. Also interesting info is whether there is an
order the partitions can be read in to satisfy a particular ORDER BY.

> > What is really needed is a data structure that allows range partitions
> > to be accessed more efficiently. This could make adding partitions and
> > deciding in which partition a specific value goes an O(logN) operation.

I do not really see a problem with O(N), since typical N currently ranges
from 10 to 200. N = 1000 is already good for a >= 10 TB table. If a 10 GB
partition were too large, we should imho invest more in the advanced
indexing methods that are currently being developed.

> Directing data to child tables with triggers pretty much necessitates
> having some way to codify what partition a particular row belongs in.
> IE: for partitioning by month, you'll see things like naming the
> partition tables "parent_table_name_$YEAR_$MONTH", so the 'partitioning
> function' takes a date or timestamp and then returns what partition it
> belongs to. Perhaps there is some way to use that mapping to drive the
> selection of what partitions could contain a given value?

You put it in the first partition that has matching constraints.

> One possibility would be to require 3 functions for a partitioned
> table: one accepts the partitioning key and tells you what partition
> it's in, one that tells you what the minimum partitioning key for a
> partition would be, and one that tells you what the maximum would be.
> If the user supplied those 3 functions, I think it would be possible
> to automatically generate code for the triggers and check constraints.
> The min/max partition key functions might allow you to more efficiently
> do partition elimination, too.

I can see this as a good optional addition, but it can only be optional,
else it would pretty much limit the methods that can be used for
partitioning; e.g. hash and modulo do not have a min/max per partition.

Andreas
On Thu, Mar 08, 2007 at 01:52:27PM +0530, NikhilS wrote:
> > I think it'd be great to make adding and removing partitions as
> > simple as ALTER TABLE. I don't think that DELETE should be the
> > mechanism to drop a partition, though. Again, DML statements
> > shouldn't be performing DDL.
>
> Since partitioning is inheritance-based, a simple DROP or "NO INHERIT" will do
> the job to deal with the partition. Do we want to reinvent additional syntax
> when these are around and are documented?

Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
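For reference, what ALTER TABLE ADD/DROP PARTITION would wrap, spelled out with the existing inheritance primitives (reusing the hypothetical sales table from the sketch above):

CREATE TABLE sales_2007_03 (
    CHECK (sale_date >= '2007-03-01' AND sale_date < '2007-04-01')
) INHERITS (sales);                          -- "add partition"

ALTER TABLE sales_2007_03 NO INHERIT sales;  -- detach, keeping the data
DROP TABLE sales_2007_03;                    -- or drop partition and data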
On Thu, Mar 08, 2007 at 10:12:30AM +0000, Gregory Stark wrote:
> "Jim Nasby" <decibel@decibel.org> writes:
> > One possibility would be to require 3 functions for a partitioned table: one
> > accepts the partitioning key and tells you what partition it's in, one that
> > tells you what the minimum partitioning key for a partition would be, and one
> > that tells you what the maximum would be. If the user supplied those 3
> > functions, I think it would be possible to automatically generate code for the
> > triggers and check constraints. The min/max partition key functions might
> > allow you to more efficiently do partition elimination, too.
>
> But then it would be harder to tell whether a clause implied a given
> partition. That is, if you have a partition constraint of "col OP const" then
> we can test whether a query clause of "col OP2 const2" implies that constraint
> when planning (or actually whether it implies it's false, to exclude the
> partition). If you have a constraint like "P1(const)" it'll be pretty hard to
> do much with that.

Well, you could tell what partition 'const' was in; I would think that plus knowledge about OP2 would allow you to decide what partitions you need to look at.

There's also nothing to prevent us from also adding the constraints and using constraint exclusion as well. In fact, I think we'd want to have the constraints just so we know that a given partition only contains the data we want it to.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
One other thought... a lot of this discussion seems to parallel the debate from a while ago about whether SERIAL should act like a macro (ie: it just sets everything up and users are free to monkey under the hood afterwards), or whether it should be its own 'closed-box' construct.

Currently, we seem to be leaning towards partition management being a 'macro', with child tables very exposed, etc. I don't know if that's good or bad, but it's probably worth some thought.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
> Note to Nikhil: Make sure the new syntax doesn't prevent partitions from
> being placed upon multiple tablespaces in some manner, at CREATE TABLE
> time.

What if the syntax was something like -

CREATE TABLE tabname (
...
...
) PARTITION BY
HASH(expr)
| RANGE(expr)
| LIST(expr)
[PARTITIONS num_partitions] /* will apply to HASH only for now */
[PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname],
PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]
...
];

And (if we use the ALTER TABLE to add partitions)

ALTER TABLE tabname
ADD PARTITION partition_name CHECK(...)
[USING TABLESPACE tblspcname];

Of course

ALTER TABLE childtable SET TABLESPACE tblspcname;

should not cause any probs.

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
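Shane's USING TABLESPACE clause maps onto placement options child tables already have; a sketch with invented tablespace names, again using the hypothetical sales table from above:

CREATE TABLE sales_2007_04 (
    CHECK (sale_date >= '2007-04-01' AND sale_date < '2007-05-01')
) INHERITS (sales) TABLESPACE fast_disks;

ALTER TABLE sales_2007_04 SET TABLESPACE slow_disks;  -- relocate it later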
Jim C. Nasby wrote:
> On Thu, Mar 08, 2007 at 01:52:27PM +0530, NikhilS wrote:
> >> I think it'd be great to make adding and removing partitions as
> >> simple as ALTER TABLE. I don't think that DELETE should be the
> >> mechanism to drop a partition, though. Again, DML statements
> >> shouldn't be performing DDL.
> >
> > Since partitioning is inheritance-based, a simple DROP or "NO INHERIT" will do
> > the job to deal with the partition. Do we want to reinvent additional syntax
> > when these are around and are documented?
>
> Well, if the syntax for adding a new partition eventually ends up as
> ALTER TABLE ADD PARTITION, then it would make more sense that you remove
> a partition via ALTER TABLE DROP PARTITION.

This follows on from the suggestion I made - taken along the lines of the subject "auto creation of partitions" - where I suggested the syntax of partition check(month of mydatecol) and having a new partition created as data was entered. With this scenario, dropping the partition when it was empty would complement the creation of a new partition as needed.

Given that there seems to be no real support for going with "auto maintenance", where new partitions are added as needed, the auto dropping of empty partitions would also not apply.

That leaves us with only specific add partition / drop partition commands, and having the parent table pick up rows not matching any partition check criteria.

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
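What Shane's "auto maintenance" would amount to can be sketched as an insert trigger that creates the month's partition on first use. This is only an illustration on the hypothetical sales table: every name is invented, and it ignores concurrent creation of the same partition and NULL keys:

CREATE OR REPLACE FUNCTION sales_autopart() RETURNS trigger AS $$
DECLARE
    part text := 'sales_' || to_char(NEW.sale_date, 'YYYY_MM');
    lo   date := date_trunc('month', NEW.sale_date)::date;
    hi   date := (date_trunc('month', NEW.sale_date)
                  + INTERVAL '1 month')::date;
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = part) THEN
        EXECUTE 'CREATE TABLE ' || part
             || ' (CHECK (sale_date >= ' || quote_literal(lo::text)
             || ' AND sale_date < ' || quote_literal(hi::text)
             || ')) INHERITS (sales)';
    END IF;
    EXECUTE 'INSERT INTO ' || part || ' VALUES ('
         || quote_literal(NEW.sale_date::text) || ', '
         || coalesce(quote_literal(NEW.amount::text), 'NULL') || ')';
    RETURN NULL;  -- the row now lives in the child, not in the master
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_autopart_ins BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_autopart();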
Hi,

On 3/9/07, Shane Ambler <pgsql@sheeky.biz> wrote:
> > Note to Nikhil: Make sure the new syntax doesn't prevent partitions from
> > being placed upon multiple tablespaces in some manner, at CREATE TABLE
> > time.
>
> What if the syntax was something like -
>
> CREATE TABLE tabname (
> ...
> ...
> ) PARTITION BY
> HASH(expr)
> | RANGE(expr)
> | LIST(expr)
> [PARTITIONS num_partitions] /* will apply to HASH only for now */
> [PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname],
> PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]
> ...
> ];
>
> And (if we use the ALTER TABLE to add partitions)
>
> ALTER TABLE tabname
> ADD PARTITION partition_name CHECK(...)
> [USING TABLESPACE tblspcname];

We could as well drop the USING part.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,

> This follows on from the suggestion I made - taken along the lines of
> the subject "auto creation of partitions" - where I suggested the syntax
> of partition check(month of mydatecol) and having a new partition created
> as data was entered. With this scenario, dropping the partition when it
> was empty would complement the creation of a new partition as needed.
>
> Given that there seems to be no real support for going with "auto
> maintenance", where new partitions are added as needed, the auto
> dropping of empty partitions would also not apply.
>
> That leaves us with only specific add partition / drop partition commands,
> and having the parent table pick up rows not matching any partition check
> criteria.
>
> --
> Shane Ambler
> pgSQL@Sheeky.Biz
>
> Get Sheeky @ http://Sheeky.Biz

I was thinking along the lines of what Jim had suggested earlier regarding an overflow partition. Instead of dumping unmatched rows into the master table, we could put them into a default "DUMP/DUMB" partition.

Given that Simon wants to do away with having the master table APPENDed in the planning phase, this would be better.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
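The DUMP idea in trigger form, as an alternative to the auto-creating trigger sketched earlier (same hypothetical sales partitions): unmatched rows fall through to a dedicated overflow child, so the master itself can stay provably empty:

CREATE TABLE sales_overflow () INHERITS (sales);  -- the "DUMP" partition

CREATE OR REPLACE FUNCTION sales_route() RETURNS trigger AS $$
BEGIN
    IF NEW.sale_date >= '2007-01-01' AND NEW.sale_date < '2007-02-01' THEN
        INSERT INTO sales_2007_01 VALUES (NEW.*);
    ELSIF NEW.sale_date >= '2007-02-01' AND NEW.sale_date < '2007-03-01' THEN
        INSERT INTO sales_2007_02 VALUES (NEW.*);
    ELSE
        INSERT INTO sales_overflow VALUES (NEW.*);  -- no partition matched
    END IF;
    RETURN NULL;  -- nothing is ever stored in the master
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_route_ins BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_route();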
> > Since partitioning is inheritance-based, a simple DROP or "NO INHERIT"
> > will do the job to deal with the partition. Do we want to reinvent
> > additional syntax when these are around and are documented?
>
> Well, if the syntax for adding a new partition eventually
> ends up as ALTER TABLE ADD PARTITION, then it would make more
> sense that you remove a partition via ALTER TABLE DROP PARTITION.

But DROP PARTITION usually moves the data from this partition to other partitions, so it is something different.

Andreas
On Fri, 2007-03-09 at 11:48 +0530, NikhilS wrote:
> Hi,
>
> On 3/9/07, Shane Ambler <pgsql@sheeky.biz> wrote:
> > > Note to Nikhil: Make sure the new syntax doesn't prevent partitions
> > > from being placed upon multiple tablespaces in some manner, at
> > > CREATE TABLE time.
> >
> > What if the syntax was something like -
> >
> > CREATE TABLE tabname (
> > ...
> > ...
> > ) PARTITION BY
> > HASH(expr)
> > | RANGE(expr)
> > | LIST(expr)
> > [PARTITIONS num_partitions] /* will apply to HASH only for now */
> > [PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname],
> > PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]
> > ...
> > ];
> >
> > And (if we use the ALTER TABLE to add partitions)
> >
> > ALTER TABLE tabname
> > ADD PARTITION partition_name CHECK(...)
> > [USING TABLESPACE tblspcname];
>
> We could as well drop the USING part.

Why would we support HASH partitions? If you did, the full syntax for hash clusters should be supported.

If we do the CHECK clauses like that then we still don't have a guaranteed non-overlap between partitions. It would be easier to use Oracle syntax and then construct the CHECK clauses from that.

Also, the syntax needs to be fairly complex to allow for a mixture of modes, e.g. range and list partitioning. That is currently possible today and the syntax for doing that is IMHO much simpler than the Oracle "simple" way of specifying it.

An alternative is to provide a partitioning function which decides which partition each value goes into.

PARTITION FUNCTION which_partition(date_col)

The partition function must return an unsigned integer > 0, which would correspond to particular partitions. Partitions would be numbered 1..N, and named tablename_partM where 1 <= M <= N.

The input and contents of the partition function would be up to the user, e.g.

CREATE FUNCTION range_partition(date date_col)
{
    if (date_col < D1)
        return 1;
    else if (date_col < D2)
        return 2;
    else if (date_col < D3)
        return 3;
    return 4;
}

Doing it this way would allow us to easily join two tables based upon a common partition function.

In time, I would suggest we support both ways: declarative and functional.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
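Simon's pseudocode, rendered as working PL/pgSQL with concrete dates standing in for the D1..D3 boundaries:

CREATE FUNCTION range_partition(date_col date) RETURNS int AS $$
BEGIN
    IF date_col < DATE '2007-01-01' THEN
        RETURN 1;
    ELSIF date_col < DATE '2007-04-01' THEN
        RETURN 2;
    ELSIF date_col < DATE '2007-07-01' THEN
        RETURN 3;
    END IF;
    RETURN 4;  -- everything at or beyond the last boundary
END;
$$ LANGUAGE plpgsql IMMUTABLE;

Two tables partitioned by the same function could then be joined partition by partition, which is the SQL Server-style benefit Simon mentions.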
Hi,

> Why would we support HASH partitions?
> If you did, the full syntax for hash clusters should be supported.

In MySQL and Oracle, the syntax for HASH partitions seems to be similar to the one mentioned. I do not know much about hash clusters though.

> If we do the CHECK clauses like that then we still don't have a
> guaranteed non-overlap between partitions. It would be easier to use
> Oracle syntax and then construct the CHECK clauses from that.

Again, Oracle and MySQL use the "VALUES LESS THAN (expr)" format for RANGE partitions. So you mean that they end up creating ranges like "MININT - Range1", "Range1+1 - Range2", etc. for each of the partitions?

I think Postgres users are used to CHECK clauses, and I still feel that the onus of keeping partitions distinct lies on the partition creator.

> Also, the syntax needs to be fairly complex to allow for a mixture of
> modes, e.g. range and list partitioning. That is currently possible
> today and the syntax for doing that is IMHO much simpler than the Oracle
> "simple" way of specifying it.

Subpartitioning is not being targeted right now, but could be put on the TODO list for further enhancements.

> An alternative is to provide a partitioning function which decides which
> partition each value goes into.
>
> PARTITION FUNCTION which_partition(date_col)
>
> The partition function must return an unsigned integer > 0, which would
> correspond to particular partitions. Partitions would be numbered 1..N,
> and named tablename_partM where 1 <= M <= N.
>
> The input and contents of the partition function would be up to the
> user, e.g.
>
> CREATE FUNCTION range_partition(date date_col)
> {
>     if (date_col < D1)
>         return 1;
>     else if (date_col < D2)
>         return 2;
>     else if (date_col < D3)
>         return 3;
>     return 4;
> }
>
> Doing it this way would allow us to easily join two tables based upon a
> common partition function.
>
> In time, I would suggest we support both ways: declarative and
> functional.

Till now, we are going the declarative way.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
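To make the comparison concrete: the boundary values are invented and the child table names hypothetical, but this is how "VALUES LESS THAN" bounds would expand into the CHECK clauses discussed here:

-- Oracle/MySQL style:
--   PARTITION p1 VALUES LESS THAN ('2007-01-01')
--   PARTITION p2 VALUES LESS THAN ('2007-02-01')
-- expands so that each partition's lower bound is the previous
-- partition's upper bound, which is what rules out overlap:
ALTER TABLE sales_p1 ADD CHECK (sale_date < '2007-01-01');
ALTER TABLE sales_p2 ADD CHECK (sale_date >= '2007-01-01'
                            AND sale_date < '2007-02-01');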
<p><font size="2">Simon,<br /><br /> What happens to the data when the function is dropped or replaced?<br /><br /> - Luke<br/><br /> Msg is shrt cuz m on ma treo<br /><br /> -----Original Message-----<br /> From: Simon Riggs [<a href="mailto:simon@2ndquadrant.com">mailto:simon@2ndquadrant.com</a>]<br/> Sent: Friday, March 09, 2007 06:20 AM EasternStandard Time<br /> To: NikhilS<br /> Cc: Shane Ambler; Luke Lonergan; Zeugswetter Andreas ADI SD; Peter Eisentraut;pgsql-hackers@postgresql.org<br /> Subject: Re: [HACKERS] Auto creation of Partitions<br /><br /> On Fri,2007-03-09 at 11:48 +0530, NikhilS wrote:<br /> > Hi,<br /> ><br /> > On 3/9/07, Shane Ambler <pgsql@sheeky.biz>wrote:<br /> > <br /> > > Note to Nikhil: Make sure the new syntax doesn'tprevent<br /> > partitions from<br /> > > being placed upon multiple tablespaces in somemanner, at<br /> > CREATE TABLE<br /> > > time.<br /> > <br /> > What ifthe syntax was something like -<br /> > <br /> > CREATE TABLE tabname (<br /> > ...<br/> > ...<br /> > ) PARTITION BY<br /> > HASH(expr)<br /> > | RANGE(expr)<br/> > | LIST(expr)<br /> > [PARTITIONS num_partitions] /* will apply to HASH only for<br/> > now*/<br /> > [PARTITION partition_name CHECK(...) [USING TABLESPACE<br /> > tblspcname],<br /> > PARTITION partition_name CHECK(...) [USING TABLESPACE<br /> > tblspcname]<br/> > ...<br /> > ];<br /> > <br /> > <br /> > And (ifwe use the ALTER TABLE to add partitions)<br /> > <br /> > ALTER TABLE tabname<br /> > ADD PARTITION partition_name CHECK(...)<br /> > [USING TABLESPACE tblspcname];<br /> > <br/> ><br /> ><br /> > We could as well drop the USING part.<br /><br /> Why would we support HASHpartitions?<br /> If you did, the full syntax for hash clusters should be supported.<br /><br /> If we do the CHECK clauseslike that then we still have don't have a<br /> guaranteed non-overlap between partitions. It would be easier to use<br/> Oracle syntax and then construct the CHECK clauses from that.<br /><br /> Also, the syntax needs to be fairly complexto allow for a mixture of<br /> modes, e.g. range and list partitioning. That is currently possible<br /> today andthe syntax for doing that is IMHO much simpler than the Oracle<br /> "simple" way of specifying it.<br /><br /> An alternativeis to provide a partitioning function which decides which<br /> partition each values goes into.<br /><br /> PARTITIONFUNCTION which_partition(date_col)<br /><br /> The partition function must return an unsigned integer > 0, whichwould<br /> correspond to particular partitions. Partitions would be numbered 1..N,<br /> and named tablename_partMwhere 1 <= M <= N.<br /><br /> The input and contents of the partition function would be up to the<br/> user. e.g.<br /><br /> CREATE FUNCTION range_partition(date date_col)<br /> {<br /> if (date_col < D1)<br/> return 1;<br /> else if (date_col < D2)<br /> return 2;<br /> else if (date_col < D3)<br /> return 3;<br /><br /> return 4;<br /> }<br /><br /> Doingit this way would allow us to easily join two tables based upon a<br /> common partition function.<br /><br /> In time,I would suggest we support both ways: declarative and<br /> functional.<br /><br /> --<br /> Simon Riggs <br/> EnterpriseDB <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a><br /><br /><br/><br /></font>
On Fri, 2007-03-09 at 07:40 -0500, Luke Lonergan wrote:
> What happens to the data when the function is dropped or replaced?

Well, that wouldn't happen, because you build in a dependency. I'm not working on this, so don't expect lots of detail. The idea is essentially to implement things the way SQL Server does it.

The function would need some care. It might even need a function that writes a function, e.g.

CreateRangePartitionFunction('{date1,date2,date3}', funcname);
ReplaceRangePartitionFunction('{date1,date2,date3}', funcname);

which would then give a properly designed function called funcname to be used for partitioning, which would include tests to make sure a partition wasn't inadvertently excluded from the list of existing partitions of any table that used it.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
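A rough PL/pgSQL sketch of what such a function-writing function might look like; the name comes from Simon's message, but the body is invented, and the dependency tracking and partition-coverage checks he alludes to are omitted:

CREATE FUNCTION create_range_partition_function(bounds date[], funcname text)
RETURNS void AS $$
DECLARE
    body text := '';
    i    int;
BEGIN
    -- one IF per boundary: keys below bounds[i] land in partition i
    FOR i IN array_lower(bounds, 1) .. array_upper(bounds, 1) LOOP
        body := body || 'IF d < DATE ' || quote_literal(bounds[i]::text)
                     || ' THEN RETURN ' || i || '; END IF; ';
    END LOOP;
    body := body || 'RETURN ' || (array_upper(bounds, 1) + 1) || ';';
    EXECUTE 'CREATE OR REPLACE FUNCTION ' || quote_ident(funcname)
         || '(d date) RETURNS int AS $f$ BEGIN ' || body
         || ' END; $f$ LANGUAGE plpgsql IMMUTABLE';
END;
$$ LANGUAGE plpgsql;

-- e.g. SELECT create_range_partition_function(
--          ARRAY['2007-01-01','2007-04-01']::date[], 'which_partition');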
On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote:
> > > Since partitioning is inheritance-based, a simple DROP or "NO INHERIT"
> > > will do the job to deal with the partition. Do we want to reinvent
> > > additional syntax when these are around and are documented?
> >
> > Well, if the syntax for adding a new partition eventually
> > ends up as ALTER TABLE ADD PARTITION, then it would make more
> > sense that you remove a partition via ALTER TABLE DROP PARTITION.
>
> But DROP PARTITION usually moves the data from this partition to other
> partitions, so it is something different.

It does? IIRC, in every partitioning system I've seen, DROP PARTITION drops the data as well. It's up to you to move it somewhere else if you want to keep it.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Fri, 2007-03-09 at 15:41, Jim Nasby wrote:
> On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote:
> > > > Since partitioning is inheritance-based, a simple DROP or "NO INHERIT"
> > > > will do the job to deal with the partition. Do we want to reinvent
> > > > additional syntax when these are around and are documented?
> > >
> > > Well, if the syntax for adding a new partition eventually
> > > ends up as ALTER TABLE ADD PARTITION, then it would make more
> > > sense that you remove a partition via ALTER TABLE DROP PARTITION.
> >
> > But DROP PARTITION usually moves the data from this partition to other
> > partitions, so it is something different.
>
> It does? IIRC, in every partitioning system I've seen, DROP PARTITION
> drops the data as well. It's up to you to move it somewhere else if
> you want to keep it.

Will this proposed DROP PARTITION just disassociate the table from the master, or will it actually drop the partition's table from the whole database?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
On Friday 09 March 2007 01:23, NikhilS wrote:
> Hi,
>
> > This follows on from the suggestion I made - taken along the lines of
> > the subject "auto creation of partitions" - where I suggested the syntax
> > of partition check(month of mydatecol) and having a new partition created
> > as data was entered. With this scenario, dropping the partition when it
> > was empty would complement the creation of a new partition as needed.
> >
> > Given that there seems to be no real support for going with "auto
> > maintenance", where new partitions are added as needed, the auto
> > dropping of empty partitions would also not apply.
> >
> > That leaves us with only specific add partition / drop partition commands,
> > and having the parent table pick up rows not matching any partition check
> > criteria.
>
> I was thinking along the lines of what Jim had suggested earlier regarding
> an overflow partition. Instead of dumping unmatched rows into the master
> table, we could put them into a default "DUMP/DUMB" partition.

I'm quite content dumping unmatched rows into the master table. This makes it very easy to scan partitioned tables for busted partition setups. Having a DUMP table seems only semantically different, so why learn new semantics?

> Given that Simon wants to do away with having the master table APPENDed in
> the planning phase, this would be better.

ISTM you're trading appending the master table for appending the DUMP partition, which afaict would give you no gain.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Hi,

On 3/10/07, Hannu Krosing <hannu@skype.net> wrote:
> On Fri, 2007-03-09 at 15:41, Jim Nasby wrote:
> > On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote:
> > > > > Since partitioning is inheritance-based, a simple DROP or "NO
> > > > > INHERIT" will do the job to deal with the partition. Do we want
> > > > > to reinvent additional syntax when these are around and are
> > > > > documented?
> > > >
> > > > Well, if the syntax for adding a new partition eventually
> > > > ends up as ALTER TABLE ADD PARTITION, then it would make more
> > > > sense that you remove a partition via ALTER TABLE DROP PARTITION.
> > >
> > > But DROP PARTITION usually moves the data from this partition to
> > > other partitions, so it is something different.
> >
> > It does? IIRC, in every partitioning system I've seen, DROP PARTITION
> > drops the data as well. It's up to you to move it somewhere else if
> > you want to keep it.
>
> Will this proposed DROP PARTITION just disassociate the table from the
> master, or will it actually drop the partition's table from the whole
> database?

That's why I would prefer the existing mechanism: there, a DROP on the child removes it and a NO INHERIT disassociates it. There might be situations where we would want to just disassociate and not drop.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,

> > Given that Simon wants to do away with having the master table APPENDed
> > in the planning phase, this would be better.
>
> ISTM you're trading appending the master table for appending the DUMP
> partition, which afaict would give you no gain.

If there are entries in the master table, I think it would get appended for all queries, regardless of whether we need to examine its contents or not. Segregating dump data into a partition will avoid that.

I have seen examples in some other databases wherein a partition specifies a range of "someval - MAXINT", for instance, to catch such cases.

That again means that the onus is on the partition creator most of the time...

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
On Saturday 10 March 2007 00:13, NikhilS wrote:
> Hi,
>
> > > Given that Simon wants to do away with having the master table APPENDed
> > > in the planning phase, this would be better.
> >
> > ISTM you're trading appending the master table for appending the DUMP
> > partition, which afaict would give you no gain.
>
> If there are entries in the master table, I think it would get appended for
> all queries, regardless of whether we need to examine its contents or not.
> Segregating dump data into a partition will avoid that.
>
> I have seen examples in some other databases wherein a partition specifies
> a range of "someval - MAXINT", for instance, to catch such cases.
>
> That again means that the onus is on the partition creator most of the
> time...

*shrug*... we can do that now in pgsql

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
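Robert's "we can do that now" made concrete, assuming the hypothetical sales table from earlier; the last partition's CHECK simply has no upper bound:

CREATE TABLE sales_rest (
    CHECK (sale_date >= '2007-07-01')  -- open-ended, the "MAXINT" case
) INHERITS (sales);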