Thread: Partitioning wiki page
Hi all, I have put a first draft of the wiki page dedicated to the table partitioning development at http://wiki.postgresql.org/wiki/Table_partitioning The page is still incomplete and need much more work but it should be a good starting point for discussions and a way to move forward. Thanks, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
On Tue, Dec 16, 2008 at 09:48:35PM -0500, Emmanuel Cecchet wrote: > Hi all, > > I have put a first draft of the wiki page dedicated to the table > partitioning development at > http://wiki.postgresql.org/wiki/Table_partitioning > > The page is still incomplete and need much more work but it should > be a good starting point for discussions and a way to move forward. Any word on moving partitions off-line and back on? This is a very common use case, so if it's not supported, it'd be good to document that this is so :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, 2008-12-16 at 21:48 -0500, Emmanuel Cecchet wrote: > I have put a first draft of the wiki page dedicated to the table > partitioning development at > http://wiki.postgresql.org/wiki/Table_partitioning > > The page is still incomplete and need much more work but it should be a > good starting point for discussions and a way to move forward. I'm glad you're looking to work on it. I have a few comments. At the moment, all you've done is show examples of how Oracle does partitioning and restate my requirements document, significant chunks completely verbatim. If you do that I think it fair to ask for a specific credit on that. But even so, thank you for the compliment. Others have suggested different syntax also, so you need to come up with arguments in favour of this particular syntax. Yes, Oracle does it, but that doesn't mean its the best way of doing it. Just check out the number of pages of DDL needed to configure a TPC-H test. That just happens to be the way they first thought of doing it and everybody has followed them. Bear in mind they have introduced new ways of doing this since then. Oracle's new interval partitioning sounds great, but it ignores the reality that most data varies considerably over time, either growing or fluctuating. I much prefer the idea of a size-equalized partitioning scheme, as implemented by Coppereye's Greenwich. That method gives equal sized partitions even when data varies. It also needs less DBA input because you don't even need to specify the time period. (But it probably also requires autonomous transactions to create new partitions when required, or at least a way of lock/release that doesn't last until the end of very long transactions). So my inclination would be to implement auto-partitioning first and then spend more time on the optimizer and performance related aspects of this topic, rather than on user interface. Also, if we were going to have hash partitioning, I would have thought we would use a virtual hash, like Teradata. But I'm not sure we want hash partitioning unless we have multiple nodes (which we don't). We're in the middle of a CommitFest, so I won't comment further and I guess you won't get many people passing comment yet either. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
> On Tue, 2008-12-16 at 21:48 -0500, Emmanuel Cecchet wrote: > I'm glad you're looking to work on it. I have a few comments. > Others have suggested different syntax also, so you need to come up with > arguments in favour of this particular syntax. I am not a fan of the proposed syntax. It is conceptually similar to what we already do with constraints, but invents a whole new syntax to no obvious benefit that I can see. I think we would do well to look at what other systems besides Oracle do, as well as considering any newer ideas Oracle may have introduced. Perhaps this would be a good thing to add to the Wiki page - instead of saying, this is the design, say, here are some different possibilities, what do we like? > Oracle's new interval partitioning sounds great, but it ignores the > reality that most data varies considerably over time, either growing or > fluctuating. I much prefer the idea of a size-equalized partitioning > scheme, as implemented by Coppereye's Greenwich. That method gives equal Sometimes (though certainly not always), the structure of the underling data makes interval partitioning a win, as when for example you are accumulating transactions that are billed at the end of each month. If you do a lot of queries on the open transactions for the current month, you want to make sure that there's a partition break at the start of the month so that you're not unnecessarily scanning some of the previous month's entries. ...Robert
Simon, > At the moment, all you've done is show examples of how Oracle does > partitioning and restate my requirements document, significant chunks > completely verbatim. If you do that I think it fair to ask for a > specific credit on that. But even so, thank you for the compliment. > It was not my intention to steal your work but to put together everything that was produced so far. I tried to put in the links section, references to all documents I have consulted so far. I thought this was a collective work and therefore did not ask for specific credit for everyone that participated on the mailing list. As you can also notice, I did not claim any credit nor put my name on the wiki page. I am just trying to help the community. If you think that every contribution should acknowledge its original author, I can add a bibliography and put references but this will make the page harder to maintain. Regarding the current syntax, it is not just a copy paste of Oracle but a merge between Oracle and MySQL. I have not looked yet at other products being new to the datawarehouse world. As I mentioned there is still a lot of work to do and I will try to integrate feedback as we go. Best regards, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Hi David, Could you explain me what do you mean by putting a partition offline? I am just starting to get familiar with the use cases at Aster Data Systems that are probably not covering all the requirements that everyone has seen so far in the field. So I am eager to learn and try to integrate that in the wiki page. Thanks for the feedback, Emmanuel > On Tue, Dec 16, 2008 at 09:48:35PM -0500, Emmanuel Cecchet wrote: > >> Hi all, >> >> I have put a first draft of the wiki page dedicated to the table >> partitioning development at >> http://wiki.postgresql.org/wiki/Table_partitioning >> >> The page is still incomplete and need much more work but it should >> be a good starting point for discussions and a way to move forward. >> > > Any word on moving partitions off-line and back on? This is a very > common use case, so if it's not supported, it'd be good to document > that this is so :) > > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate > -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Hi Robert, Thanks all for the time you are spending commenting on this during this busy Commit Fest. > I am not a fan of the proposed syntax. It is conceptually similar to > what we already do with constraints, but invents a whole new syntax to > no obvious benefit that I can see. Actually I did not invent a new syntax but tried to map the Oracle syntax which seems to be a requirement that appeared at multiple occasions in the discussions. The question is more whether Postgres wants to have its own syntax (and there might be good reasons for that) or if we want to use a syntax similar to other databases for easier migration (after all Postgres is very late on that topic compared to other databases). > I think we would do well to look > at what other systems besides Oracle do, as well as considering any > newer ideas Oracle may have introduced. Perhaps this would be a good > thing to add to the Wiki page - instead of saying, this is the design, > say, here are some different possibilities, what do we like? > Maybe I was not clear on the intent of the wiki page. I was not trying to pretend that this was the design but I was just trying to do my homework on what are the issues that we will have to address (that's basically the section headers) and I just filled the sections with what I understood so far from the discussions, the Oracle and MySQL documentations. So yes, there is more work to do on understanding what other systems besides Oracle & MySQL do, and then figuring out what we want to propose in Postgres. >> Oracle's new interval partitioning sounds great, but it ignores the >> reality that most data varies considerably over time, either growing or >> fluctuating. I much prefer the idea of a size-equalized partitioning >> scheme, as implemented by Coppereye's Greenwich. That method gives equal >> > > Sometimes (though certainly not always), the structure of the > underling data makes interval partitioning a win, as when for example > you are accumulating transactions that are billed at the end of each > month. If you do a lot of queries on the open transactions for the > current month, you want to make sure that there's a partition break at > the start of the month so that you're not unnecessarily scanning some > of the previous month's entries. > Partitions of similar sized are usually achieved by hash partitioning (in both Oracle and MySQL). There are many use cases where range/interval partitioning makes a lot of sense. One use case that was mentioned to me is multi-level hierarchies like year->month->week->day that would require more than 2 levels of partitioning. I think we can think of a way of having an arbitrary depth of partitions even though partition management might become slightly more complex/ I will try to integrate the comments as I see them on the mailing list but feel free also to update the wiki page with your thoughts and use cases. Thanks again for the feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
On Thu, Dec 18, 2008 at 12:04:00AM -0500, Emmanuel Cecchet wrote: > Hi David, > > Could you explain me what do you mean by putting a partition offline? I sometimes use pg_dump to archive a partition, then drop that table, bringing it offline, but if needed, it's there in the archive, ready to restore. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi,
Simon, if you look at the other current discussions on partitioning, Emmanuel has mentioned your document a number of times (crediting you directly via that). He had even specificly asked if it was ok to use your document and export it to the wiki. I think its great that he has started up this wiki page and taken on the responsibility to maintain and move this forward.
Yes we have to have some syntax and even to me the Oracle, Mysql (which I think is completely inspired from the Oracle syntax) seemed a good starting point. Even the grammar presented here in this thread was similar:
http://www.nabble.com/Declarative-partitioning-grammar-td14767382.html
Also, I have been thinking that we might collect all different proposals on the wiki but what is the next step? Maybe when people meet up at PGCon or other developers meets, they should have a specific agenda to take stock of the current discussions and decide on one approach to move ahead.
Regards,
Nikhils
--
http://www.enterprisedb.com
It was not my intention to steal your work but to put togetherAt the moment, all you've done is show examples of how Oracle does
partitioning and restate my requirements document, significant chunks
completely verbatim. If you do that I think it fair to ask for a
specific credit on that. But even so, thank you for the compliment.
everything that was produced so far. I tried to put in the links
section, references to all documents I have consulted so far. I thought
this was a collective work and therefore did not ask for specific credit
for everyone that participated on the mailing list. As you can also
notice, I did not claim any credit nor put my name on the wiki page. I
am just trying to help the community.
If you think that every contribution should acknowledge its original
author, I can add a bibliography and put references but this will make
the page harder to maintain.
Simon, if you look at the other current discussions on partitioning, Emmanuel has mentioned your document a number of times (crediting you directly via that). He had even specificly asked if it was ok to use your document and export it to the wiki. I think its great that he has started up this wiki page and taken on the responsibility to maintain and move this forward.
Regarding the current syntax, it is not just a copy paste of Oracle but
a merge between Oracle and MySQL. I have not looked yet at other
products being new to the datawarehouse world. As I mentioned there is
still a lot of work to do and I will try to integrate feedback as we go.
Yes we have to have some syntax and even to me the Oracle, Mysql (which I think is completely inspired from the Oracle syntax) seemed a good starting point. Even the grammar presented here in this thread was similar:
http://www.nabble.com/Declarative-partitioning-grammar-td14767382.html
Also, I have been thinking that we might collect all different proposals on the wiki but what is the next step? Maybe when people meet up at PGCon or other developers meets, they should have a specific agenda to take stock of the current discussions and decide on one approach to move ahead.
Regards,
Nikhils
--
http://www.enterprisedb.com
On Thu, 2008-12-18 at 00:00 -0500, Emmanuel Cecchet wrote: > > At the moment, all you've done is show examples of how Oracle does > > partitioning and restate my requirements document, significant chunks > > completely verbatim. If you do that I think it fair to ask for a > > specific credit on that. But even so, thank you for the compliment. > > > It was not my intention to steal your work but to put together > everything that was produced so far. I tried to put in the links > section, references to all documents I have consulted so far. I thought > this was a collective work and therefore did not ask for specific credit > for everyone that participated on the mailing list. As you can also > notice, I did not claim any credit nor put my name on the wiki page. I > am just trying to help the community. Approximately 50% of the Wiki entry is *exactly* the same as the document I wrote. Yes, the Wiki may one day become a collective work, but currently the following Wiki entry is fairly obviously not a collective work. http://wiki.postgresql.org/wiki/Table_partitioning I think asking that you give credit in these circumstances was and is fair and reasonable. Your attitude to this is quite remarkable and worrying. I have checked the source code examples you have used also and Google says these come directly from copyrighted Oracle documentation. You haven't even changed the names of datatypes to PostgreSQL ones. http://download-east.oracle.com/docs/cd/B13789_01/server.101/b10736/parpart.htm I strongly doubt you will be able to persuade Oracle that the PostgreSQL Wiki is a "collective work" and that they deserve no credit for their *copyrighted* work. I personally doubt whether we need Oracle's old ideas, but the PostgreSQL project definitely has no need for parts of their manuals. Please remove the copyrighted material immediately from the Wiki. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
>> I am not a fan of the proposed syntax. It is conceptually similar to >> what we already do with constraints, but invents a whole new syntax to >> no obvious benefit that I can see. > > Actually I did not invent a new syntax but tried to map the Oracle syntax > which seems to be a requirement that appeared at multiple occasions in the > discussions. The question is more whether Postgres wants to have its own > syntax (and there might be good reasons for that) or if we want to use a > syntax similar to other databases for easier migration (after all Postgres > is very late on that topic compared to other databases). I'm not saying you invented it - I'm saying that it is new relative to what exists in PG today, and there doesn't seem to be any reason to add something new. I'd also note that even if we decide to use an Oracle-style syntax for creating partitions, we're surely NOT going to use Oracle's format for date constants when and only when we are doing partitioning. ...Robert
> Approximately 50% of the Wiki entry is *exactly* the same as the > document I wrote. Yes, the Wiki may one day become a collective work, > but currently the following Wiki entry is fairly obviously not a > collective work. > http://wiki.postgresql.org/wiki/Table_partitioning [...] > Your attitude to this is quite remarkable and worrying. I have checked > the source code examples you have used also and Google says these come > directly from copyrighted Oracle documentation. You haven't even changed > the names of datatypes to PostgreSQL ones. > http://download-east.oracle.com/docs/cd/B13789_01/server.101/b10736/parpart.htm > > I strongly doubt you will be able to persuade Oracle that the PostgreSQL > Wiki is a "collective work" and that they deserve no credit for their > *copyrighted* work. For what it's worth, I think Simon is right on target here. There is no value at all in taking two documents written by other people and slamming them together without adding any significant original content, or even attributing the authors. And it is never wise to do something openly illegal with content that belongs to an $80B company. ...Robert
Hi all, I did not find the option to remove the page so I just left the links section. Feel free to remove the page so that there is no trace left of this in the history. I don't think it is worth keeping anything since no one found value in it anyway. Until I figure out a way to come up with a constructive proposal that does not offend anyone, I think that it is best that I do not contribute anything to the wiki anymore. Best, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com