Thread: Partitioning wiki page

Partitioning wiki page

From
Emmanuel Cecchet
Date:
Hi all,

I have put a first draft of the wiki page dedicated to the table 
partitioning development at 
http://wiki.postgresql.org/wiki/Table_partitioning

The page is still incomplete and need much more work but it should be a 
good starting point for discussions and a way to move forward.

Thanks,
Emmanuel

-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com



Re: Partitioning wiki page

From
David Fetter
Date:
On Tue, Dec 16, 2008 at 09:48:35PM -0500, Emmanuel Cecchet wrote:
> Hi all,
>
> I have put a first draft of the wiki page dedicated to the table
> partitioning development at
> http://wiki.postgresql.org/wiki/Table_partitioning
>
> The page is still incomplete and need much more work but it should
> be a  good starting point for discussions and a way to move forward.

Any word on moving partitions off-line and back on?  This is a very
common use case, so if it's not supported, it'd be good to document
that this is so :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Partitioning wiki page

From
Simon Riggs
Date:
On Tue, 2008-12-16 at 21:48 -0500, Emmanuel Cecchet wrote:

> I have put a first draft of the wiki page dedicated to the table 
> partitioning development at 
> http://wiki.postgresql.org/wiki/Table_partitioning
> 
> The page is still incomplete and need much more work but it should be a 
> good starting point for discussions and a way to move forward.

I'm glad you're looking to work on it. I have a few comments.

At the moment, all you've done is show examples of how Oracle does
partitioning and restate my requirements document, significant chunks
completely verbatim. If you do that I think it fair to ask for a
specific credit on that. But even so, thank you for the compliment.

Others have suggested different syntax also, so you need to come up with
arguments in favour of this particular syntax. Yes, Oracle does it, but
that doesn't mean its the best way of doing it. Just check out the
number of pages of DDL needed to configure a TPC-H test. That just
happens to be the way they first thought of doing it and everybody has
followed them. Bear in mind they have introduced new ways of doing this
since then.

Oracle's new interval partitioning sounds great, but it ignores the
reality that most data varies considerably over time, either growing or
fluctuating. I much prefer the idea of a size-equalized partitioning
scheme, as implemented by Coppereye's Greenwich. That method gives equal
sized partitions even when data varies. It also needs less DBA input
because you don't even need to specify the time period. (But it probably
also requires autonomous transactions to create new partitions when
required, or at least a way of lock/release that doesn't last until the
end of very long transactions).

So my inclination would be to implement auto-partitioning first and then
spend more time on the optimizer and performance related aspects of this
topic, rather than on user interface.

Also, if we were going to have hash partitioning, I would have thought
we would use a virtual hash, like Teradata. But I'm not sure we want
hash partitioning unless we have multiple nodes (which we don't).

We're in the middle of a CommitFest, so I won't comment further and I
guess you won't get many people passing comment yet either.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Partitioning wiki page

From
"Robert Haas"
Date:
> On Tue, 2008-12-16 at 21:48 -0500, Emmanuel Cecchet wrote:
> I'm glad you're looking to work on it. I have a few comments.
> Others have suggested different syntax also, so you need to come up with
> arguments in favour of this particular syntax.

I am not a fan of the proposed syntax.  It is conceptually similar to
what we already do with constraints, but invents a whole new syntax to
no obvious benefit that I can see.  I think we would do well to look
at what other systems besides Oracle do, as well as considering any
newer ideas Oracle may have introduced.  Perhaps this would be a good
thing to add to the Wiki page - instead of saying, this is the design,
say, here are some different possibilities, what do we like?

> Oracle's new interval partitioning sounds great, but it ignores the
> reality that most data varies considerably over time, either growing or
> fluctuating. I much prefer the idea of a size-equalized partitioning
> scheme, as implemented by Coppereye's Greenwich. That method gives equal

Sometimes (though certainly not always), the structure of the
underling data makes interval partitioning a win, as when for example
you are accumulating transactions that are billed at the end of each
month.  If you do a lot of queries on the open transactions for the
current month, you want to make sure that there's a partition break at
the start of the month so that you're not unnecessarily scanning some
of the previous month's entries.

...Robert


Re: Partitioning wiki page

From
Emmanuel Cecchet
Date:
Simon,
> At the moment, all you've done is show examples of how Oracle does
> partitioning and restate my requirements document, significant chunks
> completely verbatim. If you do that I think it fair to ask for a
> specific credit on that. But even so, thank you for the compliment.
>
It was not my intention to steal your work but to put together
everything that was produced so far. I tried to put in the links
section, references to all documents I have consulted so far. I thought
this was a collective work and therefore did not ask for specific credit
for everyone that participated on the mailing list. As you can also
notice, I did not claim any credit nor put my name on the wiki page. I
am just trying to help the community.
If you think that every contribution should acknowledge its original
author, I can add a bibliography and put references but this will make
the page harder to maintain.

Regarding the current syntax, it is not just a copy paste of  Oracle but
a merge between Oracle and MySQL. I have not looked yet at other
products being new to the datawarehouse world. As I mentioned there is
still a lot of work to do and I will try to integrate feedback as we go.

Best regards,
Emmanuel

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com




Re: Partitioning wiki page

From
Emmanuel Cecchet
Date:
Hi David,

Could you explain me what do you mean by putting a partition offline?
I am just starting to get familiar with the use cases at Aster Data 
Systems that are probably not covering all the requirements that 
everyone has seen so far in the field. So I am eager to learn and try to 
integrate that in the wiki page.

Thanks for the feedback,
Emmanuel

> On Tue, Dec 16, 2008 at 09:48:35PM -0500, Emmanuel Cecchet wrote:
>   
>> Hi all,
>>
>> I have put a first draft of the wiki page dedicated to the table
>> partitioning development at
>> http://wiki.postgresql.org/wiki/Table_partitioning
>>
>> The page is still incomplete and need much more work but it should
>> be a  good starting point for discussions and a way to move forward.
>>     
>
> Any word on moving partitions off-line and back on?  This is a very
> common use case, so if it's not supported, it'd be good to document
> that this is so :)
>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>   


-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com



Re: Partitioning wiki page

From
Emmanuel Cecchet
Date:
Hi Robert,

Thanks all for the time you are spending commenting on this during this 
busy Commit Fest.
> I am not a fan of the proposed syntax.  It is conceptually similar to
> what we already do with constraints, but invents a whole new syntax to
> no obvious benefit that I can see.  
Actually I did not invent a new syntax but tried to map the Oracle 
syntax which seems to be a requirement that appeared at multiple 
occasions in the discussions. The question is more whether Postgres 
wants to have its own syntax (and there might be good reasons for that) 
or if we want to use a syntax similar to other databases for easier 
migration (after all Postgres is very late on that topic compared to 
other databases).
> I think we would do well to look
> at what other systems besides Oracle do, as well as considering any
> newer ideas Oracle may have introduced.  Perhaps this would be a good
> thing to add to the Wiki page - instead of saying, this is the design,
> say, here are some different possibilities, what do we like?
>   
Maybe I was not clear on the intent of the wiki page. I was not trying 
to pretend that this was the design but I was just trying to do my 
homework on what are the issues that we will have to address (that's 
basically the section headers) and I just filled the sections with what 
I understood so far from the discussions, the Oracle and MySQL 
documentations.
So yes, there is more work to do on understanding what other systems 
besides Oracle & MySQL do, and then figuring out what we want to propose 
in Postgres.
>> Oracle's new interval partitioning sounds great, but it ignores the
>> reality that most data varies considerably over time, either growing or
>> fluctuating. I much prefer the idea of a size-equalized partitioning
>> scheme, as implemented by Coppereye's Greenwich. That method gives equal
>>     
>
> Sometimes (though certainly not always), the structure of the
> underling data makes interval partitioning a win, as when for example
> you are accumulating transactions that are billed at the end of each
> month.  If you do a lot of queries on the open transactions for the
> current month, you want to make sure that there's a partition break at
> the start of the month so that you're not unnecessarily scanning some
> of the previous month's entries.
>   
Partitions of similar sized are usually achieved by hash partitioning 
(in both Oracle and MySQL). There are many use cases where 
range/interval partitioning makes a lot of sense. One use case that was 
mentioned to me is multi-level hierarchies like year->month->week->day 
that would require more than 2 levels of partitioning. I think we can 
think of a way of having an arbitrary depth of partitions even though 
partition management might become slightly more complex/

I will try to integrate the comments as I see them on the mailing list 
but feel free also to update the wiki page with your thoughts and use 
cases.

Thanks again for the feedback,
Emmanuel

-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com



Re: Partitioning wiki page

From
David Fetter
Date:
On Thu, Dec 18, 2008 at 12:04:00AM -0500, Emmanuel Cecchet wrote:
> Hi David,
>
> Could you explain me what do you mean by putting a partition offline?

I sometimes use pg_dump to archive a partition, then drop that table,
bringing it offline, but if needed, it's there in the archive, ready
to restore.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Partitioning wiki page

From
"Nikhil Sontakke"
Date:
Hi,
At the moment, all you've done is show examples of how Oracle does
partitioning and restate my requirements document, significant chunks
completely verbatim. If you do that I think it fair to ask for a
specific credit on that. But even so, thank you for the compliment.

It was not my intention to steal your work but to put together
everything that was produced so far. I tried to put in the links
section, references to all documents I have consulted so far. I thought
this was a collective work and therefore did not ask for specific credit
for everyone that participated on the mailing list. As you can also
notice, I did not claim any credit nor put my name on the wiki page. I
am just trying to help the community.
If you think that every contribution should acknowledge its original
author, I can add a bibliography and put references but this will make
the page harder to maintain.

Simon, if you look at the other current discussions on partitioning, Emmanuel has mentioned your document a number of times (crediting you directly via that). He had even specificly asked if it was ok to use your document and export it to the wiki. I think its great that he has started up this wiki page and taken on the responsibility to maintain and move this forward.


Regarding the current syntax, it is not just a copy paste of  Oracle but
a merge between Oracle and MySQL. I have not looked yet at other
products being new to the datawarehouse world. As I mentioned there is
still a lot of work to do and I will try to integrate feedback as we go.

Yes we have to have some syntax and even to me the Oracle, Mysql (which I think is completely inspired from the Oracle syntax) seemed a good starting point. Even the grammar presented here in this thread was similar:

http://www.nabble.com/Declarative-partitioning-grammar-td14767382.html

Also, I have been thinking that we might collect all different proposals on the wiki but what is the next step? Maybe when people meet up at PGCon or other developers meets, they should have a specific agenda to take stock of the current discussions and decide on one approach to move ahead.

Regards,
Nikhils
--
http://www.enterprisedb.com

Re: Partitioning wiki page

From
Simon Riggs
Date:
On Thu, 2008-12-18 at 00:00 -0500, Emmanuel Cecchet wrote:

> > At the moment, all you've done is show examples of how Oracle does
> > partitioning and restate my requirements document, significant chunks
> > completely verbatim. If you do that I think it fair to ask for a
> > specific credit on that. But even so, thank you for the compliment.
> >
> It was not my intention to steal your work but to put together
> everything that was produced so far. I tried to put in the links
> section, references to all documents I have consulted so far. I thought
> this was a collective work and therefore did not ask for specific credit
> for everyone that participated on the mailing list. As you can also
> notice, I did not claim any credit nor put my name on the wiki page. I
> am just trying to help the community.

Approximately 50% of the Wiki entry is *exactly* the same as the
document I wrote. Yes, the Wiki may one day become a collective work,
but currently the following Wiki entry is fairly obviously not a
collective work.
http://wiki.postgresql.org/wiki/Table_partitioning

I think asking that you give credit in these circumstances was and is
fair and reasonable.

Your attitude to this is quite remarkable and worrying. I have checked
the source code examples you have used also and Google says these come
directly from copyrighted Oracle documentation. You haven't even changed
the names of datatypes to PostgreSQL ones.
http://download-east.oracle.com/docs/cd/B13789_01/server.101/b10736/parpart.htm

I strongly doubt you will be able to persuade Oracle that the PostgreSQL
Wiki is a "collective work" and that they deserve no credit for their
*copyrighted* work.

I personally doubt whether we need Oracle's old ideas, but the
PostgreSQL project definitely has no need for parts of their manuals.

Please remove the copyrighted material immediately from the Wiki.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Partitioning wiki page

From
"Robert Haas"
Date:
>> I am not a fan of the proposed syntax.  It is conceptually similar to
>> what we already do with constraints, but invents a whole new syntax to
>> no obvious benefit that I can see.
>
> Actually I did not invent a new syntax but tried to map the Oracle syntax
> which seems to be a requirement that appeared at multiple occasions in the
> discussions. The question is more whether Postgres wants to have its own
> syntax (and there might be good reasons for that) or if we want to use a
> syntax similar to other databases for easier migration (after all Postgres
> is very late on that topic compared to other databases).

I'm not saying you invented it - I'm saying that it is new relative to
what exists in PG today, and there doesn't seem to be any reason to
add something new.

I'd also note that even if we decide to use an Oracle-style syntax for
creating partitions, we're surely NOT going to use Oracle's format for
date constants when and only when we are doing partitioning.

...Robert


Re: Partitioning wiki page

From
"Robert Haas"
Date:
> Approximately 50% of the Wiki entry is *exactly* the same as the
> document I wrote. Yes, the Wiki may one day become a collective work,
> but currently the following Wiki entry is fairly obviously not a
> collective work.
> http://wiki.postgresql.org/wiki/Table_partitioning
[...]
> Your attitude to this is quite remarkable and worrying. I have checked
> the source code examples you have used also and Google says these come
> directly from copyrighted Oracle documentation. You haven't even changed
> the names of datatypes to PostgreSQL ones.
> http://download-east.oracle.com/docs/cd/B13789_01/server.101/b10736/parpart.htm
>
> I strongly doubt you will be able to persuade Oracle that the PostgreSQL
> Wiki is a "collective work" and that they deserve no credit for their
> *copyrighted* work.

For what it's worth, I think Simon is right on target here.  There is
no value at all in taking two documents written by other people and
slamming them together without adding any significant original
content, or even attributing the authors.  And it is never wise to do
something openly illegal with content that belongs to an $80B company.

...Robert


Re: Partitioning wiki page

From
Emmanuel Cecchet
Date:
Hi all,

I did not find the option to remove the page so I just left the links 
section. Feel free to remove the page so that there is no trace left of 
this in the history. I don't think it is worth keeping anything since no 
one found value in it anyway.
Until I figure out a way to come up with a constructive proposal that 
does not offend anyone, I think that it is best that I do not contribute 
anything to the wiki anymore.

Best,
Emmanuel

-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com