Thread: Automating Partitions in PostgreSQL - Query on syntax

Automating Partitions in PostgreSQL - Query on syntax

From
Kedar Potdar
Date:
<p class="MsoNormal">Hi ,<p class="MsoNormal"> <p class="MsoNormal">We are working on a <a
href="http://archives.postgresql.org/pgsql-hackers/2009-03/msg00897.php">patch</a>to automate partitioning<span
style="color:navy;"> </span>in PostgreSQL.<span style="font-size: 10pt; font-family: "Arial","sans-serif"; color:
navy;"><br/></span><p class="MsoNormal"> <p class="MsoNormal">For Range partitions, we have proposed the syntax which
isas follows –<p class="MsoNormal"> <p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">CREATE TABLE emp (</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">    emp_id          int not null primary key,</span></b><p class="MsoNormal" style=""><b><span
style="font-size:10pt; font-family: "Courier New";">    designation     text not null,</span></b><p class="MsoNormal"
style=""><b><spanstyle="font-size: 10pt; font-family: "Courier New";">    location        varchar(50) not
null,</span></b><pclass="MsoNormal" style=""><b><span style="font-size: 10pt; font-family: "Courier New";">   
jdate          date not null,</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">    ctc             float not null</span></b><p class="MsoNormal" style=""><b><span style="font-size:
10pt;font-family: "Courier New";"> </span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt;
font-family:"Courier New";">)</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">PARTITION BY RANGE (emp_id)</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt;
font-family:"Courier New";">(</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">emp_500 (START 1 END 500),</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt;
font-family:"Courier New";">emp_1500 (START 500 END 1500),</span></b><p class="MsoNormal" style=""><b><span
style="font-size:10pt; font-family: "Courier New";">emp_4000 (START 1520 END 4000)</span></b><p
class="MsoNormal"><b><spanstyle="font-size: 10pt; font-family: "Courier New";">);</span></b><p class="MsoNormal"> <p
class="MsoNormal">Asobserved in this syntax, user needs to specify explicitly, the min and max values of a range for a
givenpartition. <p class="MsoNormal">With this design, partition ranges are inherently allowed to be fragmented and
non-contiguous.As ‘gaps’ are allowed <p class="MsoNormal">in the ranges, we’re also supporting an ‘overflow’ partition,
sothat any row, which does not satisfy constraints of any <p class="MsoNormal">existing partitions, does not stall a
bigUPDATE operation and such rows are preserved.(in overflow table)<p class="MsoNormal"> <p class="MsoNormal">However,
Oracleuses user-friendly syntax but make<span style="color: navy;">s</span> it compulsion that partition ranges *have*
tobe contiguous. <p class="MsoNormal"> <p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">PARTITION BY RANGE (emp_id)</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt;
font-family:"Courier New";">(</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family:
"CourierNew";">Partition emp_500  values less than (500),</span></b><p class="MsoNormal" style=""><b><span
style="font-size:10pt; font-family: "Courier New";">Partition emp_1500 values less than (1500),</span></b><p
class="MsoNormal"style=""><b><span style="font-size: 10pt; font-family: "Courier New";">Partition emp_4000 values less
than(4000),</span></b><p class="MsoNormal" style=""><b><span style="font-size: 10pt; font-family: "Courier
New";">Partitionemp_max  values less than (maxvalue)</span></b><p class="MsoNormal"><b><span style="font-size: 10pt;
font-family:"Courier New";">);</span></b><p class="MsoNormal"> <p class="MsoNormal">As it does not allow fragmented
ranges,it automatically removes the need for an ‘overflow’ partition. <span style="color: navy;"> </span><p
class="MsoNormal"> <pclass="MsoNormal">The syntax proposed by us is more flexible and would handle both the cases of
rangeswith gaps or ranges without gaps. <p class="MsoNormal"> <p class="MsoNormal">I want to seek general opinion from
thecommunity on preferences between user-friendly ‘Oracle’ syntax, and a more generic syntax that allows ‘gaps’ in
partitionranges?<span style="font-size: 10pt; font-family: "Arial","sans-serif"; color: navy;"><br /></span><p
class="MsoNormal"> <pclass="MsoNormal">Regards,<p class="MsoNormal">--<p class="MsoNormal">Kedar<br /> 

Re: Automating Partitions in PostgreSQL - Query on syntax

From
Greg Stark
Date:
On Tue, Apr 21, 2009 at 12:50 PM, Kedar Potdar <kedar.potdar@gmail.com> wrote:
> I want to seek general opinion from the community on preferences between
> user-friendly ‘Oracle’ syntax, and a more generic syntax that allows ‘gaps’
> in partition ranges?


What happens to records in the overflow table when you add a new
partition whose range covers their values?

--
greg


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Kedar Potdar
Date:
Hi Greg,<br /><br />Thanks for your response.<br /><br />Currently, such records are left in the overflow partition and
itsresponsibility <br />of user to insert them into partitioned-table which will then re-direct those to <br
/>appropriatepartitions.<br /><br />Regards,<br />--<br />Kedar.<br /><br /><div class="gmail_quote">On Tue, Apr 21,
2009at 5:29 PM, Greg Stark <span dir="ltr"><<a
href="mailto:stark@enterprisedb.com">stark@enterprisedb.com</a>></span>wrote:<br /><blockquote class="gmail_quote"
style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">On Tue,
Apr21, 2009 at 12:50 PM, Kedar Potdar <<a href="mailto:kedar.potdar@gmail.com">kedar.potdar@gmail.com</a>>
wrote:<br/> > I want to seek general opinion from the community on preferences between<br /> > user-friendly
‘Oracle’syntax, and a more generic syntax that allows ‘gaps’<br /> > in partition ranges?<br /><br /><br
/></div>Whathappens to records in the overflow table when you add a new<br /> partition whose range covers their
values?<br/><br /> --<br /><font color="#888888">greg<br /></font></blockquote></div><br /> 

Re: Automating Partitions in PostgreSQL - Query on syntax

From
"Dickson S. Guedes"
Date:
Em Ter, 2009-04-21 às 17:20 +0530, Kedar Potdar escreveu:
> Hi ,
>
> We are working on a patch to automate partitioning in PostgreSQL.

Nice. ":)

> For Range partitions, we have proposed the syntax which is as
> follows
>
> (...)
> PARTITION BY RANGE (emp_id)
> (
> emp_500 (START 1 END 500),
> emp_1500 (START 500 END 1500),
> emp_4000 (START 1520 END 4000)
> );

What if I need more columns to set the partitions?


[]s
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

Re: Automating Partitions in PostgreSQL - Query on syntax

From
Kedar Potdar
Date:


2009/4/21 Dickson S. Guedes <listas@guedesoft.net>
Em Ter, 2009-04-21 às 17:20 +0530, Kedar Potdar escreveu:
> Hi ,
>
> We are working on a patch to automate partitioning in PostgreSQL.

Nice. ":)

> For Range partitions, we have proposed the syntax which is as
> follows
>
> (...)
> PARTITION BY RANGE (emp_id)
> (
> emp_500 (START 1 END 500),
> emp_1500 (START 500 END 1500),
> emp_4000 (START 1520 END 4000)
> );

What if I need more columns to set the partitions?
 
You can do so by using command like this,

CREATE TABLE emp (
    emp_id          int not null primary key,
    designation     text not null,
    location        varchar(50) not null,
    jdate           date not null,
    ctc             float not null

)
PARTITION BY RANGE (jdate, emp_id)
(
emp_500 (START '01-01-1980',1 END '01-01-1990',500),
emp_1500(START '01-01-1990',500 END '01-01-2000',1500+10),
emp_4000 (START '1 Jan 2000',1500+20 END 'Jan 1, 2010', 4000-50)
);

You can have multiple columns as partition key attributes and values for these attributes should appear in the order specified.




[]s
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

Re: Automating Partitions in PostgreSQL - Query on syntax

From
"Dickson S. Guedes"
Date:
Vacuum, I guess you forgot to reply-to-all. ":)

Em Ter, 2009-04-21 às 16:05 +0200, vacuum@quantentunnel.de escreveu:
> why not partitions by "conditions"?
>
> you can do that in similar way how "case when then " works ..
>
> CREATE PARTITIONS ON <table>
>   SWITCH <any expression>
>   CASE <value> THEN <part_1>,
>   CASE <value> THEN <part_2>,
>   CASE <value> THEN <part_3>,
>   DEFAULT emp_default;
>
> with a difference to case: <value> may not be a boolean expression
>
> or you can do it as CASE works
>
> forinstance:
>
> CREATE PARTITIONS ON emp (
>   CASE WHEN <value> THEN emp_xxx,
>   CASE WHEN <value> THEN emp_yyy,
>   CASE WHEN <value> emp_zzz,
>   ELSE emp_default
>   END CASE;
> );
>
> -------- Original-Nachricht --------
> > Datum: Tue, 21 Apr 2009 10:46:41 -0300
> > Von: "Dickson S. Guedes" <listas@guedesoft.net>
> > An: Kedar Potdar <kedar.potdar@gmail.com>, pgsql-hackers@postgresql.org
> > Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax
>
> > Em Ter, 2009-04-21 às 17:20 +0530, Kedar Potdar escreveu:
> > > Hi ,
> > >
> > > We are working on a patch to automate partitioning in PostgreSQL.
> >
> > Nice. ":)
> >
> > > For Range partitions, we have proposed the syntax which is as
> > > follows
> > >
> > > (...)
> > > PARTITION BY RANGE (emp_id)
> > > (
> > > emp_500 (START 1 END 500),
> > > emp_1500 (START 500 END 1500),
> > > emp_4000 (START 1520 END 4000)
> > > );
> >
> > What if I need more columns to set the partitions?
> >
> >
> > []s
> > --
> > Dickson S. Guedes
> > mail/xmpp: guedes@guedesoft.net - skype: guediz
> > http://guedesoft.net - http://planeta.postgresql.org.br
>
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

Re: Automating Partitions in PostgreSQL - Query on syntax

From
Pavel Stehule
Date:
2009/4/21 Kedar Potdar <kedar.potdar@gmail.com>:
>
>
> 2009/4/21 Dickson S. Guedes <listas@guedesoft.net>
>>
>> Em Ter, 2009-04-21 às 17:20 +0530, Kedar Potdar escreveu:
>> > Hi ,
>> >
>> > We are working on a patch to automate partitioning in PostgreSQL.
>>
>> Nice. ":)
>>
>> > For Range partitions, we have proposed the syntax which is as
>> > follows
>> >
>> > (...)
>> > PARTITION BY RANGE (emp_id)
>> > (
>> > emp_500 (START 1 END 500),
>> > emp_1500 (START 500 END 1500),
>> > emp_4000 (START 1520 END 4000)
>> > );
>>
>> What if I need more columns to set the partitions?
>
>
> You can do so by using command like this,
>
> CREATE TABLE emp (
>     emp_id          int not null primary key,
>     designation     text not null,
>     location        varchar(50) not null,
>     jdate           date not null,
>     ctc             float not null
>
> )
> PARTITION BY RANGE (jdate, emp_id)
> (
> emp_500 (START '01-01-1980',1 END '01-01-1990',500),
> emp_1500(START '01-01-1990',500 END '01-01-2000',1500+10),
> emp_4000 (START '1 Jan 2000',1500+20 END 'Jan 1, 2010', 4000-50)
> );
>
> You can have multiple columns as partition key attributes and values for
> these attributes should appear in the order specified.

-1, this is really ugly

Pavel

>
>>
>>
>>
>> []s
>> --
>> Dickson S. Guedes
>> mail/xmpp: guedes@guedesoft.net - skype: guediz
>> http://guedesoft.net - http://planeta.postgresql.org.br
>
>


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Nikhil Sontakke
Date:
>> > (...)
>> > PARTITION BY RANGE (emp_id)
>> > (
>> > emp_500 (START 1 END 500),
>> > emp_1500 (START 500 END 1500),
>> > emp_4000 (START 1520 END 4000)
>> > );
>>
>> What if I need more columns to set the partitions?
>
>
> You can do so by using command like this,
>
> CREATE TABLE emp (
>     emp_id          int not null primary key,
>     designation     text not null,
>     location        varchar(50) not null,
>     jdate           date not null,
>     ctc             float not null
>
> )
> PARTITION BY RANGE (jdate, emp_id)
> (
> emp_500 (START '01-01-1980',1 END '01-01-1990',500),
> emp_1500(START '01-01-1990',500 END '01-01-2000',1500+10),
> emp_4000 (START '1 Jan 2000',1500+20 END 'Jan 1, 2010', 4000-50)
> );
>
> You can have multiple columns as partition key attributes and values for
> these attributes should appear in the order specified.

-1, this is really ugly

Yeah, but what is the syntax for multiple column ranges in Orcl - is it similarly ugly?

As to Kedar's original question about gap-based ranges or Orcl style no-gap ranges, I don't think while partitioning such fine-grained gap-based ranges is such a stringent requirement.

Also specification of the maxvalue range in Orcl's case nicely maps to the overflow partition that is being mentioned upthread.

So whichever way we go maybe we can also consider accomodating the maxval for different datatypes if possible within the syntax itself.

Regards,
Nikhils
 

Pavel

>
>>
>>
>>
>> []s
>> --
>> Dickson S. Guedes
>> mail/xmpp: guedes@guedesoft.net - skype: guediz
>> http://guedesoft.net - http://planeta.postgresql.org.br
>
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
http://www.enterprisedb.com

Re: Automating Partitions in PostgreSQL - Query on syntax

From
vacuum@quantentunnel.de
Date:
oh sorry .. 

but what I miss in this discussion - where we can define tablespace?

CREATE PARTITIONS ON <table> (

CASE WHEN <condition> THEN emp_xxx [IN <tablespace>],
CASE WHEN <condition> THEN emp_yyy [IN <tablespace>],
CASE WHEN <condition> emp_zzz [IN <tablespace>],
ELSE emp_default [IN <tablespace>]
)

-------- Original-Nachricht --------
> Datum: Tue, 21 Apr 2009 11:11:54 -0300
> Von: "Dickson S. Guedes" <listas@guedesoft.net>
> An: vacuum@quantentunnel.de, pgsql-hackers@postgresql.org
> Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax

> Vacuum, I guess you forgot to reply-to-all. ":)
> 
> Em Ter, 2009-04-21 às 16:05 +0200, vacuum@quantentunnel.de escreveu:
> > why not partitions by "conditions"? 
> > 
> > you can do that in similar way how "case when then " works .. 
> > 
> > CREATE PARTITIONS ON <table> 
> >   SWITCH <any expression>
> >   CASE <value> THEN <part_1>,
> >   CASE <value> THEN <part_2>,
> >   CASE <value> THEN <part_3>,
> >   DEFAULT emp_default;
> > 
> > with a difference to case: <value> may not be a boolean expression
> > 
> > or you can do it as CASE works
> > 
> > forinstance:
> > 
> > CREATE PARTITIONS ON emp (
> >   CASE WHEN <value> THEN emp_xxx,
> >   CASE WHEN <value> THEN emp_yyy,
> >   CASE WHEN <value> emp_zzz,
> >   ELSE emp_default
> >   END CASE;
> > );
> > 
> > -------- Original-Nachricht --------
> > > Datum: Tue, 21 Apr 2009 10:46:41 -0300
> > > Von: "Dickson S. Guedes" <listas@guedesoft.net>
> > > An: Kedar Potdar <kedar.potdar@gmail.com>,
> pgsql-hackers@postgresql.org
> > > Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on
> syntax
> > 
> > > Em Ter, 2009-04-21 às 17:20 +0530, Kedar Potdar escreveu:
> > > > Hi , 
> > > > 
> > > > We are working on a patch to automate partitioning in PostgreSQL. 
> > > 
> > > Nice. ":)
> > > 
> > > > For Range partitions, we have proposed the syntax which is as
> > > > follows  
> > > > 
> > > > (...)
> > > > PARTITION BY RANGE (emp_id)
> > > > (
> > > > emp_500 (START 1 END 500),
> > > > emp_1500 (START 500 END 1500),
> > > > emp_4000 (START 1520 END 4000)
> > > > );
> > > 
> > > What if I need more columns to set the partitions?
> > > 
> > > 
> > > []s
> > > -- 
> > > Dickson S. Guedes 
> > > mail/xmpp: guedes@guedesoft.net - skype: guediz
> > > http://guedesoft.net - http://planeta.postgresql.org.br
> > 
> -- 
> Dickson S. Guedes 
> mail/xmpp: guedes@guedesoft.net - skype: guediz
> http://guedesoft.net - http://planeta.postgresql.org.br

-- 
Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss für nur 17,95 Euro/mtl.!*
http://dslspecial.gmx.de/freedsl-surfflat/?ac=OM.AD.PD003K11308T4569a


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Greg Stark
Date:
On Tue, Apr 21, 2009 at 4:13 PM,  <vacuum@quantentunnel.de> wrote:
> oh sorry ..
>
> but what I miss in this discussion - where we can define tablespace?
>
> CREATE PARTITIONS ON <table> (
>
> CASE WHEN <condition> THEN emp_xxx [IN <tablespace>],

One thing you should realize is that the actual details of the syntax
will be knocked around and redefined later when there's actual code
anyways. People love to add their contribution at that point, there's
a nickname for it "bike shedding". You should worry more about the
functionality rather than the precise presentation of it. What are you
aiming to achieve here that the existing system lacks?

The point that you want to be able to set tablespaces is a good one
though. The general point is that you're going to want to be able to
set arbitrary table storage options though. That might mean you need a
separate DDL command for each partition.

The main thing the existing system lacks is an idea of what the
partition key is and how to determine which partition a given key
belongs in without checking every single partition. Figure out how to
accomplish this in the backend and people will be happy to offer their
suggestions for syntax.

-- 
greg


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Tue, Apr 21, 2009 at 11:13 AM,  <vacuum@quantentunnel.de> wrote:
> oh sorry ..
>
> but what I miss in this discussion - where we can define tablespace?
>
> CREATE PARTITIONS ON <table> (
>
> CASE WHEN <condition> THEN emp_xxx [IN <tablespace>],
> CASE WHEN <condition> THEN emp_yyy [IN <tablespace>],
> CASE WHEN <condition> emp_zzz [IN <tablespace>],
> ELSE emp_default [IN <tablespace>]
> )

Oh blech.  That IN <tablespace> syntax is awful.  What about the other
relevant parameters to CREATE TABLE - where are you going to put
those?

WITH (<storage parameter>)
WITH OIDS
WITHOUT OIDS

And what happens when I need to alter a partition after the fact, say
to move it to a new tablespace, or change the fillfactor?  Since your
syntax doesn't give names to the partitions, how will I refer to them
later?

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Tue, Apr 21, 2009 at 9:38 AM, Kedar Potdar <kedar.potdar@gmail.com> wrote:
> Currently, such records are left in the overflow partition and its
> responsibility
> of user to insert them into partitioned-table which will then re-direct
> those to
> appropriate partitions.

This doesn't sound like a very good idea, because the planner cannot
then rely on the overflow table not containing tuples that ought to be
within some other partition.

The big win that is associated with table partitioning is using
constraint exclusion to avoid unnecessary partitions scans.

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
"steven king"
Date:
the idea is, that this statement does not create a new kind of db-object. this "CREATE PARTITIONS" statement should be
ameta-statement -> "meta" like the serial type.
 

with create paritions you can create all defined tables (maybe by inheritin from <table>) at once. 

but you're right - why this statement should not be able to define some other table-properties (except columns)?!

please notice ... the statement below was only a draft - to show the intention. the major difference is the usage of
conditions.. this could lead into creation of triggers with a lot of PL/pqSQL code or invocations of built-in
functions.

open your mind :)

-------- Original-Nachricht --------
> Datum: Tue, 21 Apr 2009 11:39:37 -0400
> Von: Robert Haas <robertmhaas@gmail.com>
> An: vacuum@quantentunnel.de
> CC: "Dickson S. Guedes" <listas@guedesoft.net>, pgsql-hackers@postgresql.org
> Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax

> On Tue, Apr 21, 2009 at 11:13 AM,  <vacuum@quantentunnel.de> wrote:
> > oh sorry ..
> >
> > but what I miss in this discussion - where we can define tablespace?
> >
> > CREATE PARTITIONS ON <table> (
> >
> > CASE WHEN <condition> THEN emp_xxx [IN <tablespace>],
> > CASE WHEN <condition> THEN emp_yyy [IN <tablespace>],
> > CASE WHEN <condition> emp_zzz [IN <tablespace>],
> > ELSE emp_default [IN <tablespace>]
> > )
> 
> Oh blech.  That IN <tablespace> syntax is awful.  What about the other
> relevant parameters to CREATE TABLE - where are you going to put
> those?
> 
> WITH (<storage parameter>)
> WITH OIDS
> WITHOUT OIDS
> 
> And what happens when I need to alter a partition after the fact, say
> to move it to a new tablespace, or change the fillfactor?  Since your
> syntax doesn't give names to the partitions, how will I refer to them
> later?
> 
> ...Robert

-- 
Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss für nur 17,95 Euro/mtl.!*
http://dslspecial.gmx.de/freedsl-surfflat/?ac=OM.AD.PD003K11308T4569a


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Csaba Nagy
Date:
On Tue, 2009-04-21 at 11:43 -0400, Robert Haas wrote:
> This doesn't sound like a very good idea, because the planner cannot
> then rely on the overflow table not containing tuples that ought to be
> within some other partition.
> 
> The big win that is associated with table partitioning is using
> constraint exclusion to avoid unnecessary partitions scans.

Well it could always check 2 partitions: the overflow and the one
selected by the constraint exclusion. If the overflow is kept empty by
properly setting up the partitions so that all insertions always go to
one of the active partitions, that would be cheap enough too while still
providing a way to catch unexpected data. Then when a new partition is
defined, there's no need to shuffle around data immediately, but there
could be a maintenance command to clean up the overflow... not to
mention that you could define a trigger to create the new partition once
you get something in the overflow (how cool would that be if it would
work ?).

Cheers,
Csaba.




Re: Automating Partitions in PostgreSQL - Query on syntax

From
vacuum@quantentunnel.de
Date:
-------- Original-Nachricht --------
> Datum: Tue, 21 Apr 2009 17:50:02 +0200
> Von: Csaba Nagy <nagy@ecircle-ag.com>
> An: Robert Haas <robertmhaas@gmail.com>
> CC: Kedar Potdar <kedar.potdar@gmail.com>, Greg Stark <stark@enterprisedb.com>, pgsql-hackers@postgresql.org
> Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax


> > The big win that is associated with table partitioning is using
> > constraint exclusion to avoid unnecessary partitions scans.

there is no reason for not using the 'CASE condition' for constraint exclusion.



-- 
Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss für nur 17,95 Euro/mtl.!*
http://dslspecial.gmx.de/freedsl-surfflat/?ac=OM.AD.PD003K11308T4569a


Re: Automating Partitions in PostgreSQL - Query on syntax

From
"steven king"
Date:
-------- Original-Nachricht --------
> Datum: Tue, 21 Apr 2009 16:37:15 +0100
> Von: Greg Stark <stark@enterprisedb.com>
> An: vacuum@quantentunnel.de
> CC: "Dickson S. Guedes" <listas@guedesoft.net>, pgsql-hackers@postgresql.org
> Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax

> On Tue, Apr 21, 2009 at 4:13 PM,  <vacuum@quantentunnel.de> wrote:
> > oh sorry ..
> >
> > but what I miss in this discussion - where we can define tablespace?
> >
> > CREATE PARTITIONS ON <table> (
> >
> > CASE WHEN <condition> THEN emp_xxx [IN <tablespace>],
> 
> The point that you want to be able to set tablespaces is a good one
> though. The general point is that you're going to want to be able to
> set arbitrary table storage options though. That might mean you need a
> separate DDL command for each partition.

not in real - currently, when i create a table, the specified tablespaces must exist. this restriction you can apply to
CREATEPARTITIONS too.
 
> The main thing the existing system lacks is an idea of what the
> partition key is and how to determine which partition a given key
> belongs in without checking every single partition. Figure out how to
> accomplish this in the backend and people will be happy to offer their
> suggestions for syntax.

why a partition-key cannot be a complex expression? 

like this:

SWITCH <expression>
CASE <key_value> TABLE <table> [IN <table_space>]
CASE <key_value> TABLE <table> [IN <table_space>]
CASE <key_value> TABLE <table> [IN <table_space>]
DEFAULT <table> [IN <table_space>]

that is generic

> -- 
> greg
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Psssst! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01


Re: Automating Partitions in PostgreSQL - Query on syntax

From
vacuum@quantentunnel.de
Date:
> SWITCH <expression>
> CASE <key_value> TABLE <table> [IN <table_space>]
> CASE <key_value> TABLE <table> [IN <table_space>]
> CASE <key_value> TABLE <table> [IN <table_space>]
> DEFAULT <table> [IN <table_space>]

some examples:

CREATE TABLE ttt (id integer,txt varchar
)

SWITCH id % 3
CASE 0 TABLE <table> [IN <table_space>]
CASE 1 TABLE <table> [IN <table_space>]
CASE 2 TABLE <table> [IN <table_space>]
DEFAULT <table> [IN <table_space>]

or 

SWITCH lower(txt)
CASE 'hello' TABLE <table> [IN <table_space>]
CASE 'world' TABLE <table> [IN <table_space>]
CASE 'foo' TABLE <table> [IN <table_space>]
CASE 'bar' TABLE <table> [IN <table_space>]
DEFAULT <table> [IN <table_space>]

-- 
Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss für nur 17,95 Euro/mtl.!*
http://dslspecial.gmx.de/freedsl-surfflat/?ac=OM.AD.PD003K11308T4569a


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Tue, Apr 21, 2009 at 11:50 AM, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> On Tue, 2009-04-21 at 11:43 -0400, Robert Haas wrote:
>> This doesn't sound like a very good idea, because the planner cannot
>> then rely on the overflow table not containing tuples that ought to be
>> within some other partition.
>>
>> The big win that is associated with table partitioning is using
>> constraint exclusion to avoid unnecessary partitions scans.
>
> Well it could always check 2 partitions: the overflow and the one
> selected by the constraint exclusion. If the overflow is kept empty by
> properly setting up the partitions so that all insertions always go to
> one of the active partitions, that would be cheap enough too while still
> providing a way to catch unexpected data. Then when a new partition is
> defined, there's no need to shuffle around data immediately, but there
> could be a maintenance command to clean up the overflow... not to
> mention that you could define a trigger to create the new partition once
> you get something in the overflow (how cool would that be if it would
> work ?).

Sure, you could do it that way.  But it will cause problems for people
who want to have a million rows in each of 100 partitions, and another
million rows in the overflow partition.  Now all operations that can
be done on a single partition must scan 2 million rows instead of 1
million, just on the off chance that someone executed a DDL command
and didn't clean up after themselves.

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Tue, Apr 21, 2009 at 12:35 PM, steven king <vacuum@quantentunnel.de> wrote:
> why a partition-key cannot be a complex expression?
>
> like this:
>
> SWITCH <expression>
> CASE <key_value> TABLE <table> [IN <table_space>]
> CASE <key_value> TABLE <table> [IN <table_space>]
> CASE <key_value> TABLE <table> [IN <table_space>]
> DEFAULT <table> [IN <table_space>]
>
> that is generic

Rather than SWITCH <expression> CASE <value> ... you probably would
want to reuse the existing PostgreSQL syntax of CASE <expression> WHEN
<value>...

But see my previous email for concerns about the performance of this
approach when the number of partitions is large.

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
>> > The big win that is associated with table partitioning is using
>> > constraint exclusion to avoid unnecessary partitions scans.
>
> there is no reason for not using the 'CASE condition' for constraint exclusion.

There is if you can't rely on the rows to be in the right partition.
Allowing DDL commands to change the partitioning layout without moving
the rows is going cause a lot of problems for you, I think.

I'm also not sure how skillful the constraint exclusion logic is at
proving theorems when CASE statements are involved.  It might be good
to test that before going too far with this approach.

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Tue, Apr 21, 2009 at 12:17 PM, steven king <vacuum@quantentunnel.de> wrote:
> the idea is, that this statement does not create a new kind of db-object. this "CREATE PARTITIONS" statement should
bea meta-statement -> "meta" like the serial type.
 

That was my assumption as well.

> with create paritions you can create all defined tables (maybe by inheritin from <table>) at once.
>
> but you're right - why this statement should not be able to define some other table-properties (except columns)?!

I am of the opinion that defining partitions is a sufficiently
heavyweight operation that no one should worry too much about whether
or not each one needs to be created separately.  As Greg Stark pointed
out, it's a lot more interesting to try to figure out how it's
actually going to be implemented.  We can invent pretty syntax after
the fact easily enough.

> please notice ... the statement below was only a draft - to show the intention. the major difference is the usage of
conditions.. this could lead into creation of triggers with a lot of PL/pqSQL code or invocations of built-in
functions.

Sure, I'm just providing my feedback, since, hey, someone posted to
the mailing list and asked for input...

I am not sold on the "CASE" method of defining partitions.  It seems
to me that one of the issues that needs to be tackled is how to select
a partition efficiently, and I don't think CASE is going to be ideal
for that, because it presupposes an iterative model working down from
top to bottom. If there are 1000 partitions and the conditions are
complex, that could start to add up to significant overhead.

I think we should aim to set up range partitioning using a data
structure that is amenable to binary search.  For example, you might
think of keeping a sorted array of length N of values (perhaps a row
type if a composite key is being used) and an array of length N+1 of
partitions.  Now you can use binary search to find the first value in
the list which is greater than the key (if any) and then look up that
index in the second list to figure out where to put the tuple.

Now, you might think that's a bad design... you're welcome to propose
your own.  But I think the design should come first and the syntax
afterward.

> open your mind :)

Not trying to be close-minded...

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Greg Stark
Date:
More importantly it makes it a lot harder for the planner to do clever  
things. Currently having to append two tables means losing the  
ordering of the records and having to resort. Even if that's fixed it  
makes it harder to get reasonable estimates for size and distinctness.

Ideally partitioned tables would be completely eliminated from plans  
at plan time whenever possible so that the runtime plan is the same as  
you would have gotten without partitioning.  Where that's not possible  
we should aim to get as close as possible.

-- 
Greg


On 21 Apr 2009, at 18:22, Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Apr 21, 2009 at 11:50 AM, Csaba Nagy <nagy@ecircle-ag.com>  
> wrote:
>> On Tue, 2009-04-21 at 11:43 -0400, Robert Haas wrote:
>>> This doesn't sound like a very good idea, because the planner cannot
>>> then rely on the overflow table not containing tuples that ought  
>>> to be
>>> within some other partition.
>>>
>>> The big win that is associated with table partitioning is using
>>> constraint exclusion to avoid unnecessary partitions scans.
>>
>> Well it could always check 2 partitions: the overflow and the one
>> selected by the constraint exclusion. If the overflow is kept empty  
>> by
>> properly setting up the partitions so that all insertions always go  
>> to
>> one of the active partitions, that would be cheap enough too while  
>> still
>> providing a way to catch unexpected data. Then when a new partition  
>> is
>> defined, there's no need to shuffle around data immediately, but  
>> there
>> could be a maintenance command to clean up the overflow... not to
>> mention that you could define a trigger to create the new partition  
>> once
>> you get something in the overflow (how cool would that be if it would
>> work ?).
>
> Sure, you could do it that way.  But it will cause problems for people
> who want to have a million rows in each of 100 partitions, and another
> million rows in the overflow partition.  Now all operations that can
> be done on a single partition must scan 2 million rows instead of 1
> million, just on the off chance that someone executed a DDL command
> and didn't clean up after themselves.
>
> ...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
"steven king"
Date:
-------- Original-Nachricht --------
> Datum: Tue, 21 Apr 2009 13:36:19 -0400
> Von: Robert Haas <robertmhaas@gmail.com>
> An: steven king <vacuum@quantentunnel.de>
> CC: Greg Stark <stark@enterprisedb.com>, pgsql-hackers@postgresql.org, listas@guedesoft.net
> Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax
> > SWITCH <expression>
> > CASE <key_value> TABLE <table> [IN <table_space>]
> > CASE <key_value> TABLE <table> [IN <table_space>]
> > DEFAULT <table> [IN <table_space>]
> >
> > that is generic
> 
> Rather than SWITCH <expression> CASE <value> ... you probably would
> want to reuse the existing PostgreSQL syntax of CASE <expression> WHEN
> <value>...

I think - at first we've to ask for the problem we have to solve.

The syntax it isnt. If we get confused with CASE of CASE THEN ELSE - we can use other keywords .. forinstance SWITCH
<expression>ON <value> USE ... that should not the problem.
 

You talking about 1000s of partitions - I cant see that this is the major use-case of table partitioning .. Who wants
thousandsof partitions?
 

We simply need a tool to create partitions for common use-cases. Maybe we should provide two or more types of
partitioningstrategies. 
 

1. key-range partitioning
2. constraint exclusion partitioning 
3.? auto-partitioning (for performance issues only)
-- 
Psssst! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01


Re: Automating Partitions in PostgreSQL - Query on syntax

From
"Dickson S. Guedes"
Date:
Em Ter, 2009-04-21 às 19:33 +0530, Kedar Potdar escreveu:

(...)

> You can have multiple columns as partition key attributes and values
> for these attributes should appear in the order specified.

How would be the behavior if the partition keys are foreing key with ON
UPDATE CASCADE? I'm thinking in the scenario that this is allowed we can
do a update that don't satisfies the actual partition rule but another
in other partition, so the data will be moved?

This make senses?

[]s
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

Re: Automating Partitions in PostgreSQL - Query on syntax

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm also not sure how skillful the constraint exclusion logic is at
> proving theorems when CASE statements are involved.

It's not at all, and unlikely to become so.  However, I think worrying
about that might be focusing on the wrong thing.  If this patch expects
us to still use theorem proving to handle partition exclusion, it's
going at things in the wrong way from the get-go.  The partitioning
rules should be simple enough that they can easily be applied at runtime
to determine which partition to look in.

Which leads me to the same conclusion: anything as complicated as CASE
is the wrong design.  But perhaps for slightly different reasons.
        regards, tom lane


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Tue, Apr 21, 2009 at 2:29 PM, steven king <vacuum@quantentunnel.de> wrote:
>> Rather than SWITCH <expression> CASE <value> ... you probably would
>> want to reuse the existing PostgreSQL syntax of CASE <expression> WHEN
>> <value>...
>
> I think - at first we've to ask for the problem we have to solve.
>
> The syntax it isnt. If we get confused with CASE of CASE THEN ELSE - we can use other keywords .. forinstance SWITCH
<expression>ON <value> USE ... that should not the problem.
 
>
> You talking about 1000s of partitions - I cant see that this is the major use-case of table partitioning .. Who wants
thousandsof partitions?
 

It's come up in previous discussions on pgsql-hackers.  Besides, even
if you have only 100 partitions, a binary search requires only 8
comparisons whereas a linear search through a list requires 100.  I'd
be hesistant to say that doesn't matter without testing it; I suspect
it WILL matter.

> We simply need a tool to create partitions for common use-cases. Maybe we should provide two or more types of
partitioningstrategies.
 
>
> 1. key-range partitioning
> 2. constraint exclusion partitioning
> 3.? auto-partitioning (for performance issues only)

Yep, probably.  I don't think a "tool to create partitions for common
use-cases" is very interesting.  To really make partitioning work in a
simple, reliable, and efficient fashion, you're going to need to do a
lot more than that.  Of course if you just want to make a simple tool,
that's fine too, but in that case I'd recommend designing it as an
add-on rather than something that modifies the core syntax of
PostgreSQL, because I doubt you're going to have much luck getting a
patch of that type accepted.

...Robert


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Grzegorz Jaskiewicz
Date:
Does sql standard defines it ?
and another question, what about updating existing partitions, with no  
need to drop/recreate ?



Re: Automating Partitions in PostgreSQL - Query on syntax

From
"steven king"
Date:
-------- Original-Nachricht --------
> Datum: Tue, 21 Apr 2009 14:53:54 -0400
> Von: Robert Haas <robertmhaas@gmail.com>
> An: steven king <vacuum@quantentunnel.de>
> CC: listas@guedesoft.net, pgsql-hackers@postgresql.org, stark@enterprisedb.com
> Betreff: Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax

> Of course if you just want to make a simple tool,

look your (?) idea for range-partitioning ... that solution is (in principle) very close to "my" constraints.

the difference is, that I added the possability to evaluate "logical ranges" (-> values) before deciding the partition
tobe used.
 

not more ... 


-- 
Psssst! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Zeugswetter Andreas OSB sIT
Date:
> Which leads me to the same conclusion: anything as complicated as CASE
> is the wrong design.  But perhaps for slightly different reasons.

What I like about the sql CASE is, that it is expression based, and thus
allows full flexibility in partitioning and is highly self documenting.

Do we need to invent special syntax, or could we use common syntax and
detect specific use cases and handle them specially ?

e.g. "when a >= const1 and a < const2 ...; when a >= const2 and a < const3" - check a btree opclass exists for datatype
ofa- prove the partitions don't overlap- prove the btree order of the partitions- ... 

Andreas

Re: Automating Partitions in PostgreSQL - Query on syntax

From
Tom Lane
Date:
Zeugswetter Andreas OSB sIT <Andreas.Zeugswetter@s-itsolutions.at> writes:
>> Which leads me to the same conclusion: anything as complicated as CASE
>> is the wrong design.  But perhaps for slightly different reasons.

> What I like about the sql CASE is, that it is expression based, and thus 
> allows full flexibility in partitioning and is highly self documenting.

> Do we need to invent special syntax, or could we use common syntax and 
> detect specific use cases and handle them specially ?

The problem with that approach is you still need to have an
implementation for the non-specific cases.  What I want is to design
the syntax so that *only* the optimized special cases are possible.
We should not waste time either on implementing the general case or
on constantly re-deducing which special case applies.  That's the
Achilles heel of what we have now (ie, constraint exclusion via theorem
proving) --- it's a beautifully general approach, but it's so general
that it's hard to make any but the simplest cases work efficiently, and
the runtime cost of proving *each time* that a special case applies is
horrid.

The KISS principle applies with a vengeance here.  I think we should
make the partitioning stuff handle only the simplest cases but do those
well.  Anybody who wants something more complex can still try to tackle
it via the existing facilities.
        regards, tom lane


Re: Automating Partitions in PostgreSQL - Query on syntax

From
Simon Riggs
Date:
On Tue, 2009-04-21 at 14:51 -0400, Tom Lane wrote:

> The partitioning
> rules should be simple enough that they can easily be applied at
> runtime to determine which partition to look in.

+1

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Automating Partitions in PostgreSQL - Query on syntax

From
Robert Haas
Date:
On Wed, Apr 22, 2009 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The KISS principle applies with a vengeance here.  I think we should
> make the partitioning stuff handle only the simplest cases but do those
> well.  Anybody who wants something more complex can still try to tackle
> it via the existing facilities.

+1.

...Robert