Thread: about partitioning

about partitioning

From
Joao Ferreira gmail
Date:
Hello all,

my application is coming to a point on which 'partitioning' seems to be
the solution for many problems:

- query speed up
- data elimination speed up



I'dd like to get the feeling of it by talking to people who use
partitioning, in general..

- good, bad,
- hard to manage, easy to manage,
- processing over-head during INSERT/UPDATE,
- stability/compatibility of pg_dump and restore operations,
- how many partitions would be reasonable for read _and_ write  access
optimal speed;

thx
joao



Re: about partitioning

From
Robert Treat
Date:
On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
> Hello all,
>
> my application is coming to a point on which 'partitioning' seems to be
> the solution for many problems:
>
> - query speed up
> - data elimination speed up
>
> I'dd like to get the feeling of it by talking to people who use
> partitioning, in general..
>
> - good, bad,

good :-)

> - hard to manage, easy to manage,

I think the upfront costs for managing a partitioning setup are higher with
postgres than other systems, but there is nothing that you shouldn't be able
to automate in a cron script (at which point management becomes easy), plus
postgres gives you some interesting flexibility that is harder to find in
other setups.

> - processing over-head during INSERT/UPDATE,

you can setup inserts to have relativly little overhead, but it requires more
management/maintence work up front. Updates within a partition also have
relativly little extra overhead, especially if you put in a little
application logic to figure out how to work on a partition directly. Updates
where you are changing the partition key value are always more problematic
though.

> - stability/compatibility of pg_dump and restore operations,

no real issues here as long as your on recent enough versions to do wildcard
table matching for individual tables.

> - how many partitions would be reasonable for read _and_ write  access
> optimal speed;
>

again, this depends on how exactly your working on the data. For example, we
have tables with over a thousand partitions on them; in those scenarios all
data is written into a single partition (with a new partition created daily),
and the qeury patterns are really straightforward... last month gets a lot of
queries, lasat three months not so much, last year barely any, and beyond
that is pretty much just archive info. That said, we have other systems where
that wouldnt work at all (for example, a static number of partitions, all of
which are queried activly).

For some more info, I've given at least one presentation on the topic, which
seems to be missing from the omniti site, but I've uploaded it to
slideshare...
http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation

HTH.

--
Robert Treat
http://www.omniti.com
Database: Scalability: Consulting:

Re: about partitioning

From
Blazej
Date:
8><
> For some more info, I've given at least one presentation on the topic, which
> seems to be missing from the omniti site, but I've uploaded it to
> slideshare...
> http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation
>
> HTH.
>
8><

Very nice presentation. I have 2 additional questions:

(1) Are you testing INSERT/UPDATE performance when is only one rule
script for all partition tables?
(2) Are you using "SET constraint_exclusion = on;"?

Regards,
Blazej

Re: about partitioning

From
Joao Ferreira gmail
Date:
hei,

thanks everybody for this great feedback.

I'll sure look deeper into this.

Joao


On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
> On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
> > Hello all,
> >
> > my application is coming to a point on which 'partitioning' seems to be
> > the solution for many problems:
> >
> > - query speed up
> > - data elimination speed up
> >
> > I'dd like to get the feeling of it by talking to people who use
> > partitioning, in general..
> >
> > - good, bad,
>
> good :-)
>
> > - hard to manage, easy to manage,
>
> I think the upfront costs for managing a partitioning setup are higher with
> postgres than other systems, but there is nothing that you shouldn't be able
> to automate in a cron script (at which point management becomes easy), plus
> postgres gives you some interesting flexibility that is harder to find in
> other setups.
>
> > - processing over-head during INSERT/UPDATE,
>
> you can setup inserts to have relativly little overhead, but it requires more
> management/maintence work up front. Updates within a partition also have
> relativly little extra overhead, especially if you put in a little
> application logic to figure out how to work on a partition directly. Updates
> where you are changing the partition key value are always more problematic
> though.
>
> > - stability/compatibility of pg_dump and restore operations,
>
> no real issues here as long as your on recent enough versions to do wildcard
> table matching for individual tables.
>
> > - how many partitions would be reasonable for read _and_ write  access
> > optimal speed;
> >
>
> again, this depends on how exactly your working on the data. For example, we
> have tables with over a thousand partitions on them; in those scenarios all
> data is written into a single partition (with a new partition created daily),
> and the qeury patterns are really straightforward... last month gets a lot of
> queries, lasat three months not so much, last year barely any, and beyond
> that is pretty much just archive info. That said, we have other systems where
> that wouldnt work at all (for example, a static number of partitions, all of
> which are queried activly).
>
> For some more info, I've given at least one presentation on the topic, which
> seems to be missing from the omniti site, but I've uploaded it to
> slideshare...
> http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation
>
> HTH.
>


Re: about partitioning

From
Robert Treat
Date:
Blazej wrote:
> 8><
>> For some more info, I've given at least one presentation on the topic, which
>> seems to be missing from the omniti site, but I've uploaded it to
>> slideshare...
>> http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation
>>
>> HTH.
>>
> 8><
>
> Very nice presentation. I have 2 additional questions:
>
> (1) Are you testing INSERT/UPDATE performance when is only one rule
> script for all partition tables?

I've tested a number of different scenarios; I guess I'd say as a
general rule, the more rules you have, likely the worse your performance
will get; but really I would use triggers anyway

> (2) Are you using "SET constraint_exclusion = on;"?
>

yes, otherwise there isnt much point :-)

--
Robert Treat
http://www.omniti.com/
Database: Scalability: Consulting:

Re: about partitioning

From
Joao Ferreira gmail
Date:
Hi Robert and all,


I've been trying to follow the examples on the pg docs, section 5.9,
with your presentation as aditional guide.

I've bumped into the UPDATES. I'm getting this:

----------------------------------------------------------
CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
NEW.logdate, NEW.peaktemp, NEW.unitsales);
ERROR:  syntax error at or near "INTO"
LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
measu...
-------------------------------------------------

it doesn't like the INTO part.

can you help me?

thanks
joao



On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
> On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:



Re: about partitioning

From
kevin kempter
Date:
I may be wrong and I didn't double check the docs but I think you need
this (leave the INTO keyword off):


> CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
> WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
> DO INSTEAD UPDATE measurement_y2004m02 VALUES (NEW.city_id,
> NEW.logdate, NEW.peaktemp, NEW.unitsales)






On Sep 15, 2008, at 4:48 PM, Joao Ferreira gmail wrote:

> Hi Robert and all,
>
>
> I've been trying to follow the examples on the pg docs, section 5.9,
> with your presentation as aditional guide.
>
> I've bumped into the UPDATES. I'm getting this:
>
> ----------------------------------------------------------
> CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
> WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
> DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
> NEW.logdate, NEW.peaktemp, NEW.unitsales);
> ERROR:  syntax error at or near "INTO"
> LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
> measu...
> -------------------------------------------------
>
> it doesn't like the INTO part.
>
> can you help me?
>
> thanks
> joao
>
>
>
> On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
>> On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: about partitioning

From
David Fetter
Date:
On Mon, Sep 15, 2008 at 11:48:49PM +0100, Joao Ferreira gmail wrote:
> Hi Robert and all,
>
>
> I've been trying to follow the examples on the pg docs, section 5.9,
> with your presentation as aditional guide.
>
> I've bumped into the UPDATES. I'm getting this:
>
> ----------------------------------------------------------
> CREATE RULE measurement_update_y2004m02 AS ON UPDATE TO measurement
> WHERE ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
> DO INSTEAD UPDATE INTO measurement_y2004m02 VALUES (NEW.city_id,
> NEW.logdate, NEW.peaktemp, NEW.unitsales);
> ERROR:  syntax error at or near "INTO"
> LINE 1: ...D logdate < DATE '2004-03-01' ) DO INSTEAD UPDATE INTO
> measu...
> -------------------------------------------------
>
> it doesn't like the INTO part.
>
> can you help me?
>
> thanks
> joao

João,

Rules are not how you want to do this anyhow.  Write triggers :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: about partitioning

From
Simon Riggs
Date:
On Mon, 2008-09-15 at 10:05 -0400, Robert Treat wrote:
> Blazej wrote:
> > (1) Are you testing INSERT/UPDATE performance when is only one rule
> > script for all partition tables?
>
> I've tested a number of different scenarios; I guess I'd say as a
> general rule, the more rules you have, likely the worse your performance
> will get; but really I would use triggers anyway

I would try to load data directly into a partition if possible. If this
is possible, you get to skip worrying about all that stuff completely
and performance is much better. You can't COPY into a table using RULEs,
and if you've got enough data to use partitioning then you will want to
use COPY to load data. Choosing a partition key that is optimal for
loading and queries will help you.

> > (2) Are you using "SET constraint_exclusion = on;"?
> >
>
> yes, otherwise there isnt much point :-)

I think Blazej means "in the postgresql.conf".

Probably yes, but not in all cases. If you have queries that access the
partitioned table and yet don't benefit from constraint exclusion, then
you will be adding time and CPU for no benefit. If you see it CPU bound
then its worth looking at in more detail.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: about partitioning

From
Joao Ferreira gmail
Date:
On Mon, 2008-09-15 at 17:30 -0700, David Fetter wrote:

> João,
>
> Rules are not how you want to do this anyhow.  Write triggers :)
>

yes. you're right. I was just trying to follow the tutorial to get a
better grasp of the whole scenario.

Triggers seem to be a better approach, in my case.

João

PS: thanks for the tilde ~   :)  I keep forgetting

> Cheers,
> David.


Re: about partitioning

From
Joao Ferreira gmail
Date:
Hello all,


still with partitioning...

wheter I use rules or triggers is there a way for me _not_ to specify
field-by-field all the fields I wish to be redirected to the
child-table...


as example:

instead of this:
---------------------------------------------------------
create rule insert_t_1 AS ON INSERT TO teste WHERE (t>=1000 and t<2000)
DO INSTEAD INSERT INTO t_1 VALUES (NEW.t, NEW.s1, NEW.s2 NEW.s3, NEW.s4
NEW.s5, NEW.s6, NEW.s7, NEW.s8);
---------------------------------------------------------

something like this:
---------------------------------------------------------
create rule insert_t_1 AS ON INSERT TO teste WHERE (t>=1000 and t<2000)
DO INSTEAD INSERT INTO t_1 VALUES (__ALL__);
---------------------------------------------------------

of course this assumes that the child table inherits all fields from the
parent table _and_ has no extra fields which is exactly my case.

any hints.

thx
j



On Sat, 2008-09-13 at 16:48 -0400, Robert Treat wrote:
> On Thursday 11 September 2008 07:47:00 Joao Ferreira gmail wrote:
> > Hello all,
> >
> > my application is coming to a point on which 'partitioning' seems to be
> > the solution for many problems:
> >
> > - query speed up
> > - data elimination speed up
> >
> > I'dd like to get the feeling of it by talking to people who use
> > partitioning, in general..
> >
> > - good, bad,
>
> good :-)
>
> > - hard to manage, easy to manage,
>
> I think the upfront costs for managing a partitioning setup are higher with
> postgres than other systems, but there is nothing that you shouldn't be able
> to automate in a cron script (at which point management becomes easy), plus
> postgres gives you some interesting flexibility that is harder to find in
> other setups.
>
> > - processing over-head during INSERT/UPDATE,
>
> you can setup inserts to have relativly little overhead, but it requires more
> management/maintence work up front. Updates within a partition also have
> relativly little extra overhead, especially if you put in a little
> application logic to figure out how to work on a partition directly. Updates
> where you are changing the partition key value are always more problematic
> though.
>
> > - stability/compatibility of pg_dump and restore operations,
>
> no real issues here as long as your on recent enough versions to do wildcard
> table matching for individual tables.
>
> > - how many partitions would be reasonable for read _and_ write  access
> > optimal speed;
> >
>
> again, this depends on how exactly your working on the data. For example, we
> have tables with over a thousand partitions on them; in those scenarios all
> data is written into a single partition (with a new partition created daily),
> and the qeury patterns are really straightforward... last month gets a lot of
> queries, lasat three months not so much, last year barely any, and beyond
> that is pretty much just archive info. That said, we have other systems where
> that wouldnt work at all (for example, a static number of partitions, all of
> which are queried activly).
>
> For some more info, I've given at least one presentation on the topic, which
> seems to be missing from the omniti site, but I've uploaded it to
> slideshare...
> http://www.slideshare.net/xzilla/postgresql-partitioning-pgcon-2007-presentation
>
> HTH.
>