Thread: Table partitioning problem

Table partitioning problem

From
Samba GUEYE
Date:
Hello,
I have a problem with table partitioning: I have a foreign key that
references the partitioned table, and it throws a constraint violation
error during inserts.
I saw in the manual
(http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html, caveats
section) that this is a limitation of PostgreSQL table inheritance.
SELECT query performance is really bad without partitioning, so I'm
looking for a workaround for this foreign key problem, or another
solution to improve performance for large tables.

Thanks in advance for your responses


Re: Table partitioning problem

From
Jim Nasby
Date:
On Mar 8, 2011, at 9:45 AM, Samba GUEYE wrote:
> I have a problem with table partitioning: I have a foreign key that references the partitioned table, and it throws
> a constraint violation error during inserts.
> I saw in the manual (http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html, caveats section) that this is
> a limitation of PostgreSQL table inheritance. SELECT query performance is really bad without partitioning, and I'm
> looking for a workaround for this foreign key problem, or another solution to improve performance for large tables.

Actually, this sounds more like having a foreign key pointed at a parent
table in an inheritance tree, which flat-out doesn't do what you'd want.

Can you tell us what the foreign key constraint actually is, and what the inheritance setup for the tables in the FK
is?
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Table partitioning problem

From
Samba GUEYE
Date:
Hi Jim, thanks for your answer.

The database model is something like this: Measure(Id, numbering, Date,
crcCorrect, sensorId), SimpleMeasure(Id, doubleValue) and
GenericMeasure(Id, BlobValue, numberOfElements). In the UML model,
SimpleMeasure and GenericMeasure inherit from the Measure class, so in
the database the foreign keys of SimpleMeasure and GenericMeasure point
to the Measure table, which is partitioned by sensor.

Inserting a measure succeeds, but problems arise when inserting into
the SimpleMeasure table: the foreign key check cannot find the row that
was inserted into the Measure table, because it does not look at the
partitioned child tables.

ERROR:  insert or update on table "simpleMeasure" violates foreign key constraint "fk_measure_id"
DETAIL:  Key (measure_id)=(1) is not present in table "Measure"

The inheritance is only used to implement PostgreSQL's partitioning,
and that is where the limitation comes from.
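
For illustration, a minimal sketch of this kind of setup (simplified,
with illustrative names rather than the real DDL):

-- Parent table, partitioned by sensor via 8.4-style inheritance.
CREATE TABLE measure (
    id          bigint PRIMARY KEY,
    numbering   integer,
    date        timestamp,
    crc_correct boolean,
    sensor_id   integer
);

-- One child table per sensor.
CREATE TABLE measure_sensor_1 (
    CHECK (sensor_id = 1)
) INHERITS (measure);

-- Child entity whose FK points at the partitioned parent.
CREATE TABLE simple_measure (
    id           bigint PRIMARY KEY,
    double_value double precision,
    measure_id   bigint,
    CONSTRAINT fk_measure_id FOREIGN KEY (measure_id) REFERENCES measure (id)
);

-- Rows physically land in the child tables, so the FK check fails:
INSERT INTO measure_sensor_1 VALUES (1, 1, now(), true, 1);
INSERT INTO simple_measure VALUES (1, 3.14, 1);  -- raises the error above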

The same problem is also reported in the following posts:

http://archives.postgresql.org/pgsql-performance/2008-07/msg00224.php
and this
http://archives.postgresql.org/pgsql-admin/2007-09/msg00031.php

Best Regards


On 09/03/2011 23:01, Jim Nasby wrote:
> On Mar 8, 2011, at 9:45 AM, Samba GUEYE wrote:
>> I have a problem with table partitioning: I have a foreign key that references the partitioned table, and it throws
>> a constraint violation error during inserts.
>> I saw in the manual (http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html, caveats section) that this is
>> a limitation of PostgreSQL table inheritance. SELECT query performance is really bad without partitioning, and I'm
>> looking for a workaround for this foreign key problem, or another solution to improve performance for large tables.
> Actually, this sounds more like having a foreign key pointed at a parent
> table in an inheritance tree, which flat-out doesn't do what you'd want.
>
> Can you tell us what the foreign key constraint actually is, and what
> the inheritance setup for the tables in the FK is?
> --
> Jim C. Nasby, Database Architect                   jim@nasby.net
> 512.569.9461 (cell)                         http://jim.nasby.net
>
>


Re: Table partitioning problem

From
Robert Haas
Date:
On Thu, Mar 10, 2011 at 3:25 AM, Samba GUEYE <samba.gueye@intesens.com> wrote:
> Inserting a measure succeeds, but problems arise when inserting into
> the SimpleMeasure table: the foreign key check cannot find the row that
> was inserted into the Measure table, because it does not look at the
> partitioned child tables.

Yes, that's how it works.
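
The reason, roughly: the internal referential-integrity check queries
the referenced table with ONLY, so rows that physically live in the
child partitions are invisible to it. A simplified sketch of that
lookup (not the exact internal query):

SELECT 1 FROM ONLY measure x WHERE x.id = $1 FOR SHARE OF x;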

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Table partitioning problem

From
Samba GUEYE
Date:
Yeah, but is there a workaround to force the root table to propagate the
foreign key check to the partitioned tables? Right now every foreign key
to the partitioned table throws a constraint violation, and it's a big
problem for me.
On 11/03/2011 19:31, Robert Haas wrote:
> On Thu, Mar 10, 2011 at 3:25 AM, Samba GUEYE <samba.gueye@intesens.com> wrote:
>> Inserting a measure succeeds, but problems arise when inserting into
>> the SimpleMeasure table: the foreign key check cannot find the row that
>> was inserted into the Measure table, because it does not look at the
>> partitioned child tables.
> Yes, that's how it works.
>


Re: Table partitioning problem

From
Robert Haas
Date:
On Mon, Mar 14, 2011 at 12:42 PM, Samba GUEYE <samba.gueye@intesens.com> wrote:
> Yeah, but is there a workaround to force the root table to propagate the
> foreign key check to the partitioned tables? Right now every foreign key
> to the partitioned table throws a constraint violation, and it's a big
> problem for me.

No.  Generally, table partitioning is not a good idea unless you are
dealing with really large tables, and nearly all of your queries apply
only to a single partition.  Most likely you are better off not using
table inheritance in the first place if you need this feature.

It would be nice if we had a way to do this for the rare cases where
it would be useful, but we don't.
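
(For completeness: the usual manual substitute, sketched below with the
illustrative measure/simple_measure names from earlier in the thread,
is to emulate the check with a trigger instead of a declarative foreign
key. Unlike a real FK, the lookup sees the child partitions, but it is
also weaker: it is not race-free without extra locking, and nothing
re-checks it when measure rows are deleted.)

CREATE FUNCTION check_measure_exists() RETURNS trigger AS $$
BEGIN
    -- A plain query: unlike the internal RI check, this sees children.
    PERFORM 1 FROM measure WHERE id = NEW.measure_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'measure_id % not present in measure', NEW.measure_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER simple_measure_fk_check
    BEFORE INSERT OR UPDATE ON simple_measure
    FOR EACH ROW EXECUTE PROCEDURE check_measure_exists();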

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Table partitioning problem

From
Conor Walsh
Date:
On Mon, Mar 14, 2011 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Generally, table partitioning is not a good idea unless you are
> dealing with really large tables, and nearly all of your queries apply
> only to a single partition.  Most likely you are better off not using
> table inheritance in the first place if you need this feature.

I don't know if my tables count as 'large' or not, but I've gotten
some good mileage in the past out of time-based partitioning and
setting higher compression levels on old tables.  Also the ability to
drop-and-reload a day is sometimes useful, but I grant that it would
be better to never need to do that.
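
The drop-and-reload pattern looks roughly like this, assuming day-based
inheritance partitions (the table name and file path are hypothetical):

BEGIN;
-- Throw away the bad day and recreate its partition.
DROP TABLE measure_2011_03_14;
CREATE TABLE measure_2011_03_14 (
    CHECK (date >= DATE '2011-03-14' AND date < DATE '2011-03-15')
) INHERITS (measure);
-- Reload the day from an export.
COPY measure_2011_03_14 FROM '/path/to/2011-03-14.csv' CSV;
COMMIT;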

-C.

Re: Table partitioning problem

From
Samba GUEYE
Date:
Alright, thanks all of you for your answers, but I've got 3 more questions:
  1. Why is "... partitioning is not a good idea ...", as you said, Robert, and as Conor said, "... I grant that it would be better to never need to do that"?
  2. Is there another way or strategy to deal with very large tables (over 100,000,000 rows per year in one table) beyond indexing and partitioning?
  3. If you had to quantify a limit on the number of rows per table in a single PostgreSQL database server, what would you say?
PS: I have been using PostgreSQL for less than 2 months; I thought partitioning was a possible solution to my large-table problem that Apache Derby doesn't offer, so if these questions sound "dumb" to you, it's a PostgreSQL novice talking.

Regards

On 14/03/2011 20:40, Robert Haas wrote:
> On Mon, Mar 14, 2011 at 12:42 PM, Samba GUEYE <samba.gueye@intesens.com> wrote:
>> Yeah, but is there a workaround to force the root table to propagate the
>> foreign key check to the partitioned tables? Right now every foreign key
>> to the partitioned table throws a constraint violation, and it's a big
>> problem for me.
> No.  Generally, table partitioning is not a good idea unless you are
> dealing with really large tables, and nearly all of your queries apply
> only to a single partition.  Most likely you are better off not using
> table inheritance in the first place if you need this feature.
>
> It would be nice if we had a way to do this for the rare cases where
> it would be useful, but we don't.


Re: Table partitioning problem

From
Shaun Thomas
Date:
On 03/15/2011 05:10 AM, Samba GUEYE wrote:

> 1. Why is "... partitioning is not a good idea ...", as you said,
> Robert, and as Conor said, "... I grant that it would be better to
> never need to do that"?

There are a number of difficulties the planner has with partitioned
tables. Until very recently, MAX and MIN would scan every single
partition, even when applied to the constraint column. Basically, quite
a few queries will either not get an ideal execution plan, or act in
unexpected ways, unless you physically target the exact partition you
want.
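
For example (with illustrative names), on releases before that fix a
query like

SELECT max(created_date) FROM big_table;

would scan every partition, even with an index on created_date in each
child, so people targeted the partition by hand instead:

SELECT max(created_date) FROM big_table_2011_03;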

Even though we have several tables over 50 million rows, I'm reluctant
to partition them, because we have a very transaction-intensive
database and can't risk the possible penalties.

> 2. Is there another way or strategy to deal with very large tables
> (over 100,000,000 rows per year in one table) beyond indexing and
> partitioning?

What you have is a very good candidate for partitioning... if you can
effectively guarantee a column to partition the data on. If you're
getting 100M rows per year, I could easily see some kind of created_date
column and then having one partition per month.
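
A sketch of that monthly scheme in the 8.4 inheritance style (table and
column names are illustrative):

-- One child per month, constrained so the planner can exclude it.
CREATE TABLE big_table_2011_03 (
    CHECK (created_date >= DATE '2011-03-01'
       AND created_date <  DATE '2011-04-01')
) INHERITS (big_table);

-- Route inserts on the parent into the right month.
CREATE FUNCTION big_table_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.created_date >= DATE '2011-03-01'
       AND NEW.created_date <  DATE '2011-04-01' THEN
        INSERT INTO big_table_2011_03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.created_date;
    END IF;
    RETURN NULL;  -- the row is stored in the child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_insert_trig
    BEFORE INSERT ON big_table
    FOR EACH ROW EXECUTE PROCEDURE big_table_insert();

-- Queries filtered on created_date then skip the other months, as long
-- as constraint_exclusion is 'partition' (the 8.4 default) or 'on'.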

One of the things we hate most about very large tables is the amount of
time necessary to vacuum or index them. CPU and disk IO can only go so
fast, so eventually you encounter a point where it can take hours to
index a single column. If you let your table get too big, your
maintenance costs will be prohibitive, and partitioning may be required
at that point.

As an example, we have a table that was over 100M rows and we have
enough memory that the entire table was in system cache. Even so,
rebuilding the indexes on that table required about an hour and ten
minutes *per index*. We knew this would happen and ran the reindex in
parallel, which we confirmed by watching five of our CPUs sit at 99%
utilization for the whole interval.

That wouldn't have happened if the table were partitioned.
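
The parallel rebuild amounts to one REINDEX per session, e.g. (index
names hypothetical):

-- session 1:
REINDEX INDEX big_table_created_date_idx;
-- session 2, at the same time:
REINDEX INDEX big_table_account_id_idx;
-- (CREATE INDEX CONCURRENTLY on a replacement index avoids blocking
-- writes, at the cost of a slower build.)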

> 3. If you had to quantify a limit on the number of rows per table in a
> single PostgreSQL database server, what would you say?

I'd personally avoid having any tables over 10 million rows. We have
quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO
contention, and still, large tables are a nuisance. Even the best CPU
will balk at processing 10 million rows quickly.

And yes. Good queries and design will help. Always limiting result sets
will help. Efficient, highly selective indexes will help. But
maintenance grows linearly, despite our best efforts. The only way to
sidestep that issue is to partition tables or rewrite your application
to scale horizontally via data sharding or some other shared-nothing
cluster with plProxy, GridSQL or PGPool.

You'll have this problem with any modern database. Big tables are a pain
in everybody's asses.

It's too bad PostgreSQL can't assign one thread per data-file and merge
the results.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Table partitioning problem

From
Samba GUEYE
Date:
Hi,

Thanks again very much for these clear-cut answers.

I think I'll try to implement the partitioning despite all the
difficulties you've raised in this thread, because I can't find any
other viable solution for this situation right now. It will force me to
change the data model to work around the inheritance foreign key issue,
but I will at least test it, because we have limited resources and
can't afford many servers or whatever to boost performance...

Best Regards


On 15/03/2011 14:18, Shaun Thomas wrote:
> On 03/15/2011 05:10 AM, Samba GUEYE wrote:
>
>> 1. Why is "... partitioning is not a good idea ...", as you said,
>> Robert, and as Conor said, "... I grant that it would be better to
>> never need to do that"?
>
> There are a number of difficulties the planner has with partitioned
> tables. Until very recently, MAX and MIN would scan every single
> partition, even when applied to the constraint column. Basically, quite
> a few queries will either not get an ideal execution plan, or act in
> unexpected ways, unless you physically target the exact partition you
> want.
>
> Even though we have several tables over 50 million rows, I'm reluctant
> to partition them, because we have a very transaction-intensive
> database and can't risk the possible penalties.
>
>> 2. Is there another way or strategy to deal with very large tables
>> (over 100,000,000 rows per year in one table) beyond indexing and
>> partitioning?
>
> What you have is a very good candidate for partitioning... if you can
> effectively guarantee a column to partition the data on. If you're
> getting 100M rows per year, I could easily see some kind of
> created_date column and then having one partition per month.
>
> One of the things we hate most about very large tables is the amount
> of time necessary to vacuum or index them. CPU and disk IO can only go
> so fast, so eventually you encounter a point where it can take hours
> to index a single column. If you let your table get too big, your
> maintenance costs will be prohibitive, and partitioning may be
> required at that point.
>
> As an example, we have a table that was over 100M rows and we have
> enough memory that the entire table was in system cache. Even so,
> rebuilding the indexes on that table required about an hour and ten
> minutes *per index*. We knew this would happen and ran the reindex in
> parallel, which we confirmed by watching five of our CPUs sit at 99%
> utilization for the whole interval.
>
> That wouldn't have happened if the table were partitioned.
>
>> 3. If you had to quantify a limit on the number of rows per table in a
>> single PostgreSQL database server, what would you say?
>
> I'd personally avoid having any tables over 10 million rows. We have
> quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO
> contention, and still, large tables are a nuisance. Even the best CPU
> will balk at processing 10 million rows quickly.
>
> And yes. Good queries and design will help. Always limiting result
> sets will help. Efficient, highly selective indexes will help. But
> maintenance grows linearly, despite our best efforts. The only way to
> sidestep that issue is to partition tables or rewrite your application
> to scale horizontally via data sharding or some other shared-nothing
> cluster with plProxy, GridSQL or PGPool.
>
> You'll have this problem with any modern database. Big tables are a
> pain in everybody's asses.
>
> It's too bad PostgreSQL can't assign one thread per data-file and
> merge the results.
>