Thread: Table partitioning problem
Hello,

I have a problem with table partitioning: I have a foreign key pointing at the partitioned table, and it throws a constraint violation error during inserts. I saw in the manual (http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html, caveats section) that this is a limitation of PostgreSQL table inheritance. SELECT query performance is really bad without partitioning, so I'm looking for a workaround to this foreign key problem, or for another solution to improve performance on large tables.

Thanks in advance for your responses
On Mar 8, 2011, at 9:45 AM, Samba GUEYE wrote:
> I have a problem with table partitioning: I have a foreign key pointing at the partitioned table, and it throws a constraint violation error during inserts.
> I saw in the manual (http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html, caveats section) that this is a limitation of PostgreSQL table inheritance. SELECT query performance is really bad without partitioning, so I'm looking for a workaround to this foreign key problem, or for another solution to improve performance on large tables.

Actually, this sounds more like having a foreign key pointed at a parent table in an inheritance tree, which flat-out doesn't do what you'd want.

Can you tell us what the foreign key constraint actually is, and what the inheritance setup for the tables in the FK is?
--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
Hi Jim, thanks for your answer.

The database model is something like this: Measure(Id, numbering, Date, crcCorrect, sensorId), SimpleMeasure(Id, doubleValue), and GenericMeasure(Id, BlobValue, numberOfElements). In the UML model, SimpleMeasure and GenericMeasure inherit from the Measure class, so in the database the foreign keys of SimpleMeasure and GenericMeasure point to the Measure table, which is partitioned by sensor.

The Measure insertion is successful, but problems come up when inserting into the SimpleMeasure table, because the foreign key check cannot find the row just inserted into the Measure table: it does not look into the partitioned (child) tables.

ERROR: insert or update on table "simpleMeasure" violates foreign key constraint "fk_measure_id"
DETAIL: Key (measure_id)=(1) is not present in table "Measure"

The inheritance is only used to set up PostgreSQL's partitioning, and the limitation of the partitioning comes from there. The same problem is also reported in the following posts:
http://archives.postgresql.org/pgsql-performance/2008-07/msg00224.php
http://archives.postgresql.org/pgsql-admin/2007-09/msg00031.php

Best Regards

On 09/03/2011 23:01, Jim Nasby wrote:
> On Mar 8, 2011, at 9:45 AM, Samba GUEYE wrote:
>> I have a problem with table partitioning: I have a foreign key pointing at the partitioned table, and it throws a constraint violation error during inserts.
>> I saw in the manual (http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html, caveats section) that this is a limitation of PostgreSQL table inheritance. SELECT query performance is really bad without partitioning, so I'm looking for a workaround to this foreign key problem, or for another solution to improve performance on large tables.
> Actually, this sounds more like having a foreign key pointed at a parent table in an inheritance tree, which flat-out doesn't do what you'd want.
>
> Can you tell us what the foreign key constraint actually is, and what the inheritance setup for the tables in the FK is?
> --
> Jim C. Nasby, Database Architect   jim@nasby.net
> 512.569.9461 (cell)                http://jim.nasby.net
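A minimal sketch that reproduces the behavior described above; the table and column names here are illustrative stand-ins adapted from the description, not the actual schema:

    -- Parent table, partitioned by sensor via inheritance:
    CREATE TABLE measure (
        id           bigint PRIMARY KEY,
        numbering    integer,
        measure_date timestamp,
        crc_correct  boolean,
        sensor_id    integer
    );

    CREATE TABLE measure_sensor_1 (
        CHECK (sensor_id = 1)
    ) INHERITS (measure);

    -- Child entity whose FK points at the inheritance parent:
    CREATE TABLE simple_measure (
        id           bigint PRIMARY KEY,
        double_value double precision,
        measure_id   bigint,
        CONSTRAINT fk_measure_id FOREIGN KEY (measure_id) REFERENCES measure (id)
    );

    -- The row lands in the partition, but the FK check scans only the
    -- parent relation itself (not its children), so the reference fails:
    INSERT INTO measure_sensor_1 VALUES (1, 1, now(), true, 1);
    INSERT INTO simple_measure VALUES (1, 3.14, 1);
    -- ERROR:  insert or update on table "simple_measure" violates
    --         foreign key constraint "fk_measure_id"

The referential-integrity check behaves like a lookup on ONLY the referenced table (as if it ran SELECT ... FROM ONLY measure), which is why rows stored in the children are invisible to it.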
On Thu, Mar 10, 2011 at 3:25 AM, Samba GUEYE <samba.gueye@intesens.com> wrote:
> The Measure insertion is successful, but problems come up when inserting
> into the SimpleMeasure table, because the foreign key check cannot find
> the row just inserted into the Measure table: it does not look into the
> partitioned (child) tables.

Yes, that's how it works.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Yeah, but is there a workaround to force the root table to propagate the foreign key to the partitioned tables? Because right now all foreign keys to the partitioned table throw constraint violations, and it's a big problem for me.

On 11/03/2011 19:31, Robert Haas wrote:
> On Thu, Mar 10, 2011 at 3:25 AM, Samba GUEYE <samba.gueye@intesens.com> wrote:
>> The Measure insertion is successful, but problems come up when inserting
>> into the SimpleMeasure table, because the foreign key check cannot find
>> the row just inserted into the Measure table: it does not look into the
>> partitioned (child) tables.
> Yes, that's how it works.
On Mon, Mar 14, 2011 at 12:42 PM, Samba GUEYE <samba.gueye@intesens.com> wrote:
> Yeah, but is there a workaround to force the root table to propagate the
> foreign key to the partitioned tables? Because right now all foreign keys
> to the partitioned table throw constraint violations, and it's a big
> problem for me.

No. Generally, table partitioning is not a good idea unless you are dealing with really large tables, and nearly all of your queries apply only to a single partition. Most likely you are better off not using table inheritance in the first place if you need this feature.

It would be nice if we had a way to do this for the rare cases where it would be useful, but we don't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
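For context on the "nearly all of your queries apply only to a single partition" point: the planner can skip child tables only when a query's WHERE clause matches the partitions' CHECK constraints (constraint exclusion, which defaults to the 'partition' setting in 8.4). A sketch, continuing the illustrative tables above:

    -- Another sensor's partition, for contrast:
    CREATE TABLE measure_sensor_2 (
        CHECK (sensor_id = 2)
    ) INHERITS (measure);

    SHOW constraint_exclusion;  -- 'partition' is the 8.4 default

    -- Only measure and measure_sensor_1 appear in the plan: the planner
    -- proves CHECK (sensor_id = 2) excludes measure_sensor_2.
    EXPLAIN SELECT * FROM measure WHERE sensor_id = 1;

    -- No filter on the partitioning column: every child is scanned.
    EXPLAIN SELECT * FROM measure WHERE numbering = 42;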
On Mon, Mar 14, 2011 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Generally, table partitioning is not a good idea unless you are
> dealing with really large tables, and nearly all of your queries apply
> only to a single partition. Most likely you are better off not using
> table inheritance in the first place if you need this feature.

I don't know if my tables count as 'large' or not, but I've gotten some good mileage in the past out of time-based partitioning and setting higher compression levels on old tables. Also, the ability to drop and reload a day is sometimes useful, but I grant that it would be better to never need to do that.

-C.
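The drop-and-reload pattern works because dropping a child table is a near-instant catalog operation, unlike a bulk DELETE that leaves dead tuples behind for VACUUM. A sketch of a hypothetical day-partitioned layout (separate from the sensor example above; the table names and file path are invented for illustration):

    -- One child per day, constrained on a timestamp column:
    CREATE TABLE event_log (event_time timestamp NOT NULL, payload text);

    CREATE TABLE event_log_20110314 (
        CHECK (event_time >= DATE '2011-03-14'
           AND event_time <  DATE '2011-03-15')
    ) INHERITS (event_log);

    -- Reloading that day: drop the child, recreate it, bulk-load.
    DROP TABLE event_log_20110314;
    CREATE TABLE event_log_20110314 (
        CHECK (event_time >= DATE '2011-03-14'
           AND event_time <  DATE '2011-03-15')
    ) INHERITS (event_log);
    COPY event_log_20110314 FROM '/path/to/2011-03-14.csv' WITH CSV;  -- hypothetical path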
Alright, thanks to all of you for your answers, but I've got 3 more questions:
- Why "... partitionning is not a good idea ..." like you said Robert and Conor "... I grant that it would be better to never need to do that" ?
- Is there another way or strategy to deal with very large tables (over 100 000 000 rows per year in one table) beyond indexing and partitionning?
- If you had to quantify a limit of numbers of rows per table in a single postgresql database server what would you say?
Regards
On 14/03/2011 20:40, Robert Haas wrote:
> On Mon, Mar 14, 2011 at 12:42 PM, Samba GUEYE <samba.gueye@intesens.com> wrote:
>> Yeah, but is there a workaround to force the root table to propagate the
>> foreign key to the partitioned tables? Because right now all foreign keys
>> to the partitioned table throw constraint violations, and it's a big
>> problem for me.
> No. Generally, table partitioning is not a good idea unless you are
> dealing with really large tables, and nearly all of your queries apply
> only to a single partition. Most likely you are better off not using
> table inheritance in the first place if you need this feature. It would
> be nice if we had a way to do this for the rare cases where it would be
> useful, but we don't.
On 03/15/2011 05:10 AM, Samba GUEYE wrote:

> 1. Why is partitioning "... not a good idea ..." as you said, Robert,
> and Conor, "... I grant that it would be better to never need to do
> that"?

There are a number of difficulties the planner has with partitioned tables. Until very recently, MAX and MIN would scan every single partition, even if you performed the action on the constraint column. Basically, quite a few queries will either not have an ideal execution plan, or act in unexpected manners unless you physically target the exact partition you want.

Even though we have several tables over 50 million rows, I'm reluctant to partition them because we have a very transaction-intensive database and can't risk the possible penalties.

> 2. Is there another way or strategy to deal with very large tables
> (over 100,000,000 rows per year in one table) beyond indexing and
> partitioning?

What you have is a very good candidate for partitioning... if you can effectively guarantee a column to partition the data on. If you're getting 100M rows per year, I could easily see some kind of created_date column and then having one partition per month.

One of the things we hate most about very large tables is the amount of time necessary to vacuum or index them. CPU and disk IO can only go so fast, so eventually you encounter a point where it can take hours to index a single column. If you let your table get too big, your maintenance costs will be prohibitive, and partitioning may be required at that point.

As an example, we have a table that was over 100M rows, and we have enough memory that the entire table was in system cache. Even so, rebuilding the indexes on that table required about an hour and ten minutes *per index*. We knew this would happen and ran the reindex in parallel, which we confirmed by watching five of our CPUs sit at 99% utilization for the whole interval.

That wouldn't have happened if the table were partitioned.

> 3. If you had to quantify a limit on the number of rows per table on a
> single PostgreSQL database server, what would you say?

I'd personally avoid having any tables over 10 million rows. We have quad Xeon E7450's, tons of RAM, and even NVRAM PCI cards to reduce IO contention, and still, large tables are a nuisance. Even the best CPU will balk at processing 10 million rows quickly.

And yes: good queries and design will help. Always limiting result sets will help. Efficient, highly selective indexes will help. But maintenance grows linearly, despite our best efforts. The only way to sidestep that issue is to partition tables or rewrite your application to scale horizontally via data sharding or some other shared-nothing cluster with plProxy, GridSQL or PGPool.

You'll have this problem with any modern database. Big tables are a pain in everybody's asses.

It's too bad PostgreSQL can't assign one thread per data file and merge the results.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
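To make the one-partition-per-month idea concrete, here is a minimal 8.4-style sketch, assuming a created_date column as Shaun suggests (all names are invented for illustration). Each child carries a CHECK constraint so constraint exclusion can prune it, and a trigger on the parent routes inserts into the right child:

    CREATE TABLE big_measure (
        id           bigint,
        created_date date NOT NULL,
        payload      text
    );

    CREATE TABLE big_measure_y2011m03 (
        CHECK (created_date >= DATE '2011-03-01'
           AND created_date <  DATE '2011-04-01')
    ) INHERITS (big_measure);

    CREATE OR REPLACE FUNCTION big_measure_insert() RETURNS trigger AS $$
    BEGIN
        IF NEW.created_date >= DATE '2011-03-01'
           AND NEW.created_date < DATE '2011-04-01' THEN
            INSERT INTO big_measure_y2011m03 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for created_date %', NEW.created_date;
        END IF;
        RETURN NULL;  -- the row is already stored in the child, skip the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER big_measure_insert_trigger
        BEFORE INSERT ON big_measure
        FOR EACH ROW EXECUTE PROCEDURE big_measure_insert();

In practice you would add one child (and one branch in the routing function) per month; old months can then be dropped or archived independently, which also keeps per-partition VACUUM and REINDEX times bounded.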
Hi,

Thanks again very much for these clear-cut answers. I think I'll try to implement the partitioning despite all the difficulties you raised about it in this thread, because I can't find any viable solution right now for this situation. It will force me to change the data model to work around the inheritance foreign key issue, but I will at least test it, because we have limited resources and can't afford many servers or whatever else to boost performance...

Best Regards

On 15/03/2011 14:18, Shaun Thomas wrote:
> On 03/15/2011 05:10 AM, Samba GUEYE wrote:
>
>> 1. Why is partitioning "... not a good idea ..." as you said, Robert,
>> and Conor, "... I grant that it would be better to never need to do
>> that"?
>
> There are a number of difficulties the planner has with partitioned
> tables. Until very recently, MAX and MIN would scan every single
> partition, even if you performed the action on the constraint column.
> Basically, quite a few queries will either not have an ideal execution
> plan, or act in unexpected manners unless you physically target the
> exact partition you want.
>
> Even though we have several tables over 50 million rows, I'm
> reluctant to partition them because we have a very
> transaction-intensive database and can't risk the possible penalties.
>
>> 2. Is there another way or strategy to deal with very large tables
>> (over 100,000,000 rows per year in one table) beyond indexing and
>> partitioning?
>
> What you have is a very good candidate for partitioning... if you can
> effectively guarantee a column to partition the data on. If you're
> getting 100M rows per year, I could easily see some kind of
> created_date column and then having one partition per month.
>
> One of the things we hate most about very large tables is the amount
> of time necessary to vacuum or index them. CPU and disk IO can only go
> so fast, so eventually you encounter a point where it can take hours
> to index a single column. If you let your table get too big, your
> maintenance costs will be prohibitive, and partitioning may be
> required at that point.
>
> As an example, we have a table that was over 100M rows, and we have
> enough memory that the entire table was in system cache. Even so,
> rebuilding the indexes on that table required about an hour and ten
> minutes *per index*. We knew this would happen and ran the reindex in
> parallel, which we confirmed by watching five of our CPUs sit at 99%
> utilization for the whole interval.
>
> That wouldn't have happened if the table were partitioned.
>
>> 3. If you had to quantify a limit on the number of rows per table on a
>> single PostgreSQL database server, what would you say?
>
> I'd personally avoid having any tables over 10 million rows. We have
> quad Xeon E7450's, tons of RAM, and even NVRAM PCI cards to reduce IO
> contention, and still, large tables are a nuisance. Even the best CPU
> will balk at processing 10 million rows quickly.
>
> And yes: good queries and design will help. Always limiting result
> sets will help. Efficient, highly selective indexes will help. But
> maintenance grows linearly, despite our best efforts. The only way to
> sidestep that issue is to partition tables or rewrite your application
> to scale horizontally via data sharding or some other shared-nothing
> cluster with plProxy, GridSQL or PGPool.
>
> You'll have this problem with any modern database. Big tables are a
> pain in everybody's asses.
>
> It's too bad PostgreSQL can't assign one thread per data file and
> merge the results.