Re: Table partitioning problem - Mailing list pgsql-performance

From: Samba GUEYE
Subject: Re: Table partitioning problem
Msg-id: 4D7F8392.5060306@intesens.com
In response to: Re: Table partitioning problem (Shaun Thomas <sthomas@peak6.com>)
List: pgsql-performance
Hi,

Thanks again very much for these clear-cut answers.

I think I'll try to implement the partitioning despite all the
difficulties raised about it in this thread, because I can't find any
other viable solution for this situation right now. It will force me to
change the data model to work around the inheritance foreign key issue,
but I will at least test it, because we have limited resources and
can't afford many servers or the like to boost performance...
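For example (the table and column names below are only placeholders), a
trigger on the referencing table can stand in for the missing foreign key,
because a plain SELECT on the parent table does see rows in the child
partitions even though a FK constraint does not:

  CREATE OR REPLACE FUNCTION check_measure_exists() RETURNS trigger AS $$
  BEGIN
      -- A SELECT on the parent also scans its children,
      -- so this check covers every partition.
      PERFORM 1 FROM measure WHERE id = NEW.measure_id;
      IF NOT FOUND THEN
          RAISE EXCEPTION 'measure_id % not found in measure', NEW.measure_id;
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER alert_measure_fk
      BEFORE INSERT OR UPDATE ON alert
      FOR EACH ROW EXECUTE PROCEDURE check_measure_exists();

This is only a rough sketch: it does not give the ON DELETE / ON UPDATE
guarantees of a real foreign key and is not immune to concurrent deletes.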

Best Regards


On 15/03/2011 14:18, Shaun Thomas wrote:
> On 03/15/2011 05:10 AM, Samba GUEYE wrote:
>
>> 1. Why "... partitionning is not a good idea ..." like you said
>> Robert and Conor "... I grant that it would be better to never need
>> to do that" ?
>
> There are a number of difficulties the planner has with partitioned
> tables. Until very recently, MAX and MIN would scan every single
> partition, even if you performed the action on the constraint column.
> Basically quite a few queries will either not have an ideal execution
> plan, or act in unexpected ways unless you physically target the
> exact partition you want.
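
As an illustration of that last point (the table names here are
hypothetical; "measure" is the parent, "measure_2011_03" one monthly child):

  -- On older releases this may visit every partition's index:
  SELECT max(created_date) FROM measure;

  -- Targeting the child directly uses only that partition:
  SELECT max(created_date) FROM measure_2011_03;

  -- Constraint exclusion only prunes children when the query
  -- filters on the partitioning column:
  SET constraint_exclusion = partition;
  SELECT count(*) FROM measure
   WHERE created_date >= '2011-03-01' AND created_date < '2011-04-01';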
>
> Even though we have several tables over 50 million rows, I'm
> reluctant to partition them because we have a very
> transaction-intensive database, and can't risk the possible penalties.
>
>> 2. Is there another way or strategy to deal with very large tables
>> (over 100 000 000 rows per year in one table) beyond indexing and
>> partitionning?
>
> What you have is a very good candidate for partitioning... if you can
> effectively guarantee a column to partition the data on. If you're
> getting 100M rows per year, I could easily see some kind of
> created_date column and then having one partition per month.
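
A minimal sketch of that layout using the usual inheritance-based recipe
(a CHECK constraint per child plus an insert-routing trigger); every table
and column name here is a placeholder:

  -- Parent table; note that indexes and constraints are NOT inherited.
  CREATE TABLE measure (
      id           bigserial,
      created_date timestamptz NOT NULL,
      value        numeric
  );

  -- One child per month, with a CHECK constraint the planner can use
  -- for pruning.
  CREATE TABLE measure_2011_03 (
      CHECK (created_date >= '2011-03-01' AND created_date < '2011-04-01')
  ) INHERITS (measure);

  CREATE INDEX measure_2011_03_created_idx ON measure_2011_03 (created_date);

  -- Redirect inserts on the parent into the right child.
  CREATE OR REPLACE FUNCTION measure_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.created_date >= '2011-03-01'
         AND NEW.created_date < '2011-04-01' THEN
          INSERT INTO measure_2011_03 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for date %', NEW.created_date;
      END IF;
      RETURN NULL;  -- the row is already stored in the child
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER measure_insert_trg
      BEFORE INSERT ON measure
      FOR EACH ROW EXECUTE PROCEDURE measure_insert();

In practice the trigger would branch over all existing months, and a
monthly job would add the next child table and branch.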
>
> One of the things we hate most about very large tables is the amount
> of time necessary to vacuum or index them. CPU and disk IO can only go
> so fast, so eventually you encounter a point where it can take hours
> to index a single column. If you let your table get too big, your
> maintenance costs will be prohibitive, and partitioning may be
> required at that point.
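
Concretely, with a monthly layout like the sketch above (same placeholder
names), maintenance stays bounded per partition:

  -- Routine maintenance touches one month at a time...
  VACUUM ANALYZE measure_2011_03;
  REINDEX TABLE measure_2011_03;

  -- ...and expiring old data is a cheap DROP instead of a huge
  -- DELETE followed by a long VACUUM.
  DROP TABLE measure_2010_03;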
>
> As an example, we have a table that was over 100M rows and we have
> enough memory that the entire table was in system cache. Even so,
> rebuilding the indexes on that table required about an hour and ten
> minutes *per index*. We knew this would happen and ran the reindex in
> parallel, which we confirmed by watching five of our CPUs sit at 99%
> utilization for the whole interval.
>
> That wouldn't have happened if the table were partitioned.
>
>> 3. If you had to quantify a limit of numbers of rows per table in a
>> single postgresql database server what would you say?
>
> I'd personally avoid having any tables over 10-million rows. We have
> quad Xeon E7450's, tons of ram, and even NVRAM PCI cards to reduce IO
> contention, and still, large tables are a nuisance. Even the best CPU
> will balk at processing 10-million rows quickly.
>
> And yes. Good queries and design will help. Always limiting result
> sets will help. Efficient, highly selective indexes will help. But
> maintenance grows linearly, despite our best efforts. The only way to
> sidestep that issue is to partition tables or rewrite your application
> to scale horizontally via data sharding or some other shared-nothing
> cluster with plProxy, GridSQL or PGPool.
>
> You'll have this problem with any modern database. Big tables are a
> pain in everybody's asses.
>
> It's too bad PostgreSQL can't assign one thread per data-file and
> merge the results.
>

