Re: Table partitioning problem - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: Table partitioning problem
Date
Msg-id 4D7F6735.2050904@peak6.com
In response to Re: Table partitioning problem  (Samba GUEYE <samba.gueye@intesens.com>)
Responses Re: Table partitioning problem
List pgsql-performance
On 03/15/2011 05:10 AM, Samba GUEYE wrote:

> 1. Why "... partitionning is not a good idea ..." like you said
> Robert and Conor "... I grant that it would be better to never need
> to do that" ?

The planner has a number of difficulties with partitioned tables. Until
very recently, MAX and MIN would scan every single partition, even if
you performed the aggregate on the constraint column. Basically, quite a
few queries will either not get an ideal execution plan, or will act in
unexpected ways, unless you physically target the exact partition you
want.
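
As a quick illustration (the table and partition names here are
hypothetical), the difference is something like this:

  -- Querying the parent table may touch every child partition:
  SELECT max(created_date) FROM sensor_data;

  -- Physically targeting the one partition you care about avoids that:
  SELECT max(created_date) FROM sensor_data_2011_03;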

Even though we have several tables over 50 million rows, I'm reluctant
to partition them because we have a very transaction-intensive database
and can't risk the possible penalties.

> 2. Is there another way or strategy to deal with very large tables
> (over 100 000 000 rows per year in one table) beyond indexing and
> partitionning?

What you have is a very good candidate for partitioning... if you can
effectively guarantee a column to partition the data on. If you're
getting 100M rows per year, I could easily see partitioning on some kind
of created_date column, with one partition per month.
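
As a minimal sketch of what I mean (names invented, and the details
would depend on your schema), classic inheritance-based partitioning on
such a column looks like:

  CREATE TABLE sensor_data (
      id           bigint NOT NULL,
      created_date timestamp NOT NULL,
      payload      text
  );

  -- One child table per month, with a CHECK constraint so
  -- constraint_exclusion can prune it from unrelated queries:
  CREATE TABLE sensor_data_2011_03 (
      CHECK (created_date >= '2011-03-01' AND created_date < '2011-04-01')
  ) INHERITS (sensor_data);

  CREATE INDEX sensor_data_2011_03_created_idx
      ON sensor_data_2011_03 (created_date);

You would also need an INSERT trigger on the parent, or application
logic, to route new rows into the correct child.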

One of the things we hate most about very large tables is the amount of
time necessary to vacuum or index them. CPU and disk IO can only go so
fast, so eventually you encounter a point where it can take hours to
index a single column. If you let your table get too big, your
maintenance costs will be prohibitive, and partitioning may be required
at that point.

As an example, we have a table that was over 100M rows, and we have
enough memory that the entire table was in the system cache. Even so,
rebuilding the indexes on that table took about an hour and ten
minutes *per index*. We knew this would happen, so we ran the reindexes
in parallel, which we confirmed by watching five of our CPUs sit at 99%
utilization for the whole interval.
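
To give a concrete picture (the index names here are made up), "in
parallel" just means issuing the rebuilds from separate sessions instead
of one after another:

  -- Session 1:
  REINDEX INDEX big_table_created_idx;

  -- Session 2:
  REINDEX INDEX big_table_account_idx;

  -- ...and so on, one session per index, so each rebuild gets its own CPU.

Keep in mind REINDEX blocks writes to the table while it runs, so this
is maintenance-window work.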

That wouldn't have happened if the table were partitioned.

> 3. If you had to quantify a limit of numbers of rows per table in a
> single postgresql database server what would you say?

I'd personally avoid having any tables over 10 million rows. We have
quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO
contention, and still, large tables are a nuisance. Even the best CPU
will balk at processing 10 million rows quickly.

And yes: good queries and design will help. Always limiting result sets
will help. Efficient, highly selective indexes will help. But
maintenance grows linearly, despite our best efforts. The only way to
sidestep that issue is to partition tables or rewrite your application
to scale horizontally via data sharding or some other shared-nothing
cluster with PL/Proxy, GridSQL, or PgPool.

You'll have this problem with any modern database. Big tables are a pain
in everybody's asses.

It's too bad PostgreSQL can't assign one thread per data-file and merge
the results.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

