Home > mailing lists

Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance

From	Stacy White
Subject	Re: What needs to be done for real Partitioning?
Date	March 20, 2005 19:58:02
Msg-id	002a01c52d87$543d1010$0200a8c0@grownups Whole thread Raw
In response to	What needs to be done for real Partitioning? (Josh Berkus <josh@agliodbs.com>)
Responses	Re: What needs to be done for real Partitioning?
List	pgsql-performance

Tree view

From: "Tom Lane" <tgl@sss.pgh.pa.us>
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > We probably also need multi-table indexes.
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Global indexes (as opposed to partition local indexes) are useful in cases
where you have a large number of partitions, index columns different than
the partition key, and index values that limit the query to just a subset of
the partitions.

The two domains that I'm most familiar with are warehouse management, and
the film industry. In both these cases it's logical to partition on
day/week/month, it's frequently important to keep a lot of history, and it's
common to have products that only show activity for a few months.  In one of
our production systems we have 800 partitions (by week, with a lot of
history), but a popular product might have only 20 weeks worth of activity.
Selecting records for the product requires at least 800 random-access reads
if you have local indexes on 'product_no', 780 of which just tell the
executor that the partition doesn't include any information on the product.

This is definitely a phase II item, but as I said before it's worth
considering since good DBAs can do a lot with global indexes.

FWIW, we see large benefits from partitioning other than the ability to
easily drop data, for example:

- We can vacuum only the active portions of a table
- Postgres automatically keeps related records clustered together on disk,
which makes it more likely that the blocks used by common queries can be
found in cache
- The query engine uses full table scans on the relevant sections of data,
and quickly skips over the irrelevant sections
- 'CLUSTER'ing a single partition is likely to be significantly more
performant than clustering a large table

In fact, we have yet to drop a partition on any of our Oracle or Postgres
production systems.

pgsql-performance by date:

From: Tom Lane
Date: 20 March 2005, 17:58:51
Subject: Re: What needs to be done for real Partitioning?

From: Josh Berkus
Date: 20 March 2005, 20:04:11
Subject: Re: What needs to be done for real Partitioning?

Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance

Previous

Next