Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance

From Stacy White
Subject Re: What needs to be done for real Partitioning?
Date
Msg-id 002a01c52d87$543d1010$0200a8c0@grownups
Whole thread Raw
In response to What needs to be done for real Partitioning?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: What needs to be done for real Partitioning?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > We probably also need multi-table indexes.
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Global indexes (as opposed to partition local indexes) are useful in cases
where you have a large number of partitions, index columns different than
the partition key, and index values that limit the query to just a subset of
the partitions.

The two domains that I'm most familiar with are warehouse management, and
the film industry. In both these cases it's logical to partition on
day/week/month, it's frequently important to keep a lot of history, and it's
common to have products that only show activity for a few months.  In one of
our production systems we have 800 partitions (by week, with a lot of
history), but a popular product might have only 20 weeks worth of activity.
Selecting records for the product requires at least 800 random-access reads
if you have local indexes on 'product_no', 780 of which just tell the
executor that the partition doesn't include any information on the product.

This is definitely a phase II item, but as I said before it's worth
considering since good DBAs can do a lot with global indexes.

FWIW, we see large benefits from partitioning other than the ability to
easily drop data, for example:

- We can vacuum only the active portions of a table
- Postgres automatically keeps related records clustered together on disk,
which makes it more likely that the blocks used by common queries can be
found in cache
- The query engine uses full table scans on the relevant sections of data,
and quickly skips over the irrelevant sections
- 'CLUSTER'ing a single partition is likely to be significantly more
performant than clustering a large table

In fact, we have yet to drop a partition on any of our Oracle or Postgres
production systems.


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: What needs to be done for real Partitioning?
Next
From: Josh Berkus
Date:
Subject: Re: What needs to be done for real Partitioning?