Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance

From Greg Stark
Subject Re: What needs to be done for real Partitioning?
Date
Msg-id 87fyypgb9o.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: What needs to be done for real Partitioning?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:

> Well, I would think that specifying an expression that defines a new partition
> at each change in value (like EXTRACT(day FROM timestamp) on a time-based
> partitioning) would cover 90% of implemenations and be a lot simpler to
> administer.   The Oracle approach has the advantage of allowing "custom
> paritioning" at the expense of greater complexity.

Hm. This is where I might be less helpful. Once you're submersed in one way of
doing things it can be hard to think outside the box like this.

But I fear this scheme might be harder to actually take advantage of. If I do
a query like

 WHERE timestamp BETWEEN '2005-01-01 11:00' AND '2005-01-01 12:00'

How do you determine which partitions that range will cover?

Also, it seems like it would be inconvenient to try to construct expressions
to handle things like "start a new partition ever 1 million values".

And worse, how would you handle changing schemes with this? Like, say we want
to switch from starting one partition per month to starting one partition per
week?



I think some actual use cases might be helpful for you. I can contribute an
interesting one, though I have to be intentionally vague even though I don't
work on that system any more.

We had a table with a layout like:

txnid serial,
groupid integer,
data...

Each day a cron job created 6 new groups (actually later that was changed to
some other number). It then added a new partition to handle the range of the
new day's groups. Later another cron job exchanged out the partition from a
week earlier and exported that table, transfered it to another machine and
loaded it there.

txnid was a unique identifier but we couldn't have a unique constraint because
that would have required a global index. That didn't cause any problems since
it was a sequence generated column anyways.

We did have a unique index on <groupid,txnid> which is a local index because
groupid was the partition key. In reality nothing in our system ever really
needed a txn without knowing which group it came from anyways, so it was easy
to change our queries to take advantage of this.

We had a lot of jobs, some *extremely* performance sensitive that depended on
being able to scan the entire list of txns for a given day or a given set of
groupids. The partitions meant it could do a full table scan which made these
extremely fast.

This was with Oracle 8i. All partition keys in 8i were ranges. In 9 Oracle
added the ability to make partition reference specific id values. Sort of like
how you're describing having a key expression. We might have considered using
that scheme with groupid but then it would have meant adding a bunch of new
partitions each day and having some queries that would involve scanning
multiple partitions.

--
Greg

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: What needs to be done for real Partitioning?
Next
From: Greg Stark
Date:
Subject: Re: What needs to be done for real Partitioning?