Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: What needs to be done for real Partitioning?
Date
Msg-id 20050321225803.GU51784@decibel.org
Whole thread Raw
In response to What needs to be done for real Partitioning?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
I think Greg's email did a good job of putting this on track. Phase 1
should be manual, low-level type of support. Oracle has had partitioning
for years now, and IF they've added automated partition management, it's
only happened in 10g which is pretty recent.

For inserts that don't currently have a defined partition to fit in, the
Oracle model might be better than tossing an error: a partitioned table
in Oracle also contains a default partition. Any rows that don't match a
defined partition go into the default partition. For many cases you'll
never have anything in the default partition, but sometimes you'll have
some partition values that occur infrequenttly enough in the table so as
not to warrant their own partition.

There's also another partitioning application that I think is often
overlooked. I have a table with about 130M rows that is
'pseudo-partitioned' by project_id. Right now, there are 5 different
project IDs that account for the bulk of those 130M rows. Oracle
provides a means to partition on discreet values. When you do this,
there's not actually any reason to even store the partition field in the
partition tables, since it will be the same for every row in the
partition. In my case, since the smallint project ID is being aligned to
a 4 byte boundary, having this feature would save ~120M rows * 4 bytes =
480MB in the table. Granted, 480MB isn't anything for today's disk
sizes, but it makes a huge difference when you look at it from an I/O
standpoint. Knowing that a partition contains only one value of a field
or set of fields also means you can drop those fields from local indexes
without losing any effectiveness. In my case, I have 2 indexes I could
drop project_id from. Does each node in a B-tree index have the full
index key? If so, then there would be substantial I/O gains to be had
there, as well. Even if each node doesn't store the full key, there
could still be faster to handle a narrower index.

I realize this might be a more difficult case to support. It probably
couldn't be done using inheritance, though I don't know if inheritence
or a union view is better for partitioning. In either case, this case
might not be a good candidate for phase 1, but I think partitioning
should be designed with it in mind.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: What about utility to calculate planner cost constants?
Next
From: Greg Stark
Date:
Subject: Re: What about utility to calculate planner cost constants?