Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance

From Greg Stark
Subject Re: What needs to be done for real Partitioning?
Date
Msg-id 87br9ehmja.fsf@stark.xeocode.com
Whole thread Raw
In response to What needs to be done for real Partitioning?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: What needs to be done for real Partitioning?
Re: What needs to be done for real Partitioning?
Re: What needs to be done for real Partitioning?
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:

> -- INSERT INTO should automatically create new partitions where necessary
>     ---- new tables should automatically inherit all constraints, indexes,
>             keys of "parent" table

I think you're going about this backwards.

Phase I should be an entirely manual system where you add and remove
partitions manually and create and drop indexes you want manually. You need
these low level interfaces anyways for a complete system, it doesn't make
sense to have everything automatic and then later try to wedge in a low level
interface. Only once you have that do you then start offering options to do
these things automatically.

I also think there are a few other components mixed up in your proposal that
are really not integral to partitioned tables. Tablespaces and expression
indexes may well be useful features to use in combination with partitioned
tables, but they shouldn't be required or automatic.

From my experience with Oracle I think there's one big concept that makes the
whole system make a lot more sense: individual partitions are really tables.
The partitioned tables themselves are just meta-objects like views.

Once you get that concept the whole featureset makes a lot more sense. You can
pull a partition out of a partitioned table and it becomes a normal table. You
can take a normal table and put it into a partitioned table. Creating a new
partition or altering a partition is just the same as creating or altering a
new table (except for the actual data definition part).

Given that understanding it's clear that tablespaces are an entirely
orthogonal feature. One that happens to play well with partitioned tables, but
not one that partitioned tables need any special support for. When you create
a new partition or create a table intending to add it as a partition to a
partitioned table you specify the tablespace just as you would normally do.

It's also clear that the last thing you want is an index on the partition key.
A big part of the advantage of partitioned tables is precisely that you get
the advantage of an index on a column without the extra expense.

It would also be reasonable to allow clustering individual partitions;
creating table or column constraints on some partitions and not others; or
even allow having indexes on some partitions and not others. In general the
only operations that you wouldn't be able to do on an individual partition
would be operations that make the column definitions incompatible with the
parent table.

The $64 question is how to specify the partitioning rules. That is, the rule
for determining which partition an insert should go into and which partitions
to look for records in. Oracle handles this by specifying a list of columns
when creating the partitioned table and then specifying either a range or
specific values for each individual partition. I can imagine other approaches
but none that allow for the planner and optimizer to take as much advantage of
the information.

So I think Phase I should look like:

  An ALTER TABLE command to make an inherited table "abstract" in the object
  oriented sense. That is, no records can be inserted in the parent table. If
  you follow the oracle model this is also where you specify the partition
  key. There's no index associated with this partition key though.

  A command to create a new partition, essentially syntactic sugar for a
  CREATE TABLE with an implied INHERITS clause and a constraint on the
  partition key. If you follow the oracle model then you explicitly specify
  which range or specific value of the partition key this partition holds.

  A command to remove a partition from the partitioned table and turn it into
  a regular table.

  A command to take a regular table and turn it into a partition. Again here
  you specify the range or value of the partition key. There has to be some
  verification that the table really holds the correct data though. Perhaps
  this could be skipped by providing a table with a properly constructed
  constraint in place.

  Magic to make INSERT/UPDATE figure out the correct partition to insert the new
  record. (Normally I would have suggested that UPDATE wasn't really necessary
  but in Postgres it seems like it would fall out naturally from having INSERT.)

Phase II would be planner and executor improvements to take advantage of the
information to speed up queries and allow for individual partitions to be
read-only or otherwise inaccessible without impeding queries that don't need
that partition.

Phase III would be autopilot features like having new partitions automatically
created and destroyed and being able to specify in advance rules for
determining which tablespaces to use for these new partitions.

I'm not sure whether to put global indexes under Phase II or III. Personally I
think there's no point to them at all. They defeat the whole point of
partitioned tables. Once you have global indexes adding and removing
partitions becomes a lot harder and slower. You may as well have kept
everything in one table in the first place. But apparently some people find
them useful.

--
greg

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: What needs to be done for real Partitioning?
Next
From: PFC
Date:
Subject: Re: What needs to be done for real Partitioning?