Re: What needs to be done for real Partitioning? - Mailing list pgsql-performance
From | Greg Stark |
---|---|
Subject | Re: What needs to be done for real Partitioning? |
Date | |
Msg-id | 87br9ehmja.fsf@stark.xeocode.com Whole thread Raw |
In response to | What needs to be done for real Partitioning? (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: What needs to be done for real Partitioning?
Re: What needs to be done for real Partitioning? Re: What needs to be done for real Partitioning? |
List | pgsql-performance |
Josh Berkus <josh@agliodbs.com> writes: > -- INSERT INTO should automatically create new partitions where necessary > ---- new tables should automatically inherit all constraints, indexes, > keys of "parent" table I think you're going about this backwards. Phase I should be an entirely manual system where you add and remove partitions manually and create and drop indexes you want manually. You need these low level interfaces anyways for a complete system, it doesn't make sense to have everything automatic and then later try to wedge in a low level interface. Only once you have that do you then start offering options to do these things automatically. I also think there are a few other components mixed up in your proposal that are really not integral to partitioned tables. Tablespaces and expression indexes may well be useful features to use in combination with partitioned tables, but they shouldn't be required or automatic. From my experience with Oracle I think there's one big concept that makes the whole system make a lot more sense: individual partitions are really tables. The partitioned tables themselves are just meta-objects like views. Once you get that concept the whole featureset makes a lot more sense. You can pull a partition out of a partitioned table and it becomes a normal table. You can take a normal table and put it into a partitioned table. Creating a new partition or altering a partition is just the same as creating or altering a new table (except for the actual data definition part). Given that understanding it's clear that tablespaces are an entirely orthogonal feature. One that happens to play well with partitioned tables, but not one that partitioned tables need any special support for. When you create a new partition or create a table intending to add it as a partition to a partitioned table you specify the tablespace just as you would normally do. It's also clear that the last thing you want is an index on the partition key. A big part of the advantage of partitioned tables is precisely that you get the advantage of an index on a column without the extra expense. It would also be reasonable to allow clustering individual partitions; creating table or column constraints on some partitions and not others; or even allow having indexes on some partitions and not others. In general the only operations that you wouldn't be able to do on an individual partition would be operations that make the column definitions incompatible with the parent table. The $64 question is how to specify the partitioning rules. That is, the rule for determining which partition an insert should go into and which partitions to look for records in. Oracle handles this by specifying a list of columns when creating the partitioned table and then specifying either a range or specific values for each individual partition. I can imagine other approaches but none that allow for the planner and optimizer to take as much advantage of the information. So I think Phase I should look like: An ALTER TABLE command to make an inherited table "abstract" in the object oriented sense. That is, no records can be inserted in the parent table. If you follow the oracle model this is also where you specify the partition key. There's no index associated with this partition key though. A command to create a new partition, essentially syntactic sugar for a CREATE TABLE with an implied INHERITS clause and a constraint on the partition key. If you follow the oracle model then you explicitly specify which range or specific value of the partition key this partition holds. A command to remove a partition from the partitioned table and turn it into a regular table. A command to take a regular table and turn it into a partition. Again here you specify the range or value of the partition key. There has to be some verification that the table really holds the correct data though. Perhaps this could be skipped by providing a table with a properly constructed constraint in place. Magic to make INSERT/UPDATE figure out the correct partition to insert the new record. (Normally I would have suggested that UPDATE wasn't really necessary but in Postgres it seems like it would fall out naturally from having INSERT.) Phase II would be planner and executor improvements to take advantage of the information to speed up queries and allow for individual partitions to be read-only or otherwise inaccessible without impeding queries that don't need that partition. Phase III would be autopilot features like having new partitions automatically created and destroyed and being able to specify in advance rules for determining which tablespaces to use for these new partitions. I'm not sure whether to put global indexes under Phase II or III. Personally I think there's no point to them at all. They defeat the whole point of partitioned tables. Once you have global indexes adding and removing partitions becomes a lot harder and slower. You may as well have kept everything in one table in the first place. But apparently some people find them useful. -- greg
pgsql-performance by date: