Feature suggestions (long) - Mailing list pgsql-hackers
From | Martijn van Oosterhout |
---|---|
Subject | Feature suggestions (long) |
Date | |
Msg-id | 20030517150039.GF3336@svana.org Whole thread Raw |
Responses |
Re: Feature suggestions (long)
|
List | pgsql-hackers |
[Please CC any replies to me. Thanks.] I'm going suggest a feature like what Oracle calls "partitions" and later on something with indexes. The idea is to generate some discussion to see if they are worthy of being added to the TODO list. Partitions ========== The idea is to split data across tables according to the contents of a tuple. So you could split the contents of a table based on years or some such. I'm not going to go too much into why it's a good idea, there are several links on the web discussing it: http://www.nyoug.org/200212nanda.pdf Anyway, what I'm proposing is different from what Oracle does but flexible enough such that what Oracle does could the implemented/emulated easily enough. I was thinking along the lines of: CREATE PARTITION sales_2003 ON sales WHERE saledate >= '2003-01-01' AND saledate < '2004-01-01'; This would produce a sequence of events equivalent to: CREATE TABLE sales_2003 () INHERITS ( sales ); INSERT INTO sales_2003 SELECT * FROM sales WHERE <condition>; DELETE FROM sales WHERE <condition>; CREATE RULE rule1 AS ON INSERT TO sales WHERE <condition> DO INSTEAD INSERT INTO sales_2003 (NEW.*); CREATE RULE rule2 AS ON DELETE TO sales WHERE <condition> DO INSTEAD DELETE FROM sales_2003 WHERE <???>; <a pile more RULEs> err... It gets a bit complex here. My point is that it can be done but with proper support it can be done more efficiently, reliably and usably. Internally you could treat them more like real inherited tables. Inserts would be fairly straight forward (use the conditions to determine the resulting table). Deletes require no work at all. Updates are trickier. The problem is that after an update the tuple may end up needing to be in another table. Adding/dropping/altering columns would have to be handled similarly to inherited tables. Looking at heap_update or ExecUpdate it's not clear whether it would be a problem if the old and new tuples are in a separate tables. As long as the descriptors are the same it shouldn't be too much of a deal hopefully. So far the storage doesn't seem too difficult. I'm not sure how to deal with inheritance and this. I'd suggest making partitions work only on individual tables and not inherited. Next stage would be teaching the planner. The conditions would be pseudo-constraints on the partitions. Hence if the conditions and the constraints form a non-intersecting set, you can skip that partition altogether. Now, semantically, should you be allowed to do inserts, updates and deletes on partitions directly, or should they be hidden? If/when we get table-spaces, you should be able to move them around. Deleting a partition would amount to moving all the tuples back to the main table. Of course, if there are other partitions the conditions will have to be re-evaluated to place the tuple into one of the other partitions. What do you do if multiple partitions have conflicting conditions? I'd be tempted to do a first-come-first-serve basis, though maybe some kind of priority? One advantage of the Oracle approach is that this issue doesn't come up. We certainly can't write a system to detect anything but the obvious cases. Of course, making individual indexes for all the partitions could get tedious, even if the system did it automatically; which me brings to my next suggestion: Multi-table indexes =================== Currently an index only applies to a single table. If you use inheritance all the indexes can get cumbersome. With something like the above partitions it could get completely out of hand if used heavily. What I'd like is if instead of an index mapping <index keys> -> ctid, it does <index keys> -> (tableoid,ctid). At the same time it lets us build UNIQUE indexes for inheritance (for primary keys). Currently, an index is represented by a row in pg_index. To implement this you would need to arrange a way for an index to list all its constituent tables so that when doing an update you know which to indexes to update. Multiple rows seems the easiest way but it does violate a uniqueness constraint. Obviously, the columns indexed can only be present in the root table. However, the index would be usable for queries on subtables too. The planner would simply have to remember to push down 'tableoid IN (oid,oid,oid,...)' conditions. The major problem here is statistics. What you really need is some sort of aggregate statistics over an entire inheritance tree. Without something like this the planner can't make sensible decisions on when to use the index. ANALYZE currently doesn't do this. It's not entirely clear how it would go about it either. This doesn't apply to UNIQUE indexes though. Another issue is when a table is deleted. Normally you can just purge the index when the table is deleted. With these indexes you can't since the remaining data is still useful. So the index access method would have to be careful not to return tuples from tables that don't exist anymore. VACUUM would hopefully eventually clean them up. There would also be an issue of the IndexScan node returning the right fields depending on which part of the subtree is being queried. For example, if B were inherited from A then an IndexScan on A* could return fields not in A if the query only ever referenced B. This may already be solved as the IndexScan node can scan multiple indexes. = In any case, the second feature seems like it will be required (or at least something similar) to handle proper referential integrity and primary keys for inherited tables. But if something like the former is implemented (hopefully, as using rules would get very cumbersome) then the latter would be necessary. Have a good weekend. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organised violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
pgsql-hackers by date: