Comments on automatic DML routing and explicit partitioning subcommands - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Comments on automatic DML routing and explicit partitioning subcommands |
Date | |
Msg-id | 1247564358.11347.1308.camel@ebony.2ndQuadrant Whole thread Raw |
Responses |
Re: Comments on automatic DML routing and explicit
partitioning subcommands
|
List | pgsql-hackers |
(submitting review slightly earlier than start of commitfest) Kedar, Thank you very much for the patch. Well done for getting to this stage. There is definitely much support for your work. My thoughts after initial review are fairly wide ranging. Overall, the patch is not ready/close to commit in this commitfest. My estimate is that it will take next 2 commitfests before we could commit it; that is still comfortably within this release. PARTITIONING * I have one main comment that effects about 50% of the patch: We should not be using triggers. Triggers are a useful external solution, but they are not the best or even a desirable approach internally. Inserting data using triggers creates a huge overhead in AfterTrigger events held in memory. In my view this is totally unsuitable for use with VLDBs, which is exactly the point of partitioning. There is nothing we can do to reduce or aggregate these trigger events as is possible with RI trigger checks (not yet done, but possible...). It should be possible to avoid triggers altogether by having a function that dynamically calculates the target partition relid from an input tuple, somewhere around ExecInsert() and during DoCopy(). Thus, current executor has minor changes and we do all required cleverness in a partition function evaluation module. You should be able to rework much of the code into that form. [bsearch()] Dynamic evaluation would also help SQL such as in-bound FK checks, i.e. FK checks against the partitioned table. That needs some other work also, but the guts of it will be similar to both problems. Avoiding triggers will also avoid the horrible looking stuff with pg_dump. "Might see a few errors" is a phrase unlikely to result in patch application, in my experience. :-) Whether we accept that or not, there should be a clear focus on measuring and reducing the time taken to route an inserted row through to its target partition, rather than just automating it. There may be an alternative that achieves better efficiencies. We should be measuring timings with 100s-1000s of partitions, not just 2 or 3. Dynamic partition evaluation will also be important in other places. Partition evaluation * I don't see any need or purpose for HASH partitioning. I suggest that is added as a later patch once everything else is accepted, if there is clear separate reason for it to exist. (I'm certain it has meaning for your sponsors; we must differentiate between the needs of add-on products and the needs of Postgres core). There isn't much point discussing this issue until other parts are committed anyway and a smaller patch will be more easily committed. Skip it for now, please. * It appears that *every* update is handled as a delete plus a re-insertion into the parent(s). Moving between partitions is usually an indicator of a poor choice of partitioning scheme, so we should not be de-optimising normal updates because of corner cases. I think we need to handle/optimize the stays-in-same-partition and the moves-partition cases differently. For me it would be an acceptable limitation to disallow row movement, at least until we get the first part of this committed. * I think we should be making some clear assumptions that partitions have identical row types to their parent and that we have only a single parent for each partition. There seems like extra code to handle additional cases. PATCH * There is no explanatory or command syntax documentation. That prevents people from understanding what you are attempting to deliver, apart from reading the test case. Many people would probably have comments on how partitioning works and syntax if they could see docs. (Please don't refer me to earlier design docs - we need to see how this patch works; external docs go stale very quickly. And no need to rehash the description about the need for and benefits of partitioning - we already have chapters on it in the pg docs.) * The test case is nowhere near long enough for the number of additional commands you are adding. I would expect to see about 10x as many test cases to cover all the options, side cases and complexity. Multiple data types, etc * There are very few comments, and those that do exist are typically one-liners. Look through other parts of the existing code and judge how much commenting is necessary. Basically, lots of long explanatory comments. Why this way? Why here? Why this API? Why not another way, What assumptions, approximations have been made etc.. (I've found this helps the author very much, not just the reviewer. Once extensive comments have been received you will be on a later version and you start to forget how parts work between all the different versions) * The patch is >8000 lines long. I feel like it is trying to do too much in one go and we should break it down. * I suggest that you strip out all the stuff about complex partition functions until we get the main items agreed, e.g. SPLIT PARTITION etc.. This will help you to concentrate on getting main items slick before we move on. * Please put all required files into one archive file, so we know which files are which. Don't let the reviewer guess which files to apply. * Changing the build system for some custom code is not likely to be acceptable without good reason. AFAICS there is no good reason for the way you've done it, and definitely no comments to explain that. There is already an established way to add functions to the backend - use that. I suspect we don't need to add any new external functions at all if we avoid the trigger route as discussed above. * It appears you don't have a unique index on pg_partition. We need a constraint on it, plus we need an index to handle cases where we have 100s and 1000s of partitions. I think you should be doing this kind of thing in your own testing. We don't want this to just pass the test case, we want it to work with TB of data and multiple tables, each with zillions of partitions. * Not clear why we have pg_partition rather than changes to pg_inherits? Why would we want another catalog table? You have two similar functions: find_inheritance_parents() and find_inheritance_parent() - why both? * Function names like parttype() or maxval() are not self explanatory, please expand function names to help later readers of the code. We don't use function names anywhere in Postgres that are both all lowercase and where the words have no underscore to separate them. We use function_names_like_this() or FunctionNamesLikeThis() but donotusefunctionnameslikethisplease() * You mention that you've modified TRUNCATE etc to work with partitioning. This was added in 8.4, so your patch no longer needs to do anything and that part of it should be removed. * I haven't tried to apply the patches, as yet. I feel there are major changes required before we get to that level of confirmation. If we can keep the initial changes fairly modest we can get it applied and still add many more useful things in this release. * The patch should be described as "Automatic DML routing and explicit syntax for partitioning". There are many other tasks required to make partitioning work. I also don't want anybody to think that we have "done partitioning" with this patch even when complete; this patch attempts to address syntax and DML routing. It doesn't address automatic partition creation, list partitioning, sub-partitioning, various partitioning optimisations and global indexing. Others may wish to work on those issues during this dev cycle also and we want to encourage people to submit patches on those also. * I would advise against merging global indexes and partitioning patches. IMHO global indexes are a tick-box feature and so a waste of dev time. They are almost never used once the reality of their ridiculous size takes hold; most DBAs laugh when they are suggested. And the dev footprint on Postgres will be fairly large also. Start a new thread if you wish to discuss them again - I don't. * If we can prove that the partition value ranges are distinct then it is possible to have a PrimaryKey without global indexes. I'm sure there'll be much discussion around this. However detailed the discussion gets, we're all very supportive of your efforts to contribute. Best Regards, -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
pgsql-hackers by date: