Re: Partitioning option for COPY - Mailing list pgsql-hackers
| From | Simon Riggs |
|---|---|
| Subject | Re: Partitioning option for COPY |
| Date | |
| Msg-id | 1258969003.27757.4259.camel@ebony |
| In response to | Re: Partitioning option for COPY (Emmanuel Cecchet <manu@asterdata.com>) |
| Responses | Re: Partitioning option for COPY; Re: Partitioning option for COPY |
| List | pgsql-hackers |
On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
> Hi,
>
> >> I have extracted the partitioning option for COPY (removed the error
> >> logging part) from the previous patch.
> >
> > We can use an INSERT trigger to route tuples into partitions even now.
> > Why do you need an additional router for COPY?
>
> Tom has already explained on the list why using a trigger was a bad idea
> (and I know we can use a trigger since I am the one who wrote it).
> If you look at the code you will see that you can do optimizations in
> the COPY code that you cannot do in the trigger.
>
> > Also, it would be nicer that the router can work not only in COPY but
> > also in INSERT.
>
> As 8.5 will at best provide a syntactic hack on top of the existing
> constraint implementation, I think that it will not hurt to have routing
> in COPY since we will not have it anywhere otherwise.
>
> > BTW, I'm working on meta data of partitioning now. Your "partitioning"
> > option in COPY could be replaced with the catalog.
>
> This implementation is only for the current 8.5 and it will not be
> needed anymore once we get a fully functional partitioning in Postgres,
> which seems to be for a future version.

Yes, the trigger way of doing this is a bad way. I regret to say that the way proposed here isn't much better, AFAICS. Let me explain why I think that, but -1 to anyone applying this patch.

This patch proposes keeping a cache of last-visited partitions to reduce the overhead of data routing. What I've requested is that partitioning work by using a data structure held in the relcache for inheritance parents. This differs from this patch in three ways:

a) it has a clearly defined location for the cached metadata, with clearly identified and well-worked-out mechanisms for cache invalidation;
b) the cache can be built once, when it is first needed, not slowly grown as parts of the metadata are used;
c) it would be available to all parts of the server, not just COPY.

The easiest way to build that metadata is when structured partitioning info is available, i.e. the best next action is to complete and commit Itagaki's partitioning syntax patch. Then we can easily build the metadata for partitioning, which can then be used in COPY for data routing.

Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in scope and potentially buggy in its approach to developing a cache and using trigger-like machinery. ISTM that with the right metadata in the right place, a cleaner and easier solution is still possible for 8.5. The code within COPY should really just reduce to a small piece of code that derives the correct relation for the desired row and then uses it during heap_insert().

I have just discussed partitioning with Itagaki-san at JPUG, so I know his plans. Itagaki-san and Manu, please can you work together to make this work for 8.5?

---

A more detailed explanation of Partitioning Metadata:

Partitioning Metadata is information held in the relcache for a table that has child partitions. Currently, a table does not cache info about its children, which prevents various optimisations.

We would have an extra pointer on the Relation struct that points to a PartitioningMetadata struct. We can fill in this information when we construct the relcache entry for a relation, or we can populate it on demand the first time we attempt to use that information (if it exists).

We want to hold an array of partition boundary values.
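For concreteness, a rough sketch of the kind of structure this could be. The struct and field names below are placeholders, not a settled design; only the extra pointer on RelationData (the relcache entry) would be new:

```c
#include "postgres.h"
#include "access/attnum.h"

/* One entry per child partition, kept sorted by upper boundary value. */
typedef struct PartitionBoundEntry
{
	Datum		upperBound;		/* highest key value routed to this child */
	Oid			partitionOid;	/* the child relation itself */
} PartitionBoundEntry;

/* Cached on the parent's relcache entry, built on first use. */
typedef struct PartitioningMetadata
{
	AttrNumber	partKeyAttno;	/* column the table is partitioned on */
	Oid			partKeyType;	/* its data type, for comparisons */
	int			numPartitions;	/* length of the array below */
	PartitionBoundEntry *bounds;	/* sorted array, suitable for bsearch */
} PartitioningMetadata;

/*
 * RelationData (see utils/rel.h) would then grow a field along the lines of:
 *
 *		PartitioningMetadata *rd_partmeta;		-- NULL until first needed
 */
```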
This will then allow us to use bsearch to find the partition that a specific value applies to. Thus it can be used for routing data from INSERTs or COPY, and for identifying which partitions need to be included in or excluded from an Append node. Using this will be O(log N) rather than O(N), allowing us to have a much larger number of partitions when required.

Note that it can also be used within the executor to perform dynamic partition elimination, thus allowing us to easily implement partition-aware joins etc.

To construct the array we must sort the partition boundary values and prove that the partition definitions do not overlap. That is much easier to do when the partitions are explicitly defined. (Plus, there is currently no requirement to have, or mechanism to specify, unique, non-overlapping partitions, although most users assume this in their usage.)

I imagine we would have an API called something like RelationIdentifyPartition(), where we provide value(s) for the partitioning key column(s) and it returns the Oid of the partition that holds that value. That function would build the metadata, if not already cached, then bsearch it to provide the Oid.
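Roughly, in sketch form, assuming the PartitioningMetadata layout from the earlier fragment, a single-column range-partitioning key, and a hypothetical rd_partmeta field on the relcache entry. BuildPartitioningMetadata() and PartKeyCompare() are placeholders, not existing backend functions:

```c
#include "postgres.h"
#include "utils/rel.h"

/*
 * Hypothetical helpers: build the sorted boundary array from the child
 * partitions, and compare two partitioning-key values using the key
 * column's comparison operator.
 */
extern PartitioningMetadata *BuildPartitioningMetadata(Relation parent);
extern int	PartKeyCompare(PartitioningMetadata *meta, Datum a, Datum b);

/*
 * Return the Oid of the child partition of 'parent' that should hold
 * 'keyValue'.  Builds and caches the metadata on first use, then binary
 * searches the sorted boundary array: O(log N) in the number of
 * partitions.  Error handling (value beyond all boundaries, multi-column
 * keys) is omitted from this sketch.
 */
Oid
RelationIdentifyPartition(Relation parent, Datum keyValue)
{
	PartitioningMetadata *meta = parent->rd_partmeta;	/* hypothetical field */
	int			lo,
				hi;

	if (meta == NULL)
		meta = parent->rd_partmeta = BuildPartitioningMetadata(parent);

	/* find the first partition whose upper bound is >= keyValue */
	lo = 0;
	hi = meta->numPartitions - 1;
	while (lo < hi)
	{
		int			mid = (lo + hi) / 2;

		if (PartKeyCompare(meta, keyValue, meta->bounds[mid].upperBound) <= 0)
			hi = mid;
		else
			lo = mid + 1;
	}

	return meta->bounds[lo].partitionOid;
}
```

COPY (and later the INSERT path) would then call this once per row and hand the resulting child relation to heap_insert(), rather than maintaining its own last-visited-partition cache.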
--
Simon Riggs
www.2ndQuadrant.com