Thread: Obstacles to user-defined range canonicalization functions
I got religion this evening about the potential usefulness of user-defined canonicalization functions --- the example that did it for me was thinking about a range type over timestamp that quantizes boundaries to hours, or half hours, or 15 minutes, or any scheduling unit that is standard in a particular environment. In that sort of situation you really want a discrete range type, which the standard tsrange type is not.

So how hard is it to build a user-defined canonicalization function to support such an application? The logic doesn't seem terribly difficult ... but *you have to write the darn thing in C*. There are two reasons why:

* The underlying range_serialize function is only exposed at the C level. If you try to write something in, say, plpgsql then you are going to end up going through range_constructorN or range_in to produce your result value, and those call the type's canonical function. Infinite recursion, here we come.

* The only way to create a canonicalization function in advance of declaring the range type is to declare it against a shell type. But the PL languages all reject creating PL functions that take or return a shell type. Maybe we could relax that, but it's nervous-making, and anyway the first problem still remains.

Now you could argue that for performance reasons everybody should write their canonicalization functions in C anyway, but I'm not sure I buy that --- at the very least, it'd be nice to write the functions in something higher-level while prototyping.

I have no immediate proposal for how to fix this, but I think it's something we ought to think about.

One possibility that just came to me is to decree that every discrete range type has to be based on an underlying continuous range type (with all the same properties except no canonicalization function), and then the discrete range's canonicalization function could be declared to take and return the underlying range type instead of the discrete type itself. Haven't worked through the details though.

regards, tom lane
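To make the quantizing idea concrete, here is a minimal sketch of the logic in Python rather than as a PostgreSQL canonical function. It rests on several assumptions that are not in the message above: the grid is anchored at a hypothetical EPOCH constant, a range is treated as the set of grid points it contains, and the canonical form is taken to be [lo, hi), mirroring what the built-in discrete range types do. The names floor_to_step and canonicalize are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical anchor for the quantization grid (an assumption of this sketch).
EPOCH = datetime(2000, 1, 1)


def floor_to_step(ts, step):
    """Return the largest grid point that is <= ts."""
    return ts - (ts - EPOCH) % step


def canonicalize(lower, upper, lower_inc, upper_inc, step):
    """Return (lo, hi) meaning [lo, hi) on the grid, or None if empty.

    Infinite bounds are not handled in this sketch.
    """
    # Smallest grid point that lies inside the range.
    lo = floor_to_step(lower, step)
    if lo < lower or not lower_inc:
        lo += step
    # Smallest grid point that lies beyond the range.
    hi = floor_to_step(upper, step)
    if hi < upper or upper_inc:
        hi += step
    return (lo, hi) if lo < hi else None
```

With a 15-minute step, the inputs (09:00, 10:00] and [09:05, 10:10) both canonicalize to [09:15, 10:15), which is the property a discrete range type needs: ranges that cover the same grid points compare equal.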

On Nov 24, 2011, at 04:33, Tom Lane wrote:
> One possibility that just came to me is to decree that every discrete
> range type has to be based on an underlying continuous range type (with
> all the same properties except no canonicalization function), and then
> the discrete range's canonicalization function could be declared to take
> and return the underlying range type instead of the discrete type
> itself. Haven't worked through the details though.

We could also make the canonicalization function receive the boundaries and boundary types as separate arguments, and return them in the same way. In plpgsql the signature could be

  canonicalize(inout lower base_type,
               inout upper base_type,
               inout lower_inclusive boolean,
               inout upper_inclusive boolean)

Not exactly pretty, but it avoids the need for a second continuous range type...

best regards,
Florian Pflug
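The shape of that interface can be sketched outside the database. The Python function below is only an illustration under stated assumptions: an integer base type with step 1, a canonical form of [lower, upper), and no handling of infinite or empty ranges. The name canonicalize_bounds is hypothetical. The point it shows is that a function with this signature works purely on the four values and never builds a range value, so it has no reason to call back into a range constructor.

```python
def canonicalize_bounds(lower, upper, lower_inclusive, upper_inclusive):
    """Take and return (lower, upper, lower_inclusive, upper_inclusive).

    Assumes an integer base type and normalizes to the [lower, upper) form.
    """
    if not lower_inclusive:
        lower += 1  # (n, ...  becomes  [n + 1, ...
    if upper_inclusive:
        upper += 1  # ..., n]  becomes  ..., n + 1)
    return lower, upper, True, False
```

For example, canonicalize_bounds(1, 5, False, True) returns (2, 6, True, False), and feeding that result back in leaves it unchanged.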

On Nov 23, 2011, at 10:33 PM, Tom Lane wrote:
> Now you could argue that for performance reasons everybody should write
> their canonicalization functions in C anyway, but I'm not sure I buy
> that --- at the very least, it'd be nice to write the functions in
> something higher-level while prototyping.

I would apply this argument to every single part of the system that requires code that extends the database to be written in C, including:

* I/O functions (for custom data types)
* tsearch parsers
* use of RECORD arguments

And probably many others. There are a *lot* of problems I’d love to be able to solve with prototypes written in PLs other than C, and in small databases (there are a lot of them out there), they may remain the production solutions.

So I buy the argument in the case of creating range canonicalization functions, too, of course!

Best,
David

On Wed, 2011-11-23 at 22:33 -0500, Tom Lane wrote:
> * The underlying range_serialize function is only exposed at the C
> level. If you try to write something in, say, plpgsql then you are
> going to end up going through range_constructorN or range_in to produce
> your result value, and those call the type's canonical function.
> Infinite recursion, here we come.

That seems solvable, unless I'm missing something.

> * The only way to create a canonicalization function in advance of
> declaring the range type is to declare it against a shell type. But the
> PL languages all reject creating PL functions that take or return a
> shell type. Maybe we could relax that, but it's nervous-making, and
> anyway the first problem still remains.

That seems a little more challenging.

> One possibility that just came to me is to decree that every discrete
> range type has to be based on an underlying continuous range type (with
> all the same properties except no canonicalization function), and then
> the discrete range's canonicalization function could be declared to take
> and return the underlying range type instead of the discrete type
> itself. Haven't worked through the details though.

An interesting approach. I wonder whether there would be a reason to tie such types together other than just the canonical function. Would you have to define everything in terms of the continuous range, or could it be a constraint hierarchy; e.g. a step size of 100 is based on a step size of 10, which is based on numeric?

Regards,
Jeff Davis