Thread: Range Types, constructors, and the type system
Different ranges over the same subtype make sense when using different total orders for the subtype. This is most apparent with text collation, but makes sense (at least mathematically, if not practically) for any subtype. For instance:[a, Z) is a valid range in "en_US", but not in "C", so it makes sense to have multiple ranges over the same subtype with different collations. But what if you have a function (like a constructor), of the form: (anyelement, anyelement) -> anyrange ? To work with the type system, you need to be able to figure out the return type from the arguments; which means to support functions like this we need a mapping from the subtype to the range type. Unfortunately, that restricts us to one range type per subtype (this isn't a problem for ARRAYs, because there is only one useful array type for a given element type). This problem first came up a while ago: http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php My workaround was to use domains, but that's not a very clean solution (you have to add a bunch of casts to make sure the right domain is chosen). It became entirely unworkable with collations, because people would be using different text collations a lot more frequently than, say, a different ordering for timestamptz. Tom mentioned that here: http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us I think Florian proposed the most promising line of attack here: http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org by suggesting that functions of the form: (anyelement, [other non-anyrange arguments]) -> anyrange might be expendable. After all, they are only useful for constructors as far as we can tell. Other range functions will have an anyrange parameter, and we can use the actual type of the argument to know the range type (as well as the subtype). Although it's very nice to be able to say: range(1,10) and get an int4range out of it, it's not the only way, and it's not without its problems anyway. For instance, to get an int8range you have to do: range(1::int8, 10::int8) or similar. So, we could just make functions like: int4range(int4, int4) int8range(int8, int8) ... when creating the range type, and it would actually be a usability improvement. There are at least a few constructors that would need to be made for each rangetype: the constructor above, the singleton constructor, constructors that have infinite bounds, the empty constructor, and all of the permutations for inclusivity/exclusivity. That adds up to quite a few catalog entries per range type. We could reduce some of the permutations by using extra arguments somehow, but that seems like it adds to the ugliness. This might also be a time to revisit whether there is a better way to present all of these constructors (rather than the _[co][co] suffixes to names, etc.). Even if we're willing to put up with a bunch of catalog entries, it will take a little creativity to figure out how to run the functions generically from a fixed set of C functions. Are there other thoughts or ideas about eliminating the need for generic constructors like range()? Another idea Florian suggested (in the same email) was the ability to declare the return type of a function, and then use the declared type to infer the argument types. That would be nice because you would just have to do: range(1,10)::int8range However, that's kind of backwards from how our type inference system works now, and sounds like a big change. Maybe we could consider a syntax addition for constructing range values? That was kicked around briefly, but perhaps we should revisit the possibilities there. Regards,Jeff Davis
To eludicate my earlier comments on this subject ... I believe that the best general solution here is for every ordered base type to just have a single total order, which is always used with that type in any generic order-sensitive operation, including any ranges defined over it, or any ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1 system-defined collation that can't be changed, and it would be something simple and generic, such as simply sorting on the codepoint as integers. When we want to have some other "native" ordering for an existing type, such as when we want to use a different text collation, we do this by creating a *new base type*, using CREATE TYPE or some shorthand thereof, and this new type defines its own ordering, such as that a particular text collation is used. For example: CREATE TYPE text__en_US AS (v text); CREATE TYPE text__C AS (v text); These will not compare equal to each other or to text, and that is good, because having a different text collation implies that we consider 'foo'::text__en_US and 'foo'::text__C to be different values. I believe that any other approach is worse, and in particular I believe that creating DOMAIN over text is worse, because DOMAIN are supposed to be subtypes of some other type, whose set of member values is a subset of the other type's values, and that have the same ordering. Multiple CREATE type over the same base type don't interfere with each other like multiple DOMAIN could. Assuming that what CREATE TYPE produces is actually a base type, I believe there is no better solution using the facilities that SQL provides. If there is concern about performance related to CREATE TYPE being a composite type, I'm sure it is possible to engineer an optimization for when the type has just 1 attribute so that performance isn't an issue. The main point I'm trying to raise here is about semantics and good type systems. Likewise, don't let concern about syntax for using values of such composite types. Once again, there can be shorthands if necessary. In fact, Postgres could provide a general shorthand for creating a composite type of 1 attribute whose purpose is to make one type that is like but unequal to another, and using this shorthand could also cause the composite type to overload/polymorph all the operators of it's attribute type, so that the syntax to define one is very short. For example: CREATE TYPE text__en_US WRAPS text COLLATE en_US; ... and I assume the name of that attribute would just be system-defined. Note that the above is specific to wrapping text, and the COLLATE is just shorthand for defining an ordering function for text__en_US. A more general form could be: CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...; And then we can say: RANGE OF text__en_US RANGE OF text ... similarly to how we declare array types with ARRAY. One can also just define range values as they do array values, such as like this: range('foo','bar') # default collation range('foo'::text__en_US, 'bar'::text__en_US) # en_us collation If that seems verbose, I have a few words for you: 1. Users should in practice name their wrapper types over their intended meaning, not their mechanics, such as like this (not using text for variety), and that may be more terse: CREATE TYPE acct_num WRAPS integer; # inherits integer ordering by default 2. If the wrapper types overload the base operators, either automatically or selectively (does it make sense to multiply an acct_num?), one doesn't have to keep unpacking and packing them to use them in most cases. For example, I'd expect many text wrappers to polymorph catenation or substring etc, so no extra syntax. 3. In practice, most literal values come from applications and are given to SQL code either as function parameters or bind parameter arguments. While lots of example code may have literal values in it, I would think that most real-work code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US for example. You'd more likely see the less common var::text__en_US or such. So that's my position, CREATE TYPE on the regular types or the like is the best solution, and anything else is an inferior solution. Such a design is also how I do collations and ranges in my Muldis D language. -- Darren Duncan Jeff Davis wrote: > Different ranges over the same subtype make sense when using different > total orders for the subtype. This is most apparent with text collation, > but makes sense (at least mathematically, if not practically) for any > subtype. > > For instance: > [a, Z) > is a valid range in "en_US", but not in "C", so it makes sense to have > multiple ranges over the same subtype with different collations. > > But what if you have a function (like a constructor), of the form: > (anyelement, anyelement) -> anyrange > ? To work with the type system, you need to be able to figure out the > return type from the arguments; which means to support functions like > this we need a mapping from the subtype to the range type. > Unfortunately, that restricts us to one range type per subtype (this > isn't a problem for ARRAYs, because there is only one useful array type > for a given element type). > > This problem first came up a while ago: > http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php > > My workaround was to use domains, but that's not a very clean solution > (you have to add a bunch of casts to make sure the right domain is > chosen). It became entirely unworkable with collations, because people > would be using different text collations a lot more frequently than, > say, a different ordering for timestamptz. Tom mentioned that here: > > http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us > > I think Florian proposed the most promising line of attack here: > > http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org > > by suggesting that functions of the form: > (anyelement, [other non-anyrange arguments]) -> anyrange > might be expendable. After all, they are only useful for constructors as > far as we can tell. Other range functions will have an anyrange > parameter, and we can use the actual type of the argument to know the > range type (as well as the subtype). > > Although it's very nice to be able to say: > range(1,10) > and get an int4range out of it, it's not the only way, and it's not > without its problems anyway. For instance, to get an int8range you have > to do: > range(1::int8, 10::int8) > or similar. > > So, we could just make functions like: > int4range(int4, int4) > int8range(int8, int8) > ... > when creating the range type, and it would actually be a usability > improvement. > > There are at least a few constructors that would need to be made for > each rangetype: the constructor above, the singleton constructor, > constructors that have infinite bounds, the empty constructor, and all > of the permutations for inclusivity/exclusivity. That adds up to quite a > few catalog entries per range type. > > We could reduce some of the permutations by using extra arguments > somehow, but that seems like it adds to the ugliness. This might also be > a time to revisit whether there is a better way to present all of these > constructors (rather than the _[co][co] suffixes to names, etc.). > > Even if we're willing to put up with a bunch of catalog entries, it will > take a little creativity to figure out how to run the functions > generically from a fixed set of C functions. > > Are there other thoughts or ideas about eliminating the need for generic > constructors like range()? > > Another idea Florian suggested (in the same email) was the ability to > declare the return type of a function, and then use the declared type to > infer the argument types. That would be nice because you would just have > to do: > range(1,10)::int8range > However, that's kind of backwards from how our type inference system > works now, and sounds like a big change. > > Maybe we could consider a syntax addition for constructing range values? > That was kicked around briefly, but perhaps we should revisit the > possibilities there. > > Regards, > Jeff Davis > >
Darren Duncan <darren@darrenduncan.net> writes: > I believe that the best general solution here is for every ordered base type to > just have a single total order, which is always used with that type in any > generic order-sensitive operation, including any ranges defined over it, or any > ORDER BY or any <,>,etc. We've spent years and blood on making sure that Postgres could support multiple orderings for any datatype; and there are plenty of natural examples for the usefulness of that. So I'm not at all impressed by any line of reasoning that starts out by baldly throwing that away. > When we want to have some other "native" ordering for an existing type, such as > when we want to use a different text collation, we do this by creating a *new > base type*, Nope. This has all sorts of problems that you're conveniently ignoring, beginning with the need to duplicate all of the infrastructure for the type (such as non-ordering-related operators), and then moving into difficulties arising from added ambiguity as to which operator is meant. regards, tom lane
On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote: > I believe that the best general solution here is for every ordered base type to > just have a single total order, which is always used with that type in any > generic order-sensitive operation, including any ranges defined over it, or any > ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1 > system-defined collation that can't be changed, and it would be something simple > and generic, such as simply sorting on the codepoint as integers. Well, we're trying to support SQL, and SQL supports collations, so I don't think we can just ignore that. I also agree with Tom that it's not a good idea. My reasons are: * Practical considerations, such as having a bunch of cruft from duplicated types all over the system. With sufficient changes to the type system, maybe that could be overcome. Or perhaps domains could be used to make that work for range types (sort of), but the result would not be very consistent with the rest of the system. * It doesn't seem to be based in any mathematical argument. A type is a set of values, and there's no reason it can't have several total orders; or no total order at all. So it appears to just be piggybacking on the type system infrastructure as a place to hold the metadata for a total order. * Who's to say that a "compare" function is the only way to specify a total order? There might be other interfaces that would support something closer to a lexicographic sort. So, from a theoretical standpoint, trying to attach a single notion of total order to a type seems strange, because there might be multiple interfaces for specifying even one total order. * It would require extra explicit type annotations. If you have 12 text types, the only way to practically use any text type is to constantly specify which more-specific text type it actually is (probably using the :: operator). That is not necessarily a bad choice if starting a language from scratch and forming the syntax in a way that it's reasonable to do. But this is SQL, and lots of type annotations are un-SQL-like. Regards,Jeff Davis
On Jun26, 2011, at 00:29 , Jeff Davis wrote: > declare the return type of a function, and then use the declared type to > infer the argument types. That would be nice because you would just have > to do: > range(1,10)::int8range > However, that's kind of backwards from how our type inference system > works now, and sounds like a big change. Well, there actually *is* some precedence for that kind of top-down (form a syntactic perspective) type inference. We *enforce* the cast in array[]::<arraytype> and actually for a very similar reason - without the case, there's no way of knowing which type of empty array was meant. I think we also special-case 'literal'::<type> to use the input function of type directly, instead of first creating a text value and later casting it to <type>. best regards, Florian Pflug
On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote: > Well, there actually *is* some precedence for that kind of top-down > (form a syntactic perspective) type inference. We *enforce* the cast > in > array[]::<arraytype> > and actually for a very similar reason - without the case, there's no > way of knowing which type of empty array was meant. I think we also That's a good point. Although, I'm not sure whether that's an argument that we can make the type system work as-is, or if it means that we should add syntax like ARRAY[]. > special-case > 'literal'::<type> > to use the input function of type directly, instead of first creating > a text value and later casting it to <type>. That is certainly true. Quoted strings never start out as text, they start out as "unknown" and wait for the type inference to determine the type. I'm not entirely sure whether a quoted string followed by a cast is briefly unknown and then cast, or if it's directly interpreted using the cast's type input function. I don't know if that's a good example though because it's near the end of the line and there's no function call in between the arguments and the cast. It might get more complex with cases like: range(lower(range(1,2)),upper(range(1,2)))::int8range but maybe that can be done more easily than I think? Regards,Jeff Davis
Tom Lane wrote: > Darren Duncan <darren@darrenduncan.net> writes: >> I believe that the best general solution here is for every ordered base type to >> just have a single total order, which is always used with that type in any >> generic order-sensitive operation, including any ranges defined over it, or any >> ORDER BY or any <,>,etc. > > We've spent years and blood on making sure that Postgres could support > multiple orderings for any datatype; and there are plenty of natural > examples for the usefulness of that. So I'm not at all impressed by > any line of reasoning that starts out by baldly throwing that away. I'm not saying that you can't use multiple orderings with a data type. I'm just saying that the type only has *at most* one (possibly none) *native* ordering, which is what is used when you do something ordered-sensitive with the type, such as have a range. To be specific, if the type system supports a concept like Perl 6 roles (or other languages have similar concepts) where types can declare themselves members of a union type such as "Ordered", then types of that union would have the native ordering and other types wouldn't and then generic range operators could be declared over ANYORDERED or such. >> When we want to have some other "native" ordering for an existing type, such as >> when we want to use a different text collation, we do this by creating a *new >> base type*, > > Nope. This has all sorts of problems that you're conveniently ignoring, > beginning with the need to duplicate all of the infrastructure for the > type (such as non-ordering-related operators), and then moving into > difficulties arising from added ambiguity as to which operator is meant. Well a related solution is to have exactly 1 text wrapper type which has 2 attributes, one being the text value and the other being the collation name. Then you just have 1 type that does the job instead a separate one per collation. But to keep the semantics, the collation name is part of the identity of the type. For example: CREATE TYPE collated_text AS (t text, c collation); The key point I'm trying to support is that collation issues are firmly attached to the text type, not the range type. Anyway, if a better solution can be arrived at for the problem at hand, then good for the team; meanwhile, what I've proposed is the best one I can think of. -- Darren Duncan
Jeff Davis wrote: > On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote: >> I believe that the best general solution here is for every ordered base type to >> just have a single total order, which is always used with that type in any >> generic order-sensitive operation, including any ranges defined over it, or any >> ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1 >> system-defined collation that can't be changed, and it would be something simple >> and generic, such as simply sorting on the codepoint as integers. > > Well, we're trying to support SQL, and SQL supports collations, so I > don't think we can just ignore that. I'm not saying you can't support collations. See also my reply to Tom. > I also agree with Tom that it's not a good idea. My reasons are: > > * Practical considerations, such as having a bunch of cruft from > duplicated types all over the system. With sufficient changes to the > type system, maybe that could be overcome. Or perhaps domains could be > used to make that work for range types (sort of), but the result would > not be very consistent with the rest of the system. Yes, duplication can be avoided. > * It doesn't seem to be based in any mathematical argument. A type is a > set of values, and there's no reason it can't have several total orders; > or no total order at all. So it appears to just be piggybacking on the > type system infrastructure as a place to hold the metadata for a total > order. Yes, I agree that a type is a set of values, and a type can have 0..N total orders. My proposal is just that, for those types that have at least 1 total order, exactly 1 of those is defined to be used implicitly in contexts where a total order is desired and no explicit collation is given, such as in ranges. > * Who's to say that a "compare" function is the only way to specify a > total order? There might be other interfaces that would support > something closer to a lexicographic sort. So, from a theoretical > standpoint, trying to attach a single notion of total order to a type > seems strange, because there might be multiple interfaces for specifying > even one total order. Thank you for bringing this up, the notion of multiple interfaces for specifying even one total order. My example of a compare function was just an example, and it is valuable to consider that this may not be the only way to do it. > * It would require extra explicit type annotations. If you have 12 text > types, the only way to practically use any text type is to constantly > specify which more-specific text type it actually is (probably using > the :: operator). That is not necessarily a bad choice if starting a > language from scratch and forming the syntax in a way that it's > reasonable to do. But this is SQL, and lots of type annotations are > un-SQL-like. Well sometimes it doesn't hurt to suggest solutions from the point of view that one can start the language from scratch, because that provides a clean way to conceptualize and explain a feature. And then people can find some middle ground that adapts benefits from that idea for the feature without changing SQL more than needed. Witness the various improvements to Perl 5 that were first expressed in terms of Perl 6. -- Darren Duncan
On Jun27, 2011, at 02:48 , Jeff Davis wrote: > On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote: >> Well, there actually *is* some precedence for that kind of top-down >> (form a syntactic perspective) type inference. We *enforce* the cast >> in >> array[]::<arraytype> >> and actually for a very similar reason - without the case, there's no >> way of knowing which type of empty array was meant. I think we also > > That's a good point. > > Although, I'm not sure whether that's an argument that we can make the > type system work as-is, or if it means that we should add syntax like > ARRAY[]. It was meant as an argument for the former, i.e. for extending the type system (or rather the function call syntax, as I argue below). >> special-case >> 'literal'::<type> >> to use the input function of type directly, instead of first creating >> a text value and later casting it to <type>. > > That is certainly true. Quoted strings never start out as text, they > start out as "unknown" and wait for the type inference to determine the > type. I'm not entirely sure whether a quoted string followed by a cast > is briefly unknown and then cast, or if it's directly interpreted using > the cast's type input function. It's at least labelled with type "unknown" for a while AFAIK. > I don't know if that's a good example though because it's near the end > of the line and there's no function call in between the arguments and > the cast. It might get more complex with cases like: > > range(lower(range(1,2)),upper(range(1,2)))::int8range > > but maybe that can be done more easily than I think? I wouldn't take it that far. What I had in mind was to *only* support the case where the cast directly follows the function call, i.e. the case f(...)::type I view this more as an extension of the function call syntax than of type inference. In other languages with polymorphism, there usually is an explicit syntactic construct for specifying the type arguments to a polymorphic function. For example, C++ you'd write make_range<int>(3,4) to call the polymorphic function make_range() with it's (first) type argument set to "int". I think of f(...)::type as essentially the same thing, but re-using already existing syntax instead of inventing new one. I just checked - we currently special case "array[]::type" in transformExpr() by detecting the case of an array expression being the immediate child of a cast expression. I suggest we do the same for "f(...)::type", i.e. also special case a function call being the immediate child of a cast expression and pass down the forced result type to the function call node. Function call nodes would then usually ignore that passed-down result type, except in the case of a polymorphic functions whose argument types don't uniquely define its result type. But I haven't tried doing that, so there might be stumbling block down that road that I missed... best regards, Florian Pflug
On Mon, 2011-06-27 at 12:16 +0200, Florian Pflug wrote: > I wouldn't take it that far. What I had in mind was to *only* support > the case where the cast directly follows the function call, i.e. the case > f(...)::type OK, so instead of writing: range(lower(range(1,2)),upper(range(1,2)))::int8range users would write: range(lower(range(1,2)::int8range),upper(range(1,2)::int8range))::int8range A little more verbose, but it seems like it wouldn't be a practical problem in very many cases. Multiple levels of constructors seem like they'd be fairly uncommon, and probably a case where a function should be written anyway. OK, I'll have to think about this a little more, but it seems like a reasonable approach. Regards,Jeff Davis
On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote: > Tom Lane wrote: > > Darren Duncan <darren@darrenduncan.net> writes: > >> I believe that the best general solution here is for every ordered base type to > >> just have a single total order, which is always used with that type in any > >> generic order-sensitive operation, including any ranges defined over it, or any > >> ORDER BY or any <,>,etc. > > > > We've spent years and blood on making sure that Postgres could support > > multiple orderings for any datatype; and there are plenty of natural > > examples for the usefulness of that. So I'm not at all impressed by > > any line of reasoning that starts out by baldly throwing that away. > > I'm not saying that you can't use multiple orderings with a data type. I'm just > saying that the type only has *at most* one (possibly none) *native* ordering, > which is what is used when you do something ordered-sensitive with the type, > such as have a range. So, are you saying that it would be impossible to have a range that uses a different ordering? What about ORDER BY? What about BTrees? And if those things can use different orders for the same type, then what is the difference between what you are suggesting and a default ordering for the type (which we already support)? I suppose it's hard to tell what you mean by "native". Regards,Jeff Davis
Jeff Davis wrote: > On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote: >> Tom Lane wrote: >>> Darren Duncan <darren@darrenduncan.net> writes: >>>> I believe that the best general solution here is for every ordered base type to >>>> just have a single total order, which is always used with that type in any >>>> generic order-sensitive operation, including any ranges defined over it, or any >>>> ORDER BY or any <,>,etc. >>> We've spent years and blood on making sure that Postgres could support >>> multiple orderings for any datatype; and there are plenty of natural >>> examples for the usefulness of that. So I'm not at all impressed by >>> any line of reasoning that starts out by baldly throwing that away. >> I'm not saying that you can't use multiple orderings with a data type. I'm just >> saying that the type only has *at most* one (possibly none) *native* ordering, >> which is what is used when you do something ordered-sensitive with the type, >> such as have a range. > > So, are you saying that it would be impossible to have a range that uses > a different ordering? What about ORDER BY? What about BTrees? > > And if those things can use different orders for the same type, then > what is the difference between what you are suggesting and a default > ordering for the type (which we already support)? > > I suppose it's hard to tell what you mean by "native". > > Regards, > Jeff Davis Maybe I'm just talking about "default ordering" then. -- Darren Duncan
On Sat, Jun 25, 2011 at 6:29 PM, Jeff Davis <pgsql@j-davis.com> wrote: > Different ranges over the same subtype make sense when using different > total orders for the subtype. This is most apparent with text collation, > but makes sense (at least mathematically, if not practically) for any > subtype. > > For instance: > [a, Z) > is a valid range in "en_US", but not in "C", so it makes sense to have > multiple ranges over the same subtype with different collations. > > But what if you have a function (like a constructor), of the form: > (anyelement, anyelement) -> anyrange > ? To work with the type system, you need to be able to figure out the > return type from the arguments; which means to support functions like > this we need a mapping from the subtype to the range type. > Unfortunately, that restricts us to one range type per subtype (this > isn't a problem for ARRAYs, because there is only one useful array type > for a given element type). > > This problem first came up a while ago: > http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php > > My workaround was to use domains, but that's not a very clean solution > (you have to add a bunch of casts to make sure the right domain is > chosen). It became entirely unworkable with collations, because people > would be using different text collations a lot more frequently than, > say, a different ordering for timestamptz. Tom mentioned that here: > > http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us > > I think Florian proposed the most promising line of attack here: > > http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org > > by suggesting that functions of the form: > (anyelement, [other non-anyrange arguments]) -> anyrange > might be expendable. After all, they are only useful for constructors as > far as we can tell. Other range functions will have an anyrange > parameter, and we can use the actual type of the argument to know the > range type (as well as the subtype). > > Although it's very nice to be able to say: > range(1,10) > and get an int4range out of it, it's not the only way, and it's not > without its problems anyway. For instance, to get an int8range you have > to do: > range(1::int8, 10::int8) > or similar. > > So, we could just make functions like: > int4range(int4, int4) > int8range(int8, int8) > ... > when creating the range type, and it would actually be a usability > improvement. Couldn't we also do neither of these things? I mean, presumably '[1,10]'::int8range had better work. I'm not saying that's ideal from a usability perspective but I fear this patch is going to be unmanageably large, and separating out the things that you need for it to work at all from the things that you need in order for it to be convenient might have some merit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote: > Couldn't we also do neither of these things? I mean, presumably > '[1,10]'::int8range had better work. I think that if we combine this idea with Florian's "PAIR" suggestion here: http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org then I think we have a solution. If we add a type RANGEINPUT that is not a pseudotype, we can use that as an intermediate type that is returned by range constructors. Then, we add casts from RANGEINPUT to each range type. That would allow range(1,2)::int8range to work without changing the type system around, because range() would have the signature: range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT and then the cast would change it into an int8range. But we only need the one cast per range type, and we can also support all of the other kinds of constructors like: range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT range_linf_c(ANYELEMENT) -> RANGEINPUT without additional hassle. The RANGEINPUT type itself would hold similar information to actual range types: the subtype OID (instead of the range type, because it's not a range yet), optionally the two bounds (depending on the flags), and the flags byte. The cast to a real range type would read the subtype, and try to coerce the bounds to the subtype of the range you're casting to, set the range type oid, leave the flags byte the same, and it's done. So, in effect, RANGEINPUT is a special type used only for range constructors. If someone tried to output it, it would throw an exception, and we'd even have enough information at that point to print a nice error message with a hint. Actually, this is pretty much exactly Florian's idea (thanks again, Florian), but at the time I didn't like it because "pair" didn't capture everything that I wanted to capture, like infinite bounds, etc. But there's no reason that it can't, and your point made me realize that -- you are effectively just using TEXT as the intermediate type (which works, but has some undesirable characteristics). Do we think that this is a good way forward? The only thing I can think of that's undesirable is that it's not normal to be required to cast the result of a function, and might be slightly difficult to explain in the documentation in a straightforward way. Regards,Jeff Davis
On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql@j-davis.com> wrote: > So, in effect, RANGEINPUT is a special type used only for range > constructors. If someone tried to output it, it would throw an > exception, and we'd even have enough information at that point to print > a nice error message with a hint. I don't think I like the idea of throwing an error when you try to output it, but the rest seems reasonably sensible. > Actually, this is pretty much exactly Florian's idea (thanks again, > Florian), but at the time I didn't like it because "pair" didn't capture > everything that I wanted to capture, like infinite bounds, etc. But > there's no reason that it can't, and your point made me realize that -- > you are effectively just using TEXT as the intermediate type (which > works, but has some undesirable characteristics). What undesirable characteristics? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote: > Do we think that this is a good way forward? The only thing I can think > of that's undesirable is that it's not normal to be required to cast the > result of a function, and might be slightly difficult to explain in the > documentation in a straightforward way That's the part that bothers me. I think that if it's not cast it should somehow be useful. Maybe default to a text rangeor something? Best, David
On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote: > On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > So, in effect, RANGEINPUT is a special type used only for range > > constructors. If someone tried to output it, it would throw an > > exception, and we'd even have enough information at that point to print > > a nice error message with a hint. > > I don't think I like the idea of throwing an error when you try to > output it, but the rest seems reasonably sensible. I thought it might add a little confusion if people thought they had a range type but really had RANGEINPUT. For instance, if you do a "create table as select range(1,2)" then the result might be slightly unexpected. But it's probably no more unexpected than "create table as select 'foo'". So, I suppose there's not much reason to throw an error. We can just output it in the same format as a range type. It's also much easier to explain something in the documentation that has an output format, because at least it's tangible. > > Actually, this is pretty much exactly Florian's idea (thanks again, > > Florian), but at the time I didn't like it because "pair" didn't capture > > everything that I wanted to capture, like infinite bounds, etc. But > > there's no reason that it can't, and your point made me realize that -- > > you are effectively just using TEXT as the intermediate type (which > > works, but has some undesirable characteristics). > > What undesirable characteristics? Well, for one, outputting something as text and then reading it back in does not always produce the same value. For instance, for float, it only does that if you have extra_float_digits set to some high-enough value. I suppose I could save the GUC, set it, and set it back; but that seems like unnecessary ugliness. There's also the deparsing/reparsing cycle. That might not really matter for performance, but it seems unnecessary. And there's always the fallback that "we have types for a reason". Wouldn't it be odd if you wrote a query like: select range(1,2) || 'foo' and it succeeded? I'm sure that kind of thing can lead to some dangerous situations. Regards,Jeff Davis
On Tue, 2011-06-28 at 09:30 -0700, David E. Wheeler wrote: > On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote: > > > Do we think that this is a good way forward? The only thing I can think > > of that's undesirable is that it's not normal to be required to cast the > > result of a function, and might be slightly difficult to explain in the > > documentation in a straightforward way > > That's the part that bothers me. Yeah, that bothered me, too. > I think that if it's not cast it should somehow be useful. Let's see, what can one do with a range that has no ordering yet? ;) Robert suggested that we don't need to throw an error, and I think I agree. Just having a working output function solves most of the documentation problem, because it makes it less abstract. The only operators that we could really support are accessors, which seems somewhat reasonable. However, I'd have some concerns even about that, because if you do range(10,1), then what's the upper bound? > Maybe default to a text range or something? That sounds a little dangerous: select range('1','09') would fail before it could be cast to int4range. We could invent an UNKNOWNRANGE type or something. But I don't particularly like that; it would start out working nicely when people only had one textrange type, and then their old queries would start failing when they added another range type based on text. I think it's fine if the RANGEINPUT type isn't too useful by itself. It's already a common requirement to cast unknown literals, and this isn't too much different. It's only for constructors, so it still fits pretty closely with that idea. Regards,Jeff Davis
On Tue, Jun 28, 2011 at 12:58 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote: >> On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql@j-davis.com> wrote: >> > So, in effect, RANGEINPUT is a special type used only for range >> > constructors. If someone tried to output it, it would throw an >> > exception, and we'd even have enough information at that point to print >> > a nice error message with a hint. >> >> I don't think I like the idea of throwing an error when you try to >> output it, but the rest seems reasonably sensible. > > I thought it might add a little confusion if people thought they had a > range type but really had RANGEINPUT. For instance, if you do a "create > table as select range(1,2)" then the result might be slightly > unexpected. True... > But it's probably no more unexpected than "create table as select > 'foo'". So, I suppose there's not much reason to throw an error. We can > just output it in the same format as a range type. +1. > It's also much easier to explain something in the documentation that has > an output format, because at least it's tangible. +1. >> > Actually, this is pretty much exactly Florian's idea (thanks again, >> > Florian), but at the time I didn't like it because "pair" didn't capture >> > everything that I wanted to capture, like infinite bounds, etc. But >> > there's no reason that it can't, and your point made me realize that -- >> > you are effectively just using TEXT as the intermediate type (which >> > works, but has some undesirable characteristics). >> >> What undesirable characteristics? > > Well, for one, outputting something as text and then reading it back in > does not always produce the same value. For instance, for float, it only > does that if you have extra_float_digits set to some high-enough value. > I suppose I could save the GUC, set it, and set it back; but that seems > like unnecessary ugliness. Yeah, I don't think we want to go there. > There's also the deparsing/reparsing cycle. That might not really matter > for performance, but it seems unnecessary. > > And there's always the fallback that "we have types for a reason". > Wouldn't it be odd if you wrote a query like: > select range(1,2) || 'foo' > and it succeeded? I'm sure that kind of thing can lead to some dangerous > situations. That's pretty much what we tried to get rid of with the 8.3 casting changes, so agreed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jun28, 2011, at 05:42 , Jeff Davis wrote: > On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote: >> Couldn't we also do neither of these things? I mean, presumably >> '[1,10]'::int8range had better work. > > I think that if we combine this idea with Florian's "PAIR" suggestion > here: > http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org > > then I think we have a solution. > > If we add a type RANGEINPUT that is not a pseudotype, we can use that as > an intermediate type that is returned by range constructors. Then, we > add casts from RANGEINPUT to each range type. That would allow > range(1,2)::int8range > to work without changing the type system around, because range() would > have the signature: > range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT > and then the cast would change it into an int8range. But we only need > the one cast per range type, and we can also support all of the other > kinds of constructors like: > range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT > range_linf_c(ANYELEMENT) -> RANGEINPUT > without additional hassle. Hm, so RANGEINPUT would actually be what was previously discussed as the "range as a pair of bounds" definition, as opposed to the "range as a set of values" definition. So essentially we'd add a second concept of what a "range" is to work around the range input troubles. I believe if we go that route we should make RANGEINPUT a full-blown type, having "pair of bound" semantics. Adding a lobotomized version just for the sake of range input feels a bit like a kludge to me. Alternatively, we could replace RANGEINPUT simply with ANYARRAY, and add functions ANYRANGE->ANYRANGE which allow specifying the bound operator (<, <= respectively >,>=) after construction. So you'd write (using the functions-as-fields syntax I believe we support) (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]' and ARRAY[NULL,2]::int8range for '[-inf,2]' All assuming that modifying the type system to support polymorphic type resolution based on the return type is out of the question... ;-) best regards, Florian Pflug
On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote: > Hm, so RANGEINPUT would actually be what was previously discussed as > the "range as a pair of bounds" definition, as opposed to the > "range as a set of values" definition. So essentially we'd add a > second concept of what a "range" is to work around the range input > troubles. > > I believe if we go that route we should make RANGEINPUT a full-blown > type, having "pair of bound" semantics. Adding a lobotomized version > just for the sake of range input feels a bit like a kludge to me. I think David Wheeler was trying to make a similar point, but I'm still not convinced. It's not a pair, because it can be made up of 0, 1, or 2 scalar values (unless you count infinity as one of those values, in which case 0 or 2). And without ordering, it's not clear that those values are really "bounds". The type needs to:* represent two values, either of which might be a special infinite value* represent the value "empty"* represent inclusivity/exclusivity of both values and those things seem fairly specific to ranges, so I don't really see what other use we'd have for such a type. But I'm open to suggestion. I don't think that having an extra type around is so bad. It solves a lot of problems, and doesn't seem like it would get in the way. And it's only for the construction of ranges out of scalars, which seems like the most natural place where a cast might be required (similar to casting an unknown literal, which is fairly common). > Alternatively, we could replace RANGEINPUT simply with ANYARRAY, > and add functions ANYRANGE->ANYRANGE which allow specifying the > bound operator (<, <= respectively >,>=) after construction. > > So you'd write (using the functions-as-fields syntax I believe > we support) > (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]' > and > ARRAY[NULL,2]::int8range for '[-inf,2]' I think we can rule this one out:* The functions-as-fields syntax is all but deprecated (or should be)* That's hardly a readabilityimprovement* It still suffers similar problems as casting back and forth to text: ANYARRAY is too general, doesn't really take advantage of the type system, and not a great fit anyway. > All assuming that modifying the type system to support polymorphic > type resolution based on the return type is out of the question... ;-) It's still not out of the question, but I thought that the intermediate type would be a less-intrusive alternative (and Robert seemed concerned about how intrusive it was). There also might be a little more effort educating users if we selected the function based on the return type, because they might think that casting the inputs explicitly would be enough to get it to pick the right function. If it were a new syntax like RANGE[]::int8range, then I think it would be easier to understand. Regards,Jeff Davis
On Jun29, 2011, at 05:02 , Jeff Davis wrote: > On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote: >> I believe if we go that route we should make RANGEINPUT a full-blown >> type, having "pair of bound" semantics. Adding a lobotomized version >> just for the sake of range input feels a bit like a kludge to me. > > It's not a pair, because it can be made up of 0, 1, or 2 scalar values > (unless you count infinity as one of those values, in which case 0 or > 2). And without ordering, it's not clear that those values are really > "bounds". Hm, yeah, the lack of an ordering operator is trouble. There also seem to be more problems with that idea, see below for that. So scratch the idea of turning RANGEINPUT into a full-blown type... > I don't think that having an extra type around is so bad. It solves a > lot of problems, and doesn't seem like it would get in the way. And it's > only for the construction of ranges out of scalars, which seems like the > most natural place where a cast might be required (similar to casting an > unknown literal, which is fairly common). What I'm concerned about is how elegantly we'd be able to tie up all the loose ends. What'd be the result of select range(1,2) for example? Or create table (r rangeinput) for that matter. I think we'd want to forbid both of these, and more or less every other use except range(1,2)::<some range type> but that seems to require special-casing RANGEINPUT in a lot of places. If we don't restrict RANGEINPUT that way, I think we ought to provide at least a basic set of operators and functions for it - e.g. input, output, lower(), upper(), ... *Pondering this* But we can't do that easily, since RANGEINPUT would actually be a kind of VARIANT type (i.e. can hold values of arbitrary types). That's something that our type system doesn't really support. We do have RECORD, which is similar in a way, but its implementation is about as intrusive as it gets... >> Alternatively, we could replace RANGEINPUT simply with ANYARRAY, >> and add functions ANYRANGE->ANYRANGE which allow specifying the >> bound operator (<, <= respectively >,>=) after construction. >> >> So you'd write (using the functions-as-fields syntax I believe >> we support) >> (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]' >> and >> ARRAY[NULL,2]::int8range for '[-inf,2]' > > I think we can rule this one out: > * The functions-as-fields syntax is all but deprecated (or should be) Is it? That's actually too bad, since I kinda like it. But anyway, if that's a concern it could also be range_bounds(ARRAY[1,2]::int8range, '(]') > * That's hardly a readability improvement Granted, it won't win any beauty contest, but > * It still suffers similar problems as casting back and forth to text: > ANYARRAY is too general, doesn't really take advantage of the type > system, and not a great fit anyway. I believe it alleviates the gravest problems of casting back and forth to text. It doesn't have quoting issues and it doesn't potentially lose information. In any case, I wouldn't expect this to *stay* the only way to construct a range forever. But I does have it's virtues for a first incarnation of range type, I believe, mostly because it's completely unintrusive and won't cause any backwards-compatbility headaches in the future >> All assuming that modifying the type system to support polymorphic >> type resolution based on the return type is out of the question... ;-) > > It's still not out of the question, but I thought that the intermediate > type would be a less-intrusive alternative (and Robert seemed concerned > about how intrusive it was). I fear that the intermediate type will turn out to be quite intrusive, at least if we try to handle all the corner cases and loose ends. And if we don't, I'm concerned that we're painting ourselves into a corner here... > There also might be a little more effort educating users if we selected > the function based on the return type, because they might think that > casting the inputs explicitly would be enough to get it to pick the > right function. If it were a new syntax like RANGE[]::int8range, then I > think it would be easier to understand. There's certainly a risk of confusion here, simply because the relationship between ANYRANGE and ANYLEMENT will be quite different than that of ANYARRAY and ANYLEMENT. All we can do is state this very clearly in the docs I think, and explain that it must be that way to support multiple range types over the same base type. best regards, Florian Pflug
On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis <pgsql@j-davis.com> wrote: > It's still not out of the question, but I thought that the intermediate > type would be a less-intrusive alternative (and Robert seemed concerned > about how intrusive it was). I'm no great fan of our existing type system, and I'm not opposed to trying to improve it. However, I'm a bit wary of the theory that we can just tweak X, Y, or Z and then everything will go more smoothly for range types. I fear that there will be knock-on consequences that we'll spend a lot of time either (a) arguing about or (b) fixing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-06-29 at 13:35 +0200, Florian Pflug wrote: > What I'm concerned about is how elegantly we'd be able to tie up all > the loose ends. What'd be the result of > select range(1,2) > for example? Or > create table (r rangeinput) > for that matter. > > I think we'd want to forbid both of these, and more or less every other > use except > range(1,2)::<some range type> > but that seems to require special-casing RANGEINPUT in a lot of places. We could make it a pseudo-type and make the IO functions generate exceptions. That should prevent most mistakes and effectively hide it from the user (sure, they could probably use it somewhere if they really want to, but I wouldn't be worried about breaking backwards compatibility with undocumented usage like that). There are plenty of types that are hidden from users in one way or another -- trigger, void, internal, fdw_handler, etc., so I don't see this as special-casing at all. That might make it slightly harder to document, but I think it can be done. All we have to do is document the range constructors saying "you must cast the result to a valid range type; trying to use the result of these functions directly raises an exception". In fact, I think I'll take back the "hard to document" claim from before: it will be pretty easy to document, and if someone gets it wrong, we can throw a helpful error and hint. Robert didn't really seem to like the idea of throwing an error though -- Robert, can you expand on your reasoning here? I tend to lean toward throwing an error as well, but I don't really have much of an opinion. > If we don't restrict RANGEINPUT that way, I think we ought to provide > at least a basic set of operators and functions for it - e.g. > input, output, lower(), upper(), ... > > *Pondering this* > > But we can't do that easily, since RANGEINPUT would actually be a kind of > VARIANT type (i.e. can hold values of arbitrary types). That's something > that our type system doesn't really support. We do have RECORD, which is > similar in a way, but its implementation is about as intrusive as it > gets... I don't want to go down the road of making this a fully supported type. I don't see any use case for it at all, and I think it's a bad idea to design something with no idea how people might want to use it. > Is it? That's actually too bad, since I kinda like it. But anyway, > if that's a concern it could also be > range_bounds(ARRAY[1,2]::int8range, '(]') What type would the result of that be? What value? > > * It still suffers similar problems as casting back and forth to text: > > ANYARRAY is too general, doesn't really take advantage of the type > > system, and not a great fit anyway. > > I believe it alleviates the gravest problems of casting back and forth > to text. It doesn't have quoting issues and it doesn't potentially lose > information. I think it still circumvents the type system to a degree. We're just putting stuff in an array with no intention of really using it that way. > In any case, I wouldn't expect this to *stay* the only way to construct > a range forever. But I does have it's virtues for a first incarnation of > range type, I believe, mostly because it's completely unintrusive and > won't cause any backwards-compatbility headaches in the future I'm not sure that your overloading of arrays is completely immune from backwards-compatibility problems, should we decide to change it later. But regardless, we have quite a lot of time to make a decision before 9.2 is released; so let's do it once and do it right. > I fear that the intermediate type will turn out to be quite intrusive, > at least if we try to handle all the corner cases and loose ends. And if > we don't, I'm concerned that we're painting ourselves into a corner here... Can you expand on some of the corner-cases and loose ends you're concerned about? Does marking it as a pseudotype and making the IO functions throw exceptions handle them? Regards,Jeff Davis
On Wed, 2011-06-29 at 08:52 -0400, Robert Haas wrote: > On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > It's still not out of the question, but I thought that the intermediate > > type would be a less-intrusive alternative (and Robert seemed concerned > > about how intrusive it was). > > I'm no great fan of our existing type system, and I'm not opposed to > trying to improve it. However, I'm a bit wary of the theory that we > can just tweak X, Y, or Z and then everything will go more smoothly > for range types. I fear that there will be knock-on consequences that > we'll spend a lot of time either (a) arguing about or (b) fixing. Agreed. Regards,Jeff Davis
On Wed, Jun 29, 2011 at 11:41 AM, Jeff Davis <pgsql@j-davis.com> wrote: > Robert didn't really seem to like the idea of throwing an error though > -- Robert, can you expand on your reasoning here? I guess I don't have any terribly well-thought out reasoning - maybe it's fine. It just seems strange to have a type that you can't display. But now that I'm thinking about this a little more, I'm worried about this case: CREATE TABLE foo AS RANGE('something'::funkytype, 'somethingelse'::funktype); DROP TYPE funkytype; It seems to me that the first statement had better fail, or else the second one is going to create a hopeless mess (imagine that a new type comes along and gets the OID of funkytype). It also seems a bit strange to me that we're contemplating a system where users are always going to have to cast the return type. Generally, casts are annoying and we want to minimize the need for them. I'm not sure what the alternative is, though, unless we create separate constructor functions for each type: int8range_cc(1, 2). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jun 28, 2011, at 8:02 PM, Jeff Davis wrote: > I think David Wheeler was trying to make a similar point, but I'm still > not convinced. > > It's not a pair, because it can be made up of 0, 1, or 2 scalar values > (unless you count infinity as one of those values, in which case 0 or > 2). And without ordering, it's not clear that those values are really > "bounds". > > The type needs to: > * represent two values, either of which might be a special infinite > value > * represent the value "empty" > * represent inclusivity/exclusivity of both values > > and those things seem fairly specific to ranges, so I don't really see > what other use we'd have for such a type. But I'm open to suggestion. > > I don't think that having an extra type around is so bad. It solves a > lot of problems, and doesn't seem like it would get in the way. And it's > only for the construction of ranges out of scalars, which seems like the > most natural place where a cast might be required (similar to casting an > unknown literal, which is fairly common). I'm fine with that, but my point is that if it's going to be exposed to users somehow, it needs to be useful on its own,without casting. Because some wit will make a column of this type. If it's not somehow useful on its own, then it shouldbe an implementation detail or internal that I never see in SQL. IMHO. Best, David
On Jun 29, 2011, at 8:41 AM, Jeff Davis wrote: > We could make it a pseudo-type and make the IO functions generate > exceptions. That should prevent most mistakes and effectively hide it > from the user (sure, they could probably use it somewhere if they really > want to, but I wouldn't be worried about breaking backwards > compatibility with undocumented usage like that). There are plenty of > types that are hidden from users in one way or another -- trigger, void, > internal, fdw_handler, etc., so I don't see this as special-casing at > all. That could work. > I don't want to go down the road of making this a fully supported type. > I don't see any use case for it at all, and I think it's a bad idea to > design something with no idea how people might want to use it. +1 I'm still not clear, though, on why the return type of range() should not be related to the types of its arguments. So range(1, 5) Should return intrange, and range(1::int8, 5::int8) Should return int8range, and range('foo', 'foooo') Should return textrange. Best, David
On Jun29, 2011, at 18:34 , Robert Haas wrote: > It also seems a bit strange to me that we're contemplating a system > where users are always going to have to cast the return type. > Generally, casts are annoying and we want to minimize the need for > them. I'm not sure what the alternative is, though, unless we create > separate constructor functions for each type: int8range_cc(1, 2). Well, if we want multiple range types per base type (which we do), then the user needs to specify which one to use somehow. A cast seems the most natural way to do that to me - after all, casting is *the* way to coerce value to a certain type. best regards, Florian Pflug
On Jun29, 2011, at 19:05 , David E. Wheeler wrote: > I'm still not clear, though, on why the return type of range() > should not be related to the types of its arguments. So > > range(1, 5) > > Should return intrange, and > > range(1::int8, 5::int8) > > Should return int8range, and > > range('foo', 'foooo') > > Should return textrange. Because there might be more than one range type for a base type. Say there are two range types over text, one with collation 'de_DE' and one with collation 'en_US'. What would the type of range('foo', 'foooo') be? best regards, Florian Pflug
On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote: > Because there might be more than one range type for a > base type. Say there are two range types over text, one > with collation 'de_DE' and one with collation 'en_US'. > What would the type of > range('foo', 'foooo') > be? The one that corresponds to the current LC_COLLATE setting. Best, David
On ons, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote: > On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote: > > > Because there might be more than one range type for a > > base type. Say there are two range types over text, one > > with collation 'de_DE' and one with collation 'en_US'. > > What would the type of > > range('foo', 'foooo') > > be? > > The one that corresponds to the current LC_COLLATE setting. Yes, or more generally, we have logic that determines, for example, what collation to use for 'foo' < 'foooo' The same logic can be used to determine what collation to use for range('foo', 'foooo') (In fact, if you implement range() as a user-space function, that will happen automatically.)
On Jun29, 2011, at 23:44 , Peter Eisentraut wrote: > On ons, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote: >> On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote: >>> Because there might be more than one range type for a >>> base type. Say there are two range types over text, one >>> with collation 'de_DE' and one with collation 'en_US'. >>> What would the type of >>> range('foo', 'foooo') >>> be? >> >> The one that corresponds to the current LC_COLLATE setting. > > Yes, or more generally, we have logic that determines, for example, what > collation to use for > > 'foo' < 'foooo' > > The same logic can be used to determine what collation to use for > > range('foo', 'foooo') > > (In fact, if you implement range() as a user-space function, that will > happen automatically.) I don't think it will - as it stands, there isn't a single collatable type RANGE but instead one *distinct* type per combination of base type, btree opclass and collation. The reasons for that were discussed at length - the basic argument for doing it that way was to make a range represent a fixed set of values. There's also no guarantee that a range type with collation LC_COLLATE even exists. best regards, Florian Pflug
On tor, 2011-06-30 at 08:45 +0200, Florian Pflug wrote: > I don't think it will - as it stands, there isn't a single collatable > type RANGE but instead one *distinct* type per combination of base > type, btree opclass and collation. The reasons for that were discussed > at length - the basic argument for doing it that way was to make a > range represent a fixed set of values. How would the system catalogs be initialized under that theory: surely you're not going to seed (nr. of types) * (nr. of collations) * (nr. of opclasses) range types in initdb?
On Jun30, 2011, at 09:05 , Peter Eisentraut wrote: > On tor, 2011-06-30 at 08:45 +0200, Florian Pflug wrote: >> I don't think it will - as it stands, there isn't a single collatable >> type RANGE but instead one *distinct* type per combination of base >> type, btree opclass and collation. The reasons for that were discussed >> at length - the basic argument for doing it that way was to make a >> range represent a fixed set of values. > > How would the system catalogs be initialized under that theory: surely > you're not going to seed (nr. of types) * (nr. of collations) * (nr. of > opclasses) range types in initdb? There's CREATE RANGE. By default, no range types would exists I believe. best regards, Florian Pflug
On Jun29, 2011, at 17:41 , Jeff Davis wrote: >> Is it? That's actually too bad, since I kinda like it. But anyway, >> if that's a concern it could also be >> range_bounds(ARRAY[1,2]::int8range, '(]') > > What type would the result of that be? What value? ARRAY[1,2]::int8range would return an int8range instance representing the range [1,2] (i.e. the set of values {1,2}). range_bounds would then modify the left bound to be exclusive, and thus return the int8range (1,2] (i.e. the set of values {2}). range_bounds would have the signature range_bounds(anyrange) returns anyrange. I do think we'll probably want to have functions to modify the boundary type (open or closed) anyway, so it wouldn't be that huge of a deal if the range constructor didn't let you specify them. Empty ranges would be constructed by ARRAY[]::int8range (Hm, ok, now I'm cheating... Currently you'd need to write ARRAY[]::int8[]::int8range, but fixing that only needs a few lines in the transformExpression* function that makes ARRAY[]::int8[] work). >>> * It still suffers similar problems as casting back and forth to text: >>> ANYARRAY is too general, doesn't really take advantage of the type >>> system, and not a great fit anyway. >> >> I believe it alleviates the gravest problems of casting back and forth >> to text. It doesn't have quoting issues and it doesn't potentially lose >> information. > > I think it still circumvents the type system to a degree. We're just > putting stuff in an array with no intention of really using it that way. Well, arrays are containers, and we need two values to construct a range, so putting them into a container first and then creating the range from that doesn't seem so bad to me. We do use the full set of features that arrays provide, since we only ever expect zero, one or two entries. But I don't think this is different from functions who only support single-dimensional arrays - they too choose to use only a subset of the features set of arrays. >> In any case, I wouldn't expect this to *stay* the only way to construct >> a range forever. But I does have it's virtues for a first incarnation of >> range type, I believe, mostly because it's completely unintrusive and >> won't cause any backwards-compatbility headaches in the future > > I'm not sure that your overloading of arrays is completely immune from > backwards-compatibility problems, should we decide to change it later. > > But regardless, we have quite a lot of time to make a decision before > 9.2 is released; so let's do it once and do it right. > >> I fear that the intermediate type will turn out to be quite intrusive, >> at least if we try to handle all the corner cases and loose ends. And if >> we don't, I'm concerned that we're painting ourselves into a corner here... > > Can you expand on some of the corner-cases and loose ends you're > concerned about? Does marking it as a pseudotype and making the IO > functions throw exceptions handle them? Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent it from being used as a column type and from being used as an argument type, then I guess it's workable... Btw, what happened to the idea of making RANGE(...) a special syntactic construct instead of a normal function call? Did we discard that for its intrusiveness, or were there other reasons? best regards, Florian Pflug
On Thu, 2011-06-30 at 09:11 +0200, Florian Pflug wrote: > > How would the system catalogs be initialized under that theory: surely > > you're not going to seed (nr. of types) * (nr. of collations) * (nr. of > > opclasses) range types in initdb? > > There's CREATE RANGE. Right. In that respect, it's more like a record type: many possible record types exist, but you only define the ones you want. > By default, no range types would exists I believe. I was planning to include _some_ by default. Probably not text ranges, but integer and timestamp[tz] ranges. If nothing else, it makes it easier to document. Regards,Jeff Davis
On Wed, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote: > On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote: > > > Because there might be more than one range type for a > > base type. Say there are two range types over text, one > > with collation 'de_DE' and one with collation 'en_US'. > > What would the type of > > range('foo', 'foooo') > > be? > > The one that corresponds to the current LC_COLLATE setting. Then how do you get a text range that doesn't correspond to the LC_COLLATE setting? Does that mean you couldn't dump/reload from a system with one collation and get the same values in a system with a different collation? That would be very strange. Or, what about other types that just happen to have multiple useful total orders? Regards,Jeff Davis
On Wed, 2011-06-29 at 12:34 -0400, Robert Haas wrote: > But now that I'm thinking about this a little more, I'm worried about this case: > > CREATE TABLE foo AS RANGE('something'::funkytype, 'somethingelse'::funktype); > DROP TYPE funkytype; > > It seems to me that the first statement had better fail, or else the > second one is going to create a hopeless mess (imagine that a new type > comes along and gets the OID of funkytype). Interesting point. I don't think it's a problem because pseudo-types can't be used that way, so that provides us a mechanism to stop it. But it means that we have to be a little more sure that such values can't persist anywhere. Regards,Jeff Davis
On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote: > Right. In that respect, it's more like a record type: many possible > record types exist, but you only define the ones you want. Well, okay. How is this same problem handled for RECORD types, then? >> By default, no range types would exists I believe. > > I was planning to include _some_ by default. Probably not text ranges, > but integer and timestamp[tz] ranges. If nothing else, it makes it > easier to document. +1 David
On Jun 30, 2011, at 9:34 AM, Jeff Davis wrote: > Then how do you get a text range that doesn't correspond to the > LC_COLLATE setting? You cast it. > Does that mean you couldn't dump/reload from a > system with one collation and get the same values in a system with a > different collation? That would be very strange. No, pg_dump should always explicitly cast things. But there should be a reasonable default behavior if I'm in psql and don'tcast. > Or, what about other types that just happen to have multiple useful > total orders? Cast where you need it explicit, and have a reasonable default when it's not cast. Best, David
On Thu, 2011-06-30 at 09:58 -0700, David E. Wheeler wrote: > On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote: > > > Right. In that respect, it's more like a record type: many possible > > record types exist, but you only define the ones you want. > > Well, okay. How is this same problem handled for RECORD types, then? What problem, exactly? For a given list of subtypes, there is only one valid record type. Also, record is not a great example. The implementation uses at least one pretty horrible hack. Regards,Jeff Davis
On Thu, 2011-06-30 at 09:59 -0700, David E. Wheeler wrote: > On Jun 30, 2011, at 9:34 AM, Jeff Davis wrote: > > > Then how do you get a text range that doesn't correspond to the > > LC_COLLATE setting? > > You cast it. My original solution was something like this, except involving domains. With a sufficient amount of casting of all arguments to anything involving a range type, it works, but it's a little too un-SQL-like. There was at least one fairly strong objection to my approach, but if you have some further thoughts along that line, I'm open to suggestion. Also, what if the LC_COLLATE is C, and you want to cast it to en_US collation? range('a','Z') would be invalid in the C locale, and it would fail before you had a chance to cast it. > Cast where you need it explicit, and have a reasonable default when > it's not cast. I thought about that, too, but it's not ideal, either. That means that something might start out as the only range type for a given subtype, and doesn't need explicit casts. Then you define another range type over that subtype, and all the original queries break because they are now ambiguous. I think the fundamental differences with range types that we're dealing with are:1. multiple range types might reasonbly exist for a single subtype2. the order is a fundamental part of the typedefinition, not just an extra argument useful for operations on the range type Regards,Jeff Davis
On Thu, 2011-06-30 at 12:28 +0200, Florian Pflug wrote: > Well, arrays are containers, and we need two values to construct a range, What about empty ranges? What about infinite ranges? It seems quite a bit more awkward to shoehorn ranges into an array than to use a real type (even if it's intermediate and otherwise useless). > Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent > it from being used as a column type and from being used as an argument > type, then I guess it's workable... > > Btw, what happened to the idea of making RANGE(...) a special syntactic > construct instead of a normal function call? Did we discard that for its > intrusiveness, or were there other reasons? It has not been discarded; as far as I'm concerned it's still on the table. The main advantage is that it doesn't require an intermediate type, and that requiring a cast (or some specification of the range type) might be a little more natural. The downside is that, well, it's new syntax, and there's a little inertia there. But if it's actually better, we should do it. If an intermediate type seems to be problematic, or if people think it's strange to require casting, then I think this is reasonable. Regards,Jeff Davis
On Fri, Jul 1, 2011 at 2:09 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2011-06-30 at 12:28 +0200, Florian Pflug wrote: >> Well, arrays are containers, and we need two values to construct a range, > > What about empty ranges? What about infinite ranges? > > It seems quite a bit more awkward to shoehorn ranges into an array than > to use a real type (even if it's intermediate and otherwise useless). > >> Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent >> it from being used as a column type and from being used as an argument >> type, then I guess it's workable... >> >> Btw, what happened to the idea of making RANGE(...) a special syntactic >> construct instead of a normal function call? Did we discard that for its >> intrusiveness, or were there other reasons? > > It has not been discarded; as far as I'm concerned it's still on the > table. The main advantage is that it doesn't require an intermediate > type, and that requiring a cast (or some specification of the range > type) might be a little more natural. The downside is that, well, it's > new syntax, and there's a little inertia there. > > But if it's actually better, we should do it. If an intermediate type > seems to be problematic, or if people think it's strange to require > casting, then I think this is reasonable. I don't understand how the bespoke syntax avoids the need for a cast? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote: > > But if it's actually better, we should do it. If an intermediate type > > seems to be problematic, or if people think it's strange to require > > casting, then I think this is reasonable. > > I don't understand how the bespoke syntax avoids the need for a cast? It doesn't, it just avoids the need for an intermediate type. What I meant was that it might be strange to require a cast on the result of a function call, because we don't really do that anywhere else. Florian pointed out that it's common to require casting the ARRAY[] constructor, so that has more of a precedent. I'm not really sure how much that matters. I'm OK with the intermediate type, but Florian seems skeptical of that idea. Regards,Jeff Davis
On Tue, Jul 5, 2011 at 11:11 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote: >> > But if it's actually better, we should do it. If an intermediate type >> > seems to be problematic, or if people think it's strange to require >> > casting, then I think this is reasonable. >> >> I don't understand how the bespoke syntax avoids the need for a cast? > > It doesn't, it just avoids the need for an intermediate type. > > What I meant was that it might be strange to require a cast on the > result of a function call, because we don't really do that anywhere > else. Florian pointed out that it's common to require casting the > ARRAY[] constructor, so that has more of a precedent. I'm not really > sure how much that matters. > > I'm OK with the intermediate type, but Florian seems skeptical of that > idea. How about the idea of creating a family of four constructor functions for each new range type? The functions would be named after the range type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead of writing: RANGE(1,8,'c','o')::int8range ...or somesuch, you could just say: int8range_co(1,8) ...which is both more compact and less ugly, IMHO, and seems to circumvent all the type system problems as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jul5, 2011, at 17:11 , Jeff Davis wrote: > I'm OK with the intermediate type, but Florian seems skeptical of that > idea. I'm starting to get used to it, though ;-) I do now believe that it can be made safe against accidental miss-use, it seem that I was overly anxious there. What I still don't like about it is that it feels like a workaround for a feature missing in our type system - the possibility of having function with a polymorphic return type, but no polymorphic arguments. I feel somewhat strongly about this, because it bit me when I tried to implement record_getfield() and record_setfield() to get and set a record's field based on it's name. However, placing the burden of solving that onto the range type patch doesn't seem fair. Plus, I've realized now that a RANGEINPUT type would allow us to easily support some things that otherwise seem hard. We could, for example, make the cast from RANGEINPUT to the individual range types an assignment cast (or even implicit), thereby removing the need for an explicit cast in a lot of common cases like insert into a table with a range column. best regards, Florian Pflug
On Tue, Jul 5, 2011 at 10:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jul 5, 2011 at 11:11 AM, Jeff Davis <pgsql@j-davis.com> wrote: >> On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote: >>> > But if it's actually better, we should do it. If an intermediate type >>> > seems to be problematic, or if people think it's strange to require >>> > casting, then I think this is reasonable. >>> >>> I don't understand how the bespoke syntax avoids the need for a cast? >> >> It doesn't, it just avoids the need for an intermediate type. >> >> What I meant was that it might be strange to require a cast on the >> result of a function call, because we don't really do that anywhere >> else. Florian pointed out that it's common to require casting the >> ARRAY[] constructor, so that has more of a precedent. I'm not really >> sure how much that matters. >> >> I'm OK with the intermediate type, but Florian seems skeptical of that >> idea. > > How about the idea of creating a family of four constructor functions > for each new range type? The functions would be named after the range > type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead > of writing: > > RANGE(1,8,'c','o')::int8range > > ...or somesuch, you could just say: > > int8range_co(1,8) > > ...which is both more compact and less ugly, IMHO, and seems to > circumvent all the type system problems as well. +1 on this (so you wouldn't even then directly cast to a range?) merlin
On Tue, Jul 5, 2011 at 12:23 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Jul 5, 2011 at 10:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jul 5, 2011 at 11:11 AM, Jeff Davis <pgsql@j-davis.com> wrote: >>> On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote: >>>> > But if it's actually better, we should do it. If an intermediate type >>>> > seems to be problematic, or if people think it's strange to require >>>> > casting, then I think this is reasonable. >>>> >>>> I don't understand how the bespoke syntax avoids the need for a cast? >>> >>> It doesn't, it just avoids the need for an intermediate type. >>> >>> What I meant was that it might be strange to require a cast on the >>> result of a function call, because we don't really do that anywhere >>> else. Florian pointed out that it's common to require casting the >>> ARRAY[] constructor, so that has more of a precedent. I'm not really >>> sure how much that matters. >>> >>> I'm OK with the intermediate type, but Florian seems skeptical of that >>> idea. >> >> How about the idea of creating a family of four constructor functions >> for each new range type? The functions would be named after the range >> type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead >> of writing: >> >> RANGE(1,8,'c','o')::int8range >> >> ...or somesuch, you could just say: >> >> int8range_co(1,8) >> >> ...which is both more compact and less ugly, IMHO, and seems to >> circumvent all the type system problems as well. > > +1 on this (so you wouldn't even then directly cast to a range?) You wouldn't need to, because these functions would be declared to return the range type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 2011-07-05 at 11:26 -0400, Robert Haas wrote: > How about the idea of creating a family of four constructor functions > for each new range type? The functions would be named after the range > type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead > of writing: > > RANGE(1,8,'c','o')::int8range It would be something like: range_co(1,8)::int8range (just so we're comparing apples to apples) The intermediate type proposal doesn't require that we move the "c" and "o" into the parameter list. > int8range_co(1,8) > > ...which is both more compact and less ugly, IMHO, and seems to > circumvent all the type system problems as well. I brought that up before: http://archives.postgresql.org/pgsql-hackers/2011-06/msg02046.php It certainly circumvents the polymorphic type problems, but the problem is that it adds up to quite a few permutations. Not only are there cc/co/oc/oo, but there are also variations for infinite bounds and empty ranges. So I think we're talking 10+ functions per range type rather than 4. Also, if someone has an idea for another constructor, like the one you mention above: range(1,8,'c','o') then they have to create it for every range type, and they can't anticipate new range types that someone might create. In other words, the constructors wouldn't benefit from the polymorphism. However, if we used an intermediate type, then they could create the above constructor and it would work for any range type automatically. I don't object to this idea, but we'll need to come up with a pretty exhaustive list of possibly-useful constructors. Regards,Jeff Davis
On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2011-07-05 at 11:26 -0400, Robert Haas wrote: >> How about the idea of creating a family of four constructor functions >> for each new range type? The functions would be named after the range >> type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead >> of writing: >> >> RANGE(1,8,'c','o')::int8range > > It would be something like: range_co(1,8)::int8range > > (just so we're comparing apples to apples) > > The intermediate type proposal doesn't require that we move the "c" and > "o" into the parameter list. Well, you have to specify the bounds somewhere... >> int8range_co(1,8) >> >> ...which is both more compact and less ugly, IMHO, and seems to >> circumvent all the type system problems as well. > > I brought that up before: > http://archives.postgresql.org/pgsql-hackers/2011-06/msg02046.php > > It certainly circumvents the polymorphic type problems, but the problem > is that it adds up to quite a few permutations. Not only are there > cc/co/oc/oo, but there are also variations for infinite bounds and empty > ranges. So I think we're talking 10+ functions per range type rather > than 4. OK, so let's pass the information on the bounds as a separate argument. Like this: int8range(1,8,'co') Then you can instead pass 'o' for open or 'i' for infinity (passing NULL for the corresponding argument position in that case). The third argument can be optional and default to 'cc'. For empty ranges I doubt we need a separate constructor function; presumably the representation of an empty range is some fixed string and users can just write 'empty'::int8range or similar. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 2011-07-05 at 13:06 -0400, Robert Haas wrote: > On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > It would be something like: range_co(1,8)::int8range > > > > (just so we're comparing apples to apples) > > > > The intermediate type proposal doesn't require that we move the "c" and > > "o" into the parameter list. > > Well, you have to specify the bounds somewhere... That's true. In my example it's in the function name. > OK, so let's pass the information on the bounds as a separate > argument. Like this: > > int8range(1,8,'co') That has a lot going for it, in the sense that it avoids dealing with the type problems. > Then you can instead pass 'o' for open or 'i' for infinity (passing > NULL for the corresponding argument position in that case). The third > argument can be optional and default to 'cc'. The fact that there can be a default for the third argument makes this quite a lot more appealing than I had originally thought (although I think 'co' is the generally-accepted default). There's some slight ugliness around the NULL/infinity business, but I think that I could be convinced. I'd like to avoid confusion between NULL and infinity if possible. Regards,Jeff Davis
On Wed, Jul 6, 2011 at 1:19 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2011-07-05 at 13:06 -0400, Robert Haas wrote: >> On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis <pgsql@j-davis.com> wrote: >> > It would be something like: range_co(1,8)::int8range >> > >> > (just so we're comparing apples to apples) >> > >> > The intermediate type proposal doesn't require that we move the "c" and >> > "o" into the parameter list. >> >> Well, you have to specify the bounds somewhere... > > That's true. In my example it's in the function name. > >> OK, so let's pass the information on the bounds as a separate >> argument. Like this: >> >> int8range(1,8,'co') > > That has a lot going for it, in the sense that it avoids dealing with > the type problems. > >> Then you can instead pass 'o' for open or 'i' for infinity (passing >> NULL for the corresponding argument position in that case). The third >> argument can be optional and default to 'cc'. > > The fact that there can be a default for the third argument makes this > quite a lot more appealing than I had originally thought (although I > think 'co' is the generally-accepted default). > > There's some slight ugliness around the NULL/infinity business, but I > think that I could be convinced. I'd like to avoid confusion between > NULL and infinity if possible. I was thinking that if you passed 'i' for one of the bounds, it would ignore the supplied argument and substitute its special infinity value. But you'd still need to supply some argument in that position, which could be NULL or anything else. It doesn't really seem worth having additional constructor functions to handle the case where one or both arguments are infinite. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-07-06 at 09:10 -0400, Robert Haas wrote: > > There's some slight ugliness around the NULL/infinity business, but I > > think that I could be convinced. I'd like to avoid confusion between > > NULL and infinity if possible. > > I was thinking that if you passed 'i' for one of the bounds, it would > ignore the supplied argument and substitute its special infinity > value. But you'd still need to supply some argument in that position, > which could be NULL or anything else. It doesn't really seem worth > having additional constructor functions to handle the case where one > or both arguments are infinite. Right, that's what I assumed that you meant. I can't think of anything better, either, because I like the fact that two arguments are there so that you can visually see which sides are bounded/unbounded. I suppose we could have constructors like: range(text, subtype) and range(subtype, text) where the text field is used to specify "infinity". But that has the obvious problem "what if the subtype is text?". So, of course, we make a special new pseudotype to represent infinity... ;) But seriously, your idea is starting to look more appealing. To get into some more details: how exactly would this constructor be generated on the fly? Clearly we want only one underlying C function that accepts something like: range_internal(lower, upper, flags, Oid rangetype) So how do we get the rangetype in there? I suppose a default 4th argument? That would be kind of an interesting option, but what if someone actually specified that 4th argument? We couldn't allow that. Also, are default arguments always applied in all the contexts where this function might be called? Regards,Jeff Davis
On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis <pgsql@j-davis.com> wrote: > To get into some more details: how exactly would this constructor be > generated on the fly? Clearly we want only one underlying C function > that accepts something like: > range_internal(lower, upper, flags, Oid rangetype) > So how do we get the rangetype in there? I think that the C function could call get_call_result_type() and get the return type OID back via the second argument. > Also, are default arguments always applied in all the contexts where > this function might be called? Uh, I'm not sure. But I don't see why it would need different handling than any other function which takes default arguments. It shouldn't be needed during bootstrapping or anything funky like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-07-06 at 12:51 -0400, Robert Haas wrote: > On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > To get into some more details: how exactly would this constructor be > > generated on the fly? Clearly we want only one underlying C function > > that accepts something like: > > range_internal(lower, upper, flags, Oid rangetype) > > So how do we get the rangetype in there? > > I think that the C function could call get_call_result_type() and get > the return type OID back via the second argument. I'm also a little unclear on the rules for when that might be set properly or not. I ran into problems with that before... I think with the I/O functions. I don't think that's a problem here, but I thought I'd ask. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Wed, 2011-07-06 at 12:51 -0400, Robert Haas wrote: >> On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis <pgsql@j-davis.com> wrote: >>> To get into some more details: how exactly would this constructor be >>> generated on the fly? Clearly we want only one underlying C function >>> that accepts something like: >>> range_internal(lower, upper, flags, Oid rangetype) >>> So how do we get the rangetype in there? >> I think that the C function could call get_call_result_type() and get >> the return type OID back via the second argument. > I'm also a little unclear on the rules for when that might be set > properly or not. > I ran into problems with that before... I think with the I/O functions. > I don't think that's a problem here, but I thought I'd ask. I think it'd probably be all right to do that. The places where you might find shortcuts being taken are where functions are called directly by C code, such as I/O function calls --- but these constructors should only ever get invoked from SQL queries, no? regards, tom lane
On Wed, 2011-07-06 at 15:14 -0400, Tom Lane wrote: > > I ran into problems with that before... I think with the I/O functions. > > I don't think that's a problem here, but I thought I'd ask. > > I think it'd probably be all right to do that. The places where you > might find shortcuts being taken are where functions are called directly > by C code, such as I/O function calls --- but these constructors should > only ever get invoked from SQL queries, no? Perhaps index expressions/predicates as well (which are also fine). I was more worried about some case that I hadn't thought of. Regards,Jeff Davis