Thread: extended operator classes vs. type interfaces
There are a couple of features that were considered but not committed for 9.0 which require additional information about the properties of various data types. At PGeast I had a chance to talk with Jeff Davis about this briefly, and wanted to write up some of what we talked about plus some further thoughts of my own before they went out of my head. The features I'm thinking about are: 1. knngist wants to use index scans to speed up queries of the form SELECT ... ORDER BY <column> <op> <constant> (as opposed to the existing machinery which only knows how to use an index for SELECT ... ORDER BY <column>). 2. Window functions want to define windows over a range of values defined by the underlying data type. To do this, we need to define what addition and subtraction mean for a particular data type. 3. Jeff Davis is interested in implementing range types. When the underlying base type is discrete, e.g. integers, you can say that [1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in that order). All of these problems seem loosely related: we're teaching the database more about how certain types behave. But there are some important differences. In case #1, we're trying to teach the planner that if it sees a certain operator, it can map that operator onto an AM strategy - so the knowledge is fundamentally AM-specific. In the remaining cases, the information needed is a property of the underlying data type with no natural or logical relationship to an access method. For that reason, I don't think there's going to be any clean way to create a single mechanism that will encompass all of these needs, though maybe someone has a clever idea I'm not thinking of. What we discussed doing before for #1 (and it make sense to me) is to add a column pg_amop.amopcategory. The existing operator class functions will constitute one category - map a boolean operator onto an index strategy number that can be used to treat a filter condition as an index qual. The functions knngist cares about will be a second category - map an operator that returns some arbitrary type that is legal in the context of an ORDER BY clause onto an index strategy number that can regurgitate the tuples in the order defined by the return type. While it might be possible to shoehorn the remaining cases into the operator class machinery, it seems likely that it will be nothing but ugly. The whole charter of the operator class machinery at least AIUI is to map operators onto AM-specific index strategy numbers, and there is neither an applicable AM nor a strategy number for it in any of these cases. So I think it's time to create a separate concept of type interfaces (Jeff Davis proposed this name, and I like it). What might this look like? Given a type T, I think we'd like to be able to define a type U as "the natural type to be added to or subtracted from T". As Jeff pointed out to me, this is not necessarily the same as the underlying type. For example, if T is a timestamp, U is an interval; if T is a numeric, U is also a numeric; if T is a cidr, U is an integer. Then we'd like to define a canonical addition operator and a canonical subtraction operator. I think that would be sufficient for the needs of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be nearly sufficient for range types, but in that case you also need to specify the unit increment within U - i.e. a "1" value for the datatype. It may or may not be worth building the concept of a unit increment into the type interface machinery, though: one could imagine two different range types built over the same base type with different unit increments - e.g. one timestamp range with unit increment = 1s, and one with unit increment = 1m. Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. In a previous thread on this topic, in response to a question about other possible needs for operator knowledge, Hitoshi Harada mentioned range partitioning as another possible case. If we have a partition where x ranges from 1 to 100, that's basically saying that x >= 1 and x <= 100, for suitable values of >= and <=. I think we're already habituated to doing this kind of thing by looking at the default btree opclass for the data type and looking for the operator that implements, e.g. BTLessEqualsStrategyNumber. It might be cleaner to get all this information from type interfaces, but I'm not sure whether it's reasonable (either for reasons of complexity or of performance) to think about untangling all the places where we've already made this assumption and redoing them. Thoughts? ...Robert
Hi, First, I like the way you got back to the needs before trying to organize an approach to find a solution. Having said it allows me to cut a lot of your text, it's the one I agree with :) Robert Haas <robertmhaas@gmail.com> writes: > Given a type T, I think we'd like to be able to define a type U as > "the natural type to be added to or subtracted from T". As Jeff > pointed out to me, this is not necessarily the same as the underlying > type. For example, if T is a timestamp, U is an interval; if T is a > numeric, U is also a numeric; if T is a cidr, U is an integer. Then > we'd like to define a canonical addition operator and a canonical > subtraction operator. I think that would be sufficient for the needs > of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be > nearly sufficient for range types, but in that case you also need to > specify the unit increment within U - i.e. a "1" value for the > datatype. It may or may not be worth building the concept of a unit > increment into the type interface machinery, though: one could imagine > two different range types built over the same base type with different > unit increments - e.g. one timestamp range with unit increment = 1s, > and one with unit increment = 1m. Under the first type [4pm,5pm) = > [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Do we want to enable support for string based ranges, as in the contributed prefix_range type? > Thoughts? I like the type interface approach and I think this concept has been studied in great details in math and that we should start from existing concepts, even if most of them are way over my head. The ORDER BY problem refers to a metric space, defined by a distance function. Continuing your proposal the distance function return type would be of domain U. KNNGist is then a way to use the GiST index to sort by distance. http://archives.postgresql.org/pgsql-hackers/2010-02/msg01107.php You'll see in this mail a proposal for an operator group notion, which could get renamed to type interface if we think we won't need rings and such rather than just groups in the future. And there's opportunity for multi-type interfaces too (think families), like what's the distance between a point and a circle? The math groups already have a notion of neutral element, which for the addition is 0 (zero), we could expand our version of it with a "unity" element, which would be in the T domain. Then the range type could expand on this and provide a different unity value in their own interface, in the U domain this time. IMO tying the precision of the range interval into the type interface is a bad abstraction. As you said we want to possibly have several ranges types atop this. We can say that [1,3] = [1,4) when considering a "default" integer range because 4-3 = unity(integer). When considering a range over timestamps with a range interval unity of 1s, we are still able to do the math, and we can have another range over timestamps with a range interval unity of 10 mins in the same database. (I'm using this later example with the period datatype in a real application). While speaking of all that, in the prefix_range case, it'd be useful to have a new kind of typemod system at the range level, to allow for defining prefix text range with the '/' separator, say. Then greater_prefix('/etc/bar', '/etc/baz') = '/etc' (or is it '/etc/'?) Whereas currently => select '/etc/baz'::prefix_range | '/etc/bar'; ?column? -------------- /etc/ba[r-z] (1 row) Regards, -- dim
Robert Haas wrote: > Under the first type [4pm,5pm) = > [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. > > Thoughts? > The examples with units look a lot like the IVL<PQ> datatype from HL7, see http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm About a type interface, the HL7 spec talks about promotion from e.g. a timestamp to an interval (hl7 speak for range) of timestamps (a range), and demotion for the back direction. Every 'quantity type', which is any type with a (possibly partially) lineair ordered domain, can be promoted to an interval of that type. In PostgreSQL terms, this could perhaps mean that by 'tagging' a datatype as a lineair order, it could automatically have a range type defined on it, like done for the array types currently. regards, Yeb Havinga
On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: > Robert Haas wrote: >> >> Under the first type [4pm,5pm) = >> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. >> >> Thoughts? >> > > The examples with units look a lot like the IVL<PQ> datatype from HL7, see > http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm > > About a type interface, the HL7 spec talks about promotion from e.g. a > timestamp to an interval (hl7 speak for range) of timestamps (a range), and > demotion for the back direction. Every 'quantity type', which is any type > with a (possibly partially) lineair ordered domain, can be promoted to an > interval of that type. In PostgreSQL terms, this could perhaps mean that by > 'tagging' a datatype as a lineair order, it could automatically have a range > type defined on it, like done for the array types currently. The way we've handled array types is, quite frankly, horrible. It's bad enough that we now have two catalog entries in pg_type for each base type; what's even worse is that if we actually wanted to enforce things like the number of array dimensions we'd need even more - say, seven per base type, one for the base type itself, one for a one-dimensional array, one for a two-dimensional array, one for a three-dimensional array. And then if we want to support range types that's another one for every base type, maybe more if there's more than one kind of range over a base type. It's just not feasible to handle derived types in a way that require a new instance of each base type to be created for each kind of derived type. It scales as O(number of base types * number of kinds of derived type), and that rapidly gets completely out of hand ...Robert
On Fri, Apr 9, 2010 at 10:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> Robert Haas wrote: >>> >>> Under the first type [4pm,5pm) = >>> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. >>> >>> Thoughts? >>> >> >> The examples with units look a lot like the IVL<PQ> datatype from HL7, see >> http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm >> >> About a type interface, the HL7 spec talks about promotion from e.g. a >> timestamp to an interval (hl7 speak for range) of timestamps (a range), and >> demotion for the back direction. Every 'quantity type', which is any type >> with a (possibly partially) lineair ordered domain, can be promoted to an >> interval of that type. In PostgreSQL terms, this could perhaps mean that by >> 'tagging' a datatype as a lineair order, it could automatically have a range >> type defined on it, like done for the array types currently. > > The way we've handled array types is, quite frankly, horrible. It's > bad enough that we now have two catalog entries in pg_type for each > base type; what's even worse is that if we actually wanted to enforce > things like the number of array dimensions we'd need even more - say, > seven per base type, one for the base type itself, one for a > one-dimensional array, one for a two-dimensional array, one for a > three-dimensional array. And then if we want to support range types > that's another one for every base type, maybe more if there's more > than one kind of range over a base type. It's just not feasible to > handle derived types in a way that require a new instance of each base > type to be created for each kind of derived type. It scales as > O(number of base types * number of kinds of derived type), and that > rapidly gets completely out of hand ...which by the way, doesn't mean that your idea is bad (although it might not be what I would choose to do), just that I don't think our current infrastructure can support it. ...Robert
On Fri, Apr 9, 2010 at 4:10 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > Do we want to enable support for string based ranges, as in the > contributed prefix_range type? Yes, probably, but that doesn't require as much knowledge of the underlying data type, so I didn't feel it needed to be brought up in this context. There is no x such that ['a','b') = ['a',x]; it's generally impossible to convert between open and closed intervals in this type of range type. That's the case where type interfaces are needed; if you're not converting between different kinds of intervals then you can probably get by with the existing system of using the default btree opclass to find equality and comparison operators. > I like the type interface approach and I think this concept has been > studied in great details in math and that we should start from existing > concepts, even if most of them are way over my head. I'm not too excited about patterning this too closely after mathematical concepts; I think we need to have a pragmatic approach that focuses on what the database actually needs. We need to think generally enough about what we're trying to provide that we don't box ourselves into a corner, but we're not trying to build a theorem-prover. > You'll see in this mail a proposal for an operator group notion, which > could get renamed to type interface if we think we won't need rings and > such rather than just groups in the future. And there's opportunity for > multi-type interfaces too (think families), like what's the distance > between a point and a circle? Yeah, that needs some thought. > The math groups already have a notion of neutral element, which for the > addition is 0 (zero), we could expand our version of it with a "unity" > element, which would be in the T domain. I don't know what that would mean in this case. We're trying to add and subtract from T, so a unit or identity element makes sense for U, but not for T. > Then the range type could expand on this and provide a different unity > value in their own interface, in the U domain this time. IMO tying the > precision of the range interval into the type interface is a bad > abstraction. As you said we want to possibly have several ranges types > atop this. Right - so I think there's no point in specifying this in the type interface at all. We can always add it later if we find a real need for it. > We can say that [1,3] = [1,4) when considering a "default" integer range > because 4-3 = unity(integer). When considering a range over timestamps > with a range interval unity of 1s, we are still able to do the math, and > we can have another range over timestamps with a range interval unity of > 10 mins in the same database. (I'm using this later example with the > period datatype in a real application). > > While speaking of all that, in the prefix_range case, it'd be useful to > have a new kind of typemod system at the range level, to allow for > defining prefix text range with the '/' separator, say. Then > > greater_prefix('/etc/bar', '/etc/baz') = '/etc' (or is it '/etc/'?) > > Whereas currently > > => select '/etc/baz'::prefix_range | '/etc/bar'; > ?column? > -------------- > /etc/ba[r-z] > (1 row) Not sure I'm really following this. ...Robert
Robert Haas wrote: > On Fri, Apr 9, 2010 at 10:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> >>> Robert Haas wrote: >>> >>>> Under the first type [4pm,5pm) = >>>> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. >>>> >>>> Thoughts? >>>> >>>> >>> The examples with units look a lot like the IVL<PQ> datatype from HL7, see >>> http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm >>> >>> About a type interface, the HL7 spec talks about promotion from e.g. a >>> timestamp to an interval (hl7 speak for range) of timestamps (a range), and >>> demotion for the back direction. Every 'quantity type', which is any type >>> with a (possibly partially) lineair ordered domain, can be promoted to an >>> interval of that type. In PostgreSQL terms, this could perhaps mean that by >>> 'tagging' a datatype as a lineair order, it could automatically have a range >>> type defined on it, like done for the array types currently. >>> >> The way we've handled array types is, quite frankly, horrible. It's >> bad enough that we now have two catalog entries in pg_type for each >> base type; what's even worse is that if we actually wanted to enforce >> things like the number of array dimensions we'd need even more - say, >> seven per base type, one for the base type itself, one for a >> one-dimensional array, one for a two-dimensional array, one for a >> three-dimensional array. And then if we want to support range types >> that's another one for every base type, maybe more if there's more >> than one kind of range over a base type. It's just not feasible to >> handle derived types in a way that require a new instance of each base >> type to be created for each kind of derived type. It scales as >> O(number of base types * number of kinds of derived type), and that >> rapidly gets completely out of hand >> > > ...which by the way, doesn't mean that your idea is bad (although it > might not be what I would choose to do), just that I don't think our > current infrastructure can support it. > Well yeah the idea was to 'automagically' have a range type available, if the underlying type would support it, i.e. has a lineair order and therefore <,>,= etc defined on it, "just like the array types", from a user / datatype developer perspective. From the implementers perspective, IMHO an extra catalog entry in pg_type is not bad on its own, you would have one anyway if the range type was explicitly programmed. About different kinds of range types - I would not know how to 'promote' integer into anything else but just one kind of 'range of integer' type. So the number of extra pg_types would be more like O(number of linear ordered base types). regards, Yeb Havinga
> From the implementers perspective, IMHO an extra catalog entry in > pg_type is not bad on its own, you would have one anyway if the range > type was explicitly programmed. About different kinds of range types - > I would not know how to 'promote' integer into anything else but just > one kind of 'range of integer' type. So the number of extra pg_types > would be more like O(number of linear ordered base types). .. I now see the example of different ranges in your original mail with different unit increments. Making that more general so there could be continuous and discrete ranges and for the latter, what would the increment be.. OTOH is a range of integers with increment x a different type from range of integers with increment y, if x<>y? Maybe the increment step and continuous/discrete could be typmods. regards Yeb Havinga
On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: > >> From the implementers perspective, IMHO an extra catalog entry in pg_type >> is not bad on its own, you would have one anyway if the range type was >> explicitly programmed. About different kinds of range types - I would not >> know how to 'promote' integer into anything else but just one kind of 'range >> of integer' type. So the number of extra pg_types would be more like >> O(number of linear ordered base types). > > .. I now see the example of different ranges in your original mail with > different unit increments. Making that more general so there could be > continuous and discrete ranges and for the latter, what would the increment > be.. OTOH is a range of integers with increment x a different type from > range of integers with increment y, if x<>y? Maybe the increment step and > continuous/discrete could be typmods. Nope, not enough bits available there. This is fundamentally why the typid/typmod system is so broken - representing a type as a fixed size object is extremely limiting. A fixed size object that MUST consist of a 32-bit unsigned OID and a 32-bit signed integer is even more limiting. Fortunately, we don't need to solve that problem in order to implement range types: we can just have people explicitly create the ones they need. This will, for example, avoid creating ranges over every composite type that springs into existence because a table is created, even though in most cases a fairly well-defined range type could be constructed. ...Robert
On 04/09/2010 07:33 AM, Robert Haas wrote: > On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> 'tagging' a datatype as a lineair order, it could automatically have a range >> type defined on it, like done for the array types currently. > > The way we've handled array types is, quite frankly, horrible. It's > bad enough that we now have two catalog entries in pg_type for each > base type; what's even worse is that if we actually wanted to enforce > things like the number of array dimensions we'd need even more - say, > seven per base type, one for the base type itself, one for a > one-dimensional array, one for a two-dimensional array, one for a > three-dimensional array. And then if we want to support range types > that's another one for every base type, maybe more if there's more > than one kind of range over a base type. It's just not feasible to > handle derived types in a way that require a new instance of each base > type to be created for each kind of derived type. It scales as > O(number of base types * number of kinds of derived type), and that > rapidly gets completely out of hand Perhaps off the original topic (and thinking out loud), but I agree with you on the handling of array types. I have long thought (and at least once played with the idea) that a single array type, anyarray, made up of elements, anyelement, could be made to work. Further, anyelement should be defined to be any valid existing type, including anyarray. Essentially, at least by my reading of the SQL spec, a multidimensional array ought to be an array of arrays, which is different in subtle ways from what we have today. Joe
Robert Haas <robertmhaas@gmail.com> wrote: > Given a type T, I think we'd like to be able to define a type U as > "the natural type to be added to or subtracted from T". As Jeff > pointed out to me, this is not necessarily the same as the > underlying type. For example, if T is a timestamp, U is an > interval; if T is a numeric, U is also a numeric; if T is a cidr, > U is an integer. Then we'd like to define a canonical addition > operator and a canonical subtraction operator. As it is de rigueur for someone to escalate the proposed complexity of an idea by at least one order of magnitude, and everyone else has fallen down on this one: ;-) I've often thought that if we rework the type system, it would be very nice to support a concept of hierarchy. If you could "subclass" money to have a subclass like assessable, which in turn has subclasses of fine, fee, restitution, etc. you could then automatically do anything with a subclass which you could do with the superclass, and support such things as treating the sum of various classes as the lowest common subclass. It seems like this sort of approach, if done right, might allow some easier way to establish sensible operations between types (like distance / speed = time or speed * time = distance). Just a thought.... -Kevin
On Fri, Apr 9, 2010 at 1:13 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > As it is de rigueur for someone to escalate the proposed complexity > of an idea by at least one order of magnitude, and everyone else has > fallen down on this one: ;-) Gee, thanks for filling in? > I've often thought that if we rework the type system, it would be > very nice to support a concept of hierarchy. If you could > "subclass" money to have a subclass like assessable, which in turn > has subclasses of fine, fee, restitution, etc. you could then > automatically do anything with a subclass which you could do with > the superclass, and support such things as treating the sum of > various classes as the lowest common subclass. It seems like this > sort of approach, if done right, might allow some easier way to > establish sensible operations between types (like distance / speed = > time or speed * time = distance). > > Just a thought.... I dowanna rework the type system. I'm not even 100% sure I want to implement what I actually proposed. I do want to find out if people think the framework makes sense and whether it's the right way forward for those projects that need these features. What you're proposing here sounds suspiciously like something that should be handled by creating domains - but in any case it's almost entirely unrelated to what I was talking about. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > I dowanna rework the type system. I'm not even 100% sure I want > to implement what I actually proposed. I do want to find out if > people think the framework makes sense and whether it's the right > way forward for those projects that need these features. What you proposed sounds like it would be cleaner and less work than further perverting the index system as a source of information about types or hard-coding knowledge anywhere else. > What you're proposing here sounds suspiciously like something that > should be handled by creating domains Not really. Unless I've missed something domains are a single-level layer over a data type. I find them very useful and use them heavily, but the standard implementation is rather limited. Perhaps that would be the area to add the functionality I suggested, though. I'm totally at the hand-waving stage on it, with no concrete ideas. I just thought that if you were adding more type information, oriented aournd the types themselves rather than index AMs, some form of inheritence might fit in gracefully. > in any case it's almost entirely unrelated to what I was talking > about. OK -Kevin
Robert Haas wrote: > On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: > >> .. I now see the example of different ranges in your original mail with >> different unit increments. Making that more general so there could be >> continuous and discrete ranges and for the latter, what would the increment >> be.. OTOH is a range of integers with increment x a different type from >> range of integers with increment y, if x<>y? Maybe the increment step and >> continuous/discrete could be typmods. >> > > Nope, not enough bits available there. This is fundamentally why the > typid/typmod system is so broken - representing a type as a fixed size > object is extremely limiting. A fixed size object that MUST consist > of a 32-bit unsigned OID and a 32-bit signed integer is even more > limiting. Fortunately, we don't need to solve that problem in order > to implement range types: we can just have people explicitly create > the ones they need. This will, for example, avoid creating ranges > over every composite type that springs into existence because a table > is created, even though in most cases a fairly well-defined range type > could be constructed. > Ok, no typmod, not default extra types for base types, but the concept of an still there is one aspect of ranges (may I say intervals?) of 'things' is something generic, and code to handle intervals of things could be shared between datatype implementations. A way to have generic support without automatic new types could be with something that looks like: What about CREATE TYPE ivl_int AS INTERVAL OF integer; SELECT '[1;2]'::ivl_int; etc regards, Yeb Havinga
On Fri, Apr 9, 2010 at 4:01 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > Robert Haas wrote: >> >> On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> >>> >>> .. I now see the example of different ranges in your original mail with >>> different unit increments. Making that more general so there could be >>> continuous and discrete ranges and for the latter, what would the >>> increment >>> be.. OTOH is a range of integers with increment x a different type from >>> range of integers with increment y, if x<>y? Maybe the increment step and >>> continuous/discrete could be typmods. >>> >> >> Nope, not enough bits available there. This is fundamentally why the >> typid/typmod system is so broken - representing a type as a fixed size >> object is extremely limiting. A fixed size object that MUST consist >> of a 32-bit unsigned OID and a 32-bit signed integer is even more >> limiting. Fortunately, we don't need to solve that problem in order >> to implement range types: we can just have people explicitly create >> the ones they need. This will, for example, avoid creating ranges >> over every composite type that springs into existence because a table >> is created, even though in most cases a fairly well-defined range type >> could be constructed. >> > > Ok, no typmod, not default extra types for base types, but the concept of an > still there is one aspect of ranges (may I say intervals?) of 'things' is > something generic, and code to handle intervals of things could be shared > between datatype implementations. A way to have generic support without > automatic new types could be with something that looks like: > > What about > CREATE TYPE ivl_int AS INTERVAL OF integer; > > SELECT '[1;2]'::ivl_int; > etc Yeah, that's how it has to work, I think. ...Robert
On Fri, 2010-04-09 at 12:50 -0500, Kevin Grittner wrote: > I just thought that if you were adding more type information, > oriented aournd the types themselves rather than index AMs, some form > of inheritence might fit in gracefully. There are already some specific proposals for inheritance in database theory literature. For instance: "Databases, Types, and the Relational Model" by C.J. Date addresses inheritance explicitly (and the appendices have some interesting discussion). I'm not sure how compatible it is with SQL, though; and I am not very optimistic that we could accomplish such a restructuring of the type system while maintaining a reasonable level of backwards compatibility. Either way, I think it's a separate topic. Two types that are not related by any subtype/supertype relationship (like strings and ints) can conform to the same interface (total ordering); while the very same type can conform to two different interfaces. Regards,Jeff Davis
On Fri, 2010-04-09 at 11:14 -0400, Robert Haas wrote: > > range of integers with increment y, if x<>y? Maybe the increment step and > > continuous/discrete could be typmods. > > Nope, not enough bits available there. I think the problem is deeper than that. Typmods aren't carried along as part of the result of a function call, so typmod is not really a part of the type at all -- it's more a property of the column. Regards,Jeff Davis
On Thu, 2010-04-08 at 22:29 -0400, Robert Haas wrote: > 1. knngist wants to use index scans to speed up queries of the form > SELECT ... ORDER BY <column> <op> <constant> (as opposed to the > existing machinery which only knows how to use an index for SELECT ... > ORDER BY <column>). > 2. Window functions want to define windows over a range of values > defined by the underlying data type. To do this, we need to define > what addition and subtraction mean for a particular data type. > 3. Jeff Davis is interested in implementing range types. When the > underlying base type is discrete, e.g. integers, you can say that > [1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in > that order). To give some context, I started a thread a while ago: http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php Tom provided some interesting suggestions in that thread, but I'm not sure they would work for #1 or #2. > It may or may not be worth building the concept of a unit > increment into the type interface machinery, though: one could imagine > two different range types built over the same base type with different > unit increments - e.g. one timestamp range with unit increment = 1s, > and one with unit increment = 1m. Under the first type [4pm,5pm) = > [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. Right. Part of the interface could be a unit() function, and that can return whatever you want. I was originally thinking about it in terms of next() and prev(), but you could build those from +, -, and unit(). Regards,Jeff Davis
On Fri, Apr 9, 2010 at 5:49 PM, Jeff Davis <pgsql@j-davis.com> wrote: >> It may or may not be worth building the concept of a unit >> increment into the type interface machinery, though: one could imagine >> two different range types built over the same base type with different >> unit increments - e.g. one timestamp range with unit increment = 1s, >> and one with unit increment = 1m. Under the first type [4pm,5pm) = >> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. > > Right. Part of the interface could be a unit() function, and that can > return whatever you want. > > I was originally thinking about it in terms of next() and prev(), but > you could build those from +, -, and unit(). The advantage of specifying a + and a - in the type interface is that the unit definition can then be specified as part of the type declaration itself. So you can do: CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); All of the stuff about defining + and - is hidden from the user - it's part of the type interface, which is pre-created. ...Robert
> The advantage of specifying a + and a - in the type interface is that > the unit definition can then be specified as part of the type > declaration itself. So you can do: > > CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); > CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); > > All of the stuff about defining + and - is hidden from the user - it's > part of the type interface, which is pre-created. > The disadvantage is that it does not permit irregularly spaced units. -Nathan
On Fri, Apr 9, 2010 at 7:18 PM, Nathan Boley <npboley@gmail.com> wrote: >> The advantage of specifying a + and a - in the type interface is that >> the unit definition can then be specified as part of the type >> declaration itself. So you can do: >> >> CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); >> CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); >> >> All of the stuff about defining + and - is hidden from the user - it's >> part of the type interface, which is pre-created. > > The disadvantage is that it does not permit irregularly spaced units. True. The only types I can think of that have irregularly spaced units would be things based on floating points, and I was assuming that people would only want continuous intervals on those. If someone really wants to be able to deduce that [1.0,3.0) = [1.0,3.0-epsilon), then we need a different design. But I find it hard to believe that's very useful. Maybe you feel otherwise? ...Robert
On Fri, Apr 9, 2010 at 7:53 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Apr 9, 2010 at 7:18 PM, Nathan Boley <npboley@gmail.com> wrote: >>> The advantage of specifying a + and a - in the type interface is that >>> the unit definition can then be specified as part of the type >>> declaration itself. So you can do: >>> >>> CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); >>> CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); >>> >>> All of the stuff about defining + and - is hidden from the user - it's >>> part of the type interface, which is pre-created. >> >> The disadvantage is that it does not permit irregularly spaced units. > > True. The only types I can think of that have irregularly spaced > units would be things based on floating points, and I was assuming > that people would only want continuous intervals on those. If someone > really wants to be able to deduce that [1.0,3.0) = [1.0,3.0-epsilon), > then we need a different design. But I find it hard to believe that's > very useful. Maybe you feel otherwise? Er, that [1.0,3.0) = [1.0,3.0-epsilon], rather. ...Robert
Robert Haas wrote: > The advantage of specifying a + and a - in the type interface is that > the unit definition can then be specified as part of the type > declaration itself. So you can do: > > CREATE TYPE ts_sec AS RANGE OVER timestamp (UNIT = '1s'); > CREATE TYPE ts_min AS RANGE OVER timestamp (UNIT = '1m'); > The problem with mixing units with ranges is that units are properties of some underlying datatype but not all types on which ranges can be defined. regards, Yeb Havinga
Jeff Davis wrote: > To give some context, I started a thread a while ago: > > http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php > Interesting, a join type for overlaps, which makes me think a bit of the staircase join for pre-post coordinates. However, does a join operator type need certain kinds of properties of the operator involved, e.g. being commutative, transitive etc? Else the join reordering fails. The latter fails for the overlap operator. regards, Yeb Havinga
On Sat, Apr 10, 2010 at 12:05 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > Jeff Davis wrote: >> >> To give some context, I started a thread a while ago: >> >> http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php >> > > Interesting, a join type for overlaps, which makes me think a bit of the > staircase join for pre-post coordinates. However, does a join operator type > need certain kinds of properties of the operator involved, e.g. being > commutative, transitive etc? Else the join reordering fails. The latter > fails for the overlap operator. I don't think I follow this. As far as I know, the join order constraints don't depend on the choice of operator. ...Robert
Robert Haas wrote: > On Sat, Apr 10, 2010 at 12:05 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > >> Jeff Davis wrote: >> >>> To give some context, I started a thread a while ago: >>> >>> http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php >>> >>> >> Interesting, a join type for overlaps, which makes me think a bit of the >> staircase join for pre-post coordinates. However, does a join operator type >> need certain kinds of properties of the operator involved, e.g. being >> commutative, transitive etc? Else the join reordering fails. The latter >> fails for the overlap operator. >> > > I don't think I follow this. As far as I know, the join order > constraints don't depend on the choice of operator. > I was thinking of a case for instance for ranges a,b,c in relations A,B,C respectively, where a && b and b && c, but not a && c. Would the planner consider a join path of table A and C first, then that result with B. After looking in doxygen, it looks like having && defined without MERGES is what prevents this unwanted behaviour, since that prevents a,b and c to become members of the same equivalence class. Sorry for the spam on the list. regards, Yeb Havinga
On Sat, 2010-04-10 at 20:25 +0200, Yeb Havinga wrote: > I was thinking of a case for instance for ranges a,b,c in relations > A,B,C respectively, where a && b and b && c, but not a && c. Would the > planner consider a join path of table A and C first, then that result > with B. After looking in doxygen, it looks like having && defined > without MERGES is what prevents this unwanted behaviour, since that > prevents a,b and c to become members of the same equivalence class. Interesting, I would have to make sure that didn't happen. Most likely there would be a new property like "RANGEMERGES", it wouldn't reuse the existing MERGES property. > Sorry for the spam on the list. Not at all, it's an interesting point. Regards,Jeff Davis
On Fri, Apr 9, 2010 at 5:49 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2010-04-08 at 22:29 -0400, Robert Haas wrote: >> 1. knngist wants to use index scans to speed up queries of the form >> SELECT ... ORDER BY <column> <op> <constant> (as opposed to the >> existing machinery which only knows how to use an index for SELECT ... >> ORDER BY <column>). >> 2. Window functions want to define windows over a range of values >> defined by the underlying data type. To do this, we need to define >> what addition and subtraction mean for a particular data type. >> 3. Jeff Davis is interested in implementing range types. When the >> underlying base type is discrete, e.g. integers, you can say that >> [1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in >> that order). > > To give some context, I started a thread a while ago: > > http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php > > Tom provided some interesting suggestions in that thread, but I'm not > sure they would work for #1 or #2. <rereads thread> The "map && to <<" case is interesting. It doesn't seem like it's really a good candidate for type interfaces, because you you're not really looking for "the" strictly-left-of operator; you're looking for the strictly-left-of operator associated with the overlaps operator actually specified. And ideally there might be an index strategy number available for <<, too, so that you could consider doing an index scan instead of a sort, but not necessarily. ...Robert
On Sat, Apr 10, 2010 at 2:30 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Sat, 2010-04-10 at 20:25 +0200, Yeb Havinga wrote: >> I was thinking of a case for instance for ranges a,b,c in relations >> A,B,C respectively, where a && b and b && c, but not a && c. Would the >> planner consider a join path of table A and C first, then that result >> with B. After looking in doxygen, it looks like having && defined >> without MERGES is what prevents this unwanted behaviour, since that >> prevents a,b and c to become members of the same equivalence class. > > Interesting, I would have to make sure that didn't happen. Most likely > there would be a new property like "RANGEMERGES", it wouldn't reuse the > existing MERGES property. > >> Sorry for the spam on the list. > > Not at all, it's an interesting point. Yeah... I agree. ...Robert
Robert Haas wrote: > On Fri, Apr 9, 2010 at 11:07 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: > > > >> From the implementers perspective, IMHO an extra catalog entry in pg_type > >> is not bad on its own, you would have one anyway if the range type was > >> explicitly programmed. About different kinds of range types - I would not > >> know how to 'promote' integer into anything else but just one kind of 'range > >> of integer' type. So the number of extra pg_types would be more like > >> O(number of linear ordered base types). > > > > .. I now see the example of different ranges in your original mail with > > different unit increments. Making that more general so there could be > > continuous and discrete ranges and for the latter, what would the increment > > be.. OTOH is a range of integers with increment x a different type from > > range of integers with increment y, if x<>y? Maybe the increment step and > > continuous/discrete could be typmods. > > Nope, not enough bits available there. This is fundamentally why the > typid/typmod system is so broken - representing a type as a fixed size > object is extremely limiting. A fixed size object that MUST consist > of a 32-bit unsigned OID and a 32-bit signed integer is even more > limiting. You mean the typmod system we developed 13 years ago needs updating? Seems unlikely. ;-) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Jeff Davis wrote: > On Fri, 2010-04-09 at 12:50 -0500, Kevin Grittner wrote: >> I just thought that if you were adding more type information, >> oriented aournd the types themselves rather than index AMs, some form >> of inheritence might fit in gracefully. > > There are already some specific proposals for inheritance in database > theory literature. For instance: "Databases, Types, and the Relational > Model" by C.J. Date addresses inheritance explicitly (and the appendices > have some interesting discussion). > > I'm not sure how compatible it is with SQL, though; and I am not very > optimistic that we could accomplish such a restructuring of the type > system while maintaining a reasonable level of backwards compatibility. > > Either way, I think it's a separate topic. Two types that are not > related by any subtype/supertype relationship (like strings and ints) > can conform to the same interface (total ordering); while the very same > type can conform to two different interfaces. > > Regards, > Jeff Davis Well I've been doing a lot of work with range abstract data types in Oracle lately. And I've got to say that the OO features in Oracle make it really nice. Of course its Oracle, so its like a half baked OO in cobol syntax, lol. But I for one think it would be great if Postgres had object data types that had methods and could besubclassed. For those not familiar with ADT's in Oracle, here's an example: CREATE TYPE period AS OBJECT ( beginning DATE, ending DATE, CONSTRUCTOR FUNCTION period ( self IN OUT NOCOPYperiod, beginning DATE, ending DATE ) RETURN SELF AS RESULT, -- config functions MEMBER FUNCTION granuleRETURN INTERVAL DAY TO SECOND, MEMBER FUNCTION def_inc RETURN NUMBER, MEMBER FUNCTION range_union(p2 period) RETURNperiod ... ) NOT FINAL; CREATE TYPE date_range UNDER period ( OVERRIDING MEMBER FUNCTION granule RETURN INTERVAL DAY TO SECOND, OVERRIDING MEMBERFUNCTION def_inc RETURN NUMBER, ... ); Scott Bailey
On Fri, 2010-04-16 at 23:18 -0700, Scott Bailey wrote: > Well I've been doing a lot of work with range abstract data types in > Oracle lately. And I've got to say that the OO features in Oracle make > it really nice. Of course its Oracle, so its like a half baked OO in > cobol syntax, lol. But I for one think it would be great if Postgres had > object data types that had methods and could be subclassed. That's interesting. I think the most critical piece of that is "subclassing" in the sense of a type interface. There have already been a couple threads about type interfaces: http://archives.postgresql.org/pgsql-hackers/2009-10/msg01403.php http://archives.postgresql.org/pgsql-hackers/2010-04/msg00443.php Regards,Jeff Davis