Thread: Operator class group proposal
After further thought about the mergejoinable-operators issue and some other longstanding planner problems, I have a modest proposal to make: we should invent the notion of "operator class groups", which identify sets of compatible operator classes. (I'm not wedded to the name "class group"; it seems a bit redundant. Anyone have a better idea?) We've touched on related ideas in past threads but never really put together a concrete proposal. Here is one. A class group is associated with a specific index AM and can contain only opclasses for that AM. We might for instance invent "numeric" and "numeric_reverse" groups for btree, to contain the default opclasses and reverse-sort opclasses for the standard arithmetic types. We further require that any given opclass be a member of at most one class group (this simplifies matters, and there isn't any application I can see for one opclass being in more than one group), and that a class group contain at most one opclass for a given datatype (ditto). Recall that the member operators of an opclass all have the opclass' indexable datatype as their lefthand inputs, but the righthand input datatype can vary. These rules imply that any given operator can stand in at most one relationship to a class group: there is only one potential class member it could be in. It's still possible for an operator to appear in different groups (e.g., within both normal and reverse-sort groups). For btree opclasses, if two opclasses are in the same group, it means that they have compatible sort ordering and equality rules. In particular, for any "=" operators of member opclasses and values A, B, C of the appropriate datatypes, we requireA = B and B = C implies A = C Likewise for "<" operators:A < B and B < C implies A < C and likewise for <=, >, >= (I think these cases are implied if the first two must hold). I believe these rules are sufficient to guarantee that any "=" operator in the opclass group can be mergejoined using "<" and ">" operators taken from the appropriate member opclasses. Therefore we can entirely eliminate the oprlsortop etc. fields of pg_operator and rely on looking into the opclass catalogs to determine how to do mergejoins. This is an improvement over the current situation (and also my proposal of yesterday) because it will be possible to support both ascending and descending mergejoins. In the current structure we can only associate one lsortop ("<") with any "=", but if there are reverse-sort opclasses in existence it would be nice to be able to perform a mergejoin using ">" as well. Furthermore, the class group structure provides a solid foundation for the planner's transitive-equality deductions, which are currently pretty bogus as discussed yesterday. Lastly, the group structure fixes a nasty problem in predicate testing: currently, we tend to fail to prove implications involving cross-datatype operators because we can't find operators that are in the same opclass and compare the two types we need to compare. With class groups, we can use any two operators in the same class group to make a deduction. (This also cleans up btree scankey analysis for cross-type situations; it has the same problem of not knowing how to determine which key values are lesser or greater.) For hash opclasses, placing two opclasses in the same group means that they have compatible hash functions. This allows us to support hashing of cross-type comparisons. As an example, "int4 eq int8" can be hashed if the int8 hash function guarantees that for any int8 value that could fit in int4, it will produce the same hash code as the int4 hash function does. (Which, in fact, our current integer hash functions are designed to do.) Currently we cannot exploit this because we have no way to identify which two hash functions to use. With opclass groups, we could put cross-type equality operators into hash opclasses the same way as we do for btree opclasses, and the rule would be to use the hash functions for the group's opclasses matching the left and right datatypes. I don't have any idea whether opclass groups would be useful for GiST or GIN indexes, but maybe Oleg and Teodor can think of applications. As far as syntax goes, I was envisioning CREATE/DROP OPERATOR CLASS GROUP name USING index_method; and adding an optional "GROUP name" clause to CREATE OPERATOR CLASS. We could imagine adding a lot of additional stuff such as being able to ALTER an existing opclass' group membership, but at least for the first cut I see no need for anything more than this. (In particular, since we'd probably only allow these commands to superusers, we needn't bother with a concept of ownership for class groups.) In the system catalogs, we'd need to add a "pg_opclassgroup" catalog with OIDs and columns for name, namespace, and index AM's OID; and pg_opclass would get a new column for the OID of the class's group, or 0 if not in any group (which really effectively means it's a group of one opclass). BTW, I'm also pretty strongly tempted to denormalize the catalogs a bit by adding group OID and index AM OID to individual pg_amop entries, thereby reducing lookup time when trying to identify an operator's properties. Currently we need an extra probe into pg_opclass to see which index AM the opclass belongs to, and with this proposal we'd also be looking at the group OID quite a lot. Comments? regards, tom lane
On Wed, Dec 13, 2006 at 04:27:09PM -0500, Tom Lane wrote: > After further thought about the mergejoinable-operators issue and some > other longstanding planner problems, I have a modest proposal to make: > we should invent the notion of "operator class groups", which identify > sets of compatible operator classes. (I'm not wedded to the name "class > group"; it seems a bit redundant. Anyone have a better idea?) We've > touched on related ideas in past threads but never really put together > a concrete proposal. Here is one. <snip> I think it's a good idea, though I would point out that in the examples given it's the underlying types that are compatable, not the classes. But I'm unsure if there's a way to use that. These groups seem a reasonable addition to the existing system. Other names I can think of: - type class - type group - compatability class - operator class set None of which sound any good :( It's good is that this provides more information about the underlying types to the system, which improves the possibility of optimisation (and correctness). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Wed, Dec 13, 2006 at 04:27:09PM -0500, Tom Lane wrote: >> we should invent the notion of "operator class groups", which identify >> sets of compatible operator classes. > I think it's a good idea, though I would point out that in the examples > given it's the underlying types that are compatable, not the classes. Well, I didn't try very hard to provide a complete set of examples, but here is one addressing that point: the string types have one set of opclasses involving < = > and one set involving ~<~ ~=~ ~>~ (the "pattern_ops" opclasses). These would need to be distinct class groups since in fact they have incompatible semantics. Reverse-sort opclasses would be another example. > Other names I can think of: > - type class > - type group > - compatability class > - operator class set > None of which sound any good :( Yeah, I'm drawing a blank on good names too. In the absence of any better idea I'm inclined to re-use some word that's already a keyword, rather than invent a new one. (GROUP is already a fully reserved word, I think because it's used in GROUP BY, so it wouldn't pose any parsing problems.) regards, tom lane
I wrote: > We further require that any given opclass be a member of at most one class > group (this simplifies matters, and there isn't any application I can see > for one opclass being in more than one group), and that a class group > contain at most one opclass for a given datatype (ditto). BTW, I forgot to mention one of the motivations for that last restriction: I'm thinking it would be convenient to allow index declarations to accept either an opclass name or a class group name. Thus you could say "pattern_ops" instead of being specific about "varchar_pattern_ops" or "text_pattern_ops". Not sure whether there's a need to worry about name collisions ... regards, tom lane
[ replying to myself again, how tacky :-( ] I wrote: > BTW, I forgot to mention one of the motivations for that last > restriction: I'm thinking it would be convenient to allow index > declarations to accept either an opclass name or a class group name. > Thus you could say "pattern_ops" instead of being specific about > "varchar_pattern_ops" or "text_pattern_ops". Some more thought about that yielded what might or might not be a good idea: why not just collapse all the members of a "class group" into one opclass? In other words, not make a distinction between groups and individual opclasses? So for example, int2_ops int4_ops and int8_ops would all collapse into one big opclass. In SQL terms this would mean allowing several "FOR TYPE foo" clauses in a single CREATE OPERATOR CLASS command, and we'd likely have to give up the shorthand where you can omit argument types for operators (or at least, it'd only work in the case where you say FOR TYPE just once). At the system catalog level this'd require some rearrangement of pg_opclass, and the unique keys for pg_amop and pg_amproc would probably need another column, but it certainly seems doable. One rather large objection is that existing pg_dump files may contain references to specific opclasses such as "varchar_pattern_ops". How could we deal with converting that to a world where there's only one "pattern_ops" opclass? In the past we've resorted to hard-wired substitutions in gram.y, but that got too bletcherous to bear quite some time ago. I'm thinking about allowing opclasses to have "aliases" recorded in the system catalogs as a more general solution to the problem; it'd let us get rid of those old hacks in gram.y, too. The other thing that'd have to be dealt with is that for GIST and GIN, the STORAGE clause needs to be associated with a specific indexable type. I'm thinking the syntax might be like CREATE OPERATOR CLASS name USING index_method AS { [ DEFAULT ] FOR TYPE data_type [ STORAGE storage_type ] [ ALIAS name ] | OPERATOR strategy_number operator_name [ (op_type, op_type) ] [ RECHECK ] | FUNCTION support_number funcname (argument_type[, ...]) } [, ... ] plus a backwards-compatible variant syntax where there's just one FOR TYPE clause before the USING part. With either this or the original proposal, or for that matter our existing releases, there's a bit of an issue with needing to modify existing opclasses. Consider the situation of someone adding say "tinyint" as a contrib module, and wanting it to interact reasonably with the existing datatypes. He has to create some cross-type operators, which is no big deal, but then he needs to plug them into already-existing opclasses, and there's no provision for doing that. I had been thinking that the scheme of adding opclasses to a pre-existing group would fix that, but it doesn't really: you still need to add more cross-type operators to the existing group member opclasses. So this leads me to think that we need to allowALTER OPERATOR CLASS foo ADD [ TYPE/OPERATOR/FUNCTION ]ALTER OPERATOR CLASSfoo DROP [ TYPE/OPERATOR/FUNCTION ] and the process for the would-be tinyint implementor is to add on entries to an existing "numeric_ops" operator class, instead of starting his own class. Among other things this would require finer granularity in the pg_depends entries, since dropping a type would involve deleting elements of an opclass but not necessarily the whole thing. (That may well mean that we still need an object equivalent to the current per-datatype opclass, as something to be named in pg_depends; in which case the above restructuring becomes largely syntactic sugar.) It may be that the separate-GROUP approach is about as clean as this once all the dust settles, but we'd still need to do something about allowing additions to an existing operator class. Thoughts? regards, tom lane
> > Some more thought about that yielded what might or might not be a good > idea: why not just collapse all the members of a "class group" into one > opclass? In other words, not make a distinction between groups and > individual opclasses? So for example, int2_ops int4_ops and int8_ops > would all collapse into one big opclass. IMHO, this is better than groups, but in any case exact opclass will be defined by pair: (opgroup, opclass) in first case or (opclass, alias) in second. Pair is needed to find support function(s). For instance, contrib/intarray makes two GiST opclasses with different storage types. > I don't have any idea whether opclass groups would be useful for GiST or> GIN indexes, but maybe Oleg and Teodor can thinkof applications. I'm afraid it isn't useful for GiST/GIN - strategies are not defined at all and planner can't known about semantic meaning of operation/strategies. Examples of opclasses which can be joined in one group: 1) geometry opclasses 2) gist__int_ops and gist__intbig_ops for GiST from contrib/intarray (BTW, gin__int_ops for GIN supports the same operationsand the same strategies as gist__int*_ops) 3) some of opclasses from btree_gist. 4) builtin GIN oplasses for arrays and it possible to add cross-type operations for arrays ( ie int4[] && int8[] ) BTW, unpleasant behaviour of postgres: postgres=# CREATE TABLE foo (a varchar); postgres=# CREATE INDEX fooidx ON foo (a varchar_pattern_ops); postgres=# ALTER OPERATOR CLASS varchar_pattern_ops USING btree RENAME TO bar; % pg_dump postgres | psql wow ... ERROR: operator class "bar" does not exist for access method "btree" ... May be, it will be useful to forbids altering/renaming of builtin objects... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> CREATE OPERATOR CLASS name USING index_method AS > { [ DEFAULT ] FOR TYPE data_type [ STORAGE storage_type ] [ > ALIAS name ] > | OPERATOR strategy_number operator_name [ (op_type, > op_type) ] [ RECHECK ] > | FUNCTION support_number funcname (argument_type [, ...]) > } [, ... ] > > plus a backwards-compatible variant syntax where there's just > one FOR TYPE > clause before the USING part. > > With either this or the original proposal, or for that matter our > existing releases, there's a bit of an issue with needing to modify > existing opclasses. Consider the situation of someone adding say > "tinyint" as a contrib module, and wanting it to interact reasonably > with the existing datatypes. He has to create some cross-type > operators, which is no big deal, but then he needs to plug them into > already-existing opclasses, and there's no provision for doing that. > I had been thinking that the scheme of adding opclasses to a > pre-existing group would fix that, but it doesn't really: you > still need > to add more cross-type operators to the existing group member > opclasses. > > So this leads me to think that we need to allow > ALTER OPERATOR CLASS foo ADD [ TYPE/OPERATOR/FUNCTION ] > ALTER OPERATOR CLASS foo DROP [ TYPE/OPERATOR/FUNCTION ] > and the process for the would-be tinyint implementor is to add on > entries to an existing "numeric_ops" operator class, instead of > starting his own class. I think it would be easier to understand if we do not merge the opclasses for different types into one statement. We would add type to the key and thus have: CREATE OPERATOR CLASS name USING index_method FOR TYPE data_type ... DROP OPERATOR CLASS name USING index_method FOR TYPE data_type ... I would then name the opclasses btree_ops (for the default btree opclass) with no reference to type, but only to functionality. (maybe we need to diff text and number) Classes with the same name and index_method would implicitly be a class group. I generally like the idea to simplify this area. As an aside, Informix decided to name compatible operator functions identically and define an opclass as those names: opclassname owner amid opclassid ops support btree_ops informix 1 1 lessthan;lessthanorequal;equal;greaterthanorequal;greaterthan; compare; rtree_ops informix 2 2 overlap;equal;contains;within; union;size;inter; This is also simple to use and understand. Andreas
On Wed, Dec 13, 2006 at 07:28:20PM -0500, Tom Lane wrote: > Some more thought about that yielded what might or might not be a good > idea: why not just collapse all the members of a "class group" into one > opclass? In other words, not make a distinction between groups and > individual opclasses? So for example, int2_ops int4_ops and int8_ops > would all collapse into one big opclass. I think it may be useful to maintain the distinction between groups and classes for users, because at that level the whole concept is easier to understand. Dropping and recreating operator classes is easier to handle than playing strange tricks with ALTER OPERATOR CLASS GROUP. And probably easier to get right/harder to screw up. How the backend implements it may be easier as one single large opclass. Essentially the operater class table becomes merely a placeholder for the name (maybe also aliases?), which can still be used in index declarations, but is never used by the backend otherwise. That would preserve backward compatability while still allowing all the benefits. BTW, in some aspects this resembles some of the changes needed for SQL collate support. There also I tried to merge operator classes into something larger. However, there the point was that the group becomes a common pathkey identifier for all the underlying types, which is not in your proposal. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes: > I think it would be easier to understand if we do not merge the > opclasses for different types into one statement. Agreed, huge CREATE OPERATOR CLASS commands would be no fun, which is one reason for my recommendation to improve ALTER OPERATOR CLASS. I think that in practice people would use ALTER to add one type at a time to an opclass. > Classes with the same name and > index_method would implicitly be a class group. [ itch... ] I've never cared for the idea that semantics should depend fundamentally on the mere name of something. I think we want class groups to be real objects in one form or another, not chance associations. As a specific objection, under this rule it would never become possible to allow unprivileged users to create opclasses, because they could break the behavior of someone else's opclass just by creating another one of the same name with not-really-compatible behavior. > As an aside, Informix decided to name compatible operator functions > identically and define an opclass as those names: Interesting. Probably too much water under the bridge now for us to consider forcing function/operator renames, though. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > I think it may be useful to maintain the distinction between groups and > classes for users, because at that level the whole concept is easier to > understand. Dropping and recreating operator classes is easier to > handle than playing strange tricks with ALTER OPERATOR CLASS GROUP. And > probably easier to get right/harder to screw up. I'm not sure. The problem that I'm seeing is that currently, cross-type comparisons go into the opclass associated with their left-hand argument type. Therefore, if say you want to add "tinyint" to an opclass group, you not only need to add an opclass for tinyint (containing tinyint vs tinyint as well as tinyint vs other-type operators), but you also need to add other-type vs tinyint operators to the *other* members of the group. So the notion of the classes being separate objects seems a bit artificial to me. I think that "if I want to make tinyint part of the numeric_ops index opclass, I just add the type and all these operators to that opclass" is at least as clear. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > I'm not sure. The problem that I'm seeing is that currently, cross-type > comparisons go into the opclass associated with their left-hand argument > type. Therefore, if say you want to add "tinyint" to an opclass group, > you not only need to add an opclass for tinyint (containing tinyint vs > tinyint as well as tinyint vs other-type operators), but you also need > to add other-type vs tinyint operators to the *other* members of the > group. So the notion of the classes being separate objects seems a bit > artificial to me. I think that "if I want to make tinyint part of the > numeric_ops index opclass, I just add the type and all these operators to > that opclass" is at least as clear. Hm, would we still need all the cross-data-type btree operators? Could the planner be taught that as long as the two types belong to the same opclassclass that it's ok to use a btree operator that requires a cast first before being used? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Hm, would we still need all the cross-data-type btree operators? Yes, I think so; remember all the pain we had when we didn't have indexable cross-type operators and spent years looking for a non-broken way of introducing casts to solve the problem. Those were fundamental semantic problems and AFAICS we'd be right back into that if we take cross-type operators out of the opclasses again. Basically what I'm on about here is that the way we shoehorned cross-type operators into opclasses was a kluge. Which was not a bad idea when we weren't yet sure it would solve the problem. But now it's looking better and better to take the next step and allow opclasses to support multiple types explicitly. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > I think it may be useful to maintain the distinction between groups and > classes for users, because at that level the whole concept is easier to > understand. Dropping and recreating operator classes is easier to > handle than playing strange tricks with ALTER OPERATOR CLASS GROUP. And > probably easier to get right/harder to screw up. After further thought I'm beginning to see the merits of this suggestion. The consideration that drives it is this: what is an index going to depend on in pg_depends? If we have a massive operator class handling multiple types, we certainly don't want to have to drop every index using that opclass if we want to drop just one of the datatypes. (This would be especially true for things like "numeric_ops" which would support both user indexes and system catalogs ... after adding "tinyint" to that, what if you want to back it out again?) So it seems like in a multi-type-operator-class system there'd still have to be an explicit notion of an opclass-and-indexed-datatype combination --- which we might as well keep calling an operator class for old times' sake. So we're back to calling the larger structure an "operator class group" unless someone's got a better idea. However, thinking about the effects of DROP leads me to suggest that an operator class in this sense should ideally contain only the "core" stuff that you must have to support an index on a column of its datatype. In particular, cross-type comparisons are generally not "core", and could be dropped without breaking indexes that use the class. So I suggest that cross-type comparison operators could be declared to be members of an operator class group but not directly members of any one of its operator classes. In other words the object hierarchy looks like Operator class group Operator class "core" ops and support functions for class Operator class "core" ops and support functions for class ... Cross-type ops and support functions The dependencies would be set up so that dropping a "core" operator forces dropping its operator class and any dependent indexes, same as today. But dropping a cross-type op doesn't force anything, and an operator class group per se never goes away unless explicitly destroyed. (It could validly exist with no contained objects, in fact that would be its initial state.) I think the CREATE OPERATOR CLASS GROUP command becomes just CREATE OPERATOR CLASS GROUP name USING am; and then you add opclasses to it by creating them with a GROUP option used in CREATE OPERATOR CLASS, and/or add individual cross-type ops to it with an ALTER command that would look like ALTER OPERATOR CLASS GROUP name ADD { OPERATOR ... | FUNCTION ... } ... As now, there's no ALTER OPERATOR CLASS, you can only drop the whole thing (with any dependent indexes) and re-create it. This seems a good idea because we lack any infrastructure for rebuilding dependent indexes after an opclass redefinition. That doesn't seem like a problem for fooling with individual cross-type ops in a class group though. For backwards compatibility, we'd still accept cross-type ops listed in CREATE OPERATOR CLASS, but this would be deprecated for the reason that dropping such an op would force dropping the class (as indeed it does today) and obviously you'd rather not have indexes dependent on non-core operators. Note that which operators are "core" to an opclass is not hardwired into this structure, it's the class author's choice. > How the backend implements it may be easier as one single large > opclass. Essentially the operater class table becomes merely a > placeholder for the name (maybe also aliases?), which can still be used > in index declarations, but is never used by the backend otherwise. To represent this cleanly in the catalogs, we'd probably want nominally stand-alone opclasses to belong to implicitly created single-member groups, because the individual entries in pg_amop and pg_amproc are always going to be shown as belonging to groups --- the membership hierarchy above is only interesting in pg_depends, not for index operations. I think that's about the same thing you're saying here. Comments? regards, tom lane
On Thu, Dec 14, 2006 at 08:31:59PM -0500, Tom Lane wrote: > > How the backend implements it may be easier as one single large > > opclass. Essentially the operater class table becomes merely a > > placeholder for the name (maybe also aliases?), which can still be used > > in index declarations, but is never used by the backend otherwise. > > To represent this cleanly in the catalogs, we'd probably want nominally > stand-alone opclasses to belong to implicitly created single-member > groups, because the individual entries in pg_amop and pg_amproc are > always going to be shown as belonging to groups --- the membership > hierarchy above is only interesting in pg_depends, not for index > operations. I think that's about the same thing you're saying here. I think we're on the same page. I thought of another motivation also: protections/permissions. We don't currently allow people to mess with definitions needed by system catalogs, so you don't want to allow users to mess with the btree(int4) class, but you want to allow them to modify the group as a whole to add new things and remove things they've added. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > I think we're on the same page. I thought of another motivation also: > protections/permissions. We don't currently allow people to mess with > definitions needed by system catalogs, so you don't want to allow users > to mess with the btree(int4) class, but you want to allow them to > modify the group as a whole to add new things and remove things they've > added. Yeah. We're a long way from letting non-superusers manipulate opclass definitions, but it'd be good if the structure were not one that would forbid using permissions in future. I'm still not real happy with the phrase "operator class group"; it seems unwieldy, and it's not even technically accurate in this design, since the structure can contain operators directly not only operator classes. What do people think about calling it just an "operator group"? That's sort of reasonable since really what we're doing is saying that all these operators are compatible. (A lot of the things the planner wants to do with this info have nothing to do with indexes anyway.) The main objection I can think of is that a novice seeing the terms "operator class" and "operator group" isn't going to have any context to know which is which, but I'm not sure that we can devise any terms that would help much. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > The main objection I can think of is that a novice seeing the terms > "operator class" and "operator group" isn't going to have any context > to know which is which, but I'm not sure that we can devise any terms > that would help much. Operator Superclass ? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Operator Superclass ? Yeah, I thought about that too, but I don't like it much ... can't entirely put my finger on why not, except that class/superclass usually implies that the objects you're talking about are all the same kind of thing, whereas what we have here is a very definite distinction between two kinds of objects. On the same grounds, I'd object to calling schemas "directories" or "folders", unless they could be nested. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> Operator Superclass ? > > Yeah, I thought about that too, but I don't like it much ... can't > entirely put my finger on why not, except that class/superclass usually > implies that the objects you're talking about are all the same kind of > thing, whereas what we have here is a very definite distinction between > two kinds of objects. I'm kind of glad to hear that since I had similar discomfort with it myself. I thought it might be the least worst option though so I figured I would say it out loud. Perhaps something like Operator Class and Data Type Class Data type classes happens to involve operator classes but it sounds like you're looking for them to specify other behaviours of how data types inter-relate than just their operator classes anyways. > On the same grounds, I'd object to calling schemas "directories" or > "folders", unless they could be nested. (Actually that's a bit of an odd case since real-world folders aren't generally nestable and actually many early operating systems didn't allow them to be nested either. We're just so used to the computer metaphor that we forget what it's originally supposed to represent.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Perhaps something like > Operator Class > and > Data Type Class > Data type classes happens to involve operator classes but it sounds like > you're looking for them to specify other behaviours of how data types > inter-relate than just their operator classes anyways. Well, actually I think of this more as something that tells me how a bunch of *operators* relate. As an example, "=" and friends over the string datatypes will be one group, and "~=~" and friends will be another group over the very same types. So to me "data type class" would really miss the point. The alternatives I'd been thinking about were "operator set" and "operator collection", but I don't really see any advantage over "operator group" for either ... >> On the same grounds, I'd object to calling schemas "directories" or >> "folders", unless they could be nested. > (Actually that's a bit of an odd case since real-world folders aren't > generally nestable True, ya got me ;-) regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, Dec 15, 2006 at 06:44:10PM -0500, Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > Operator Superclass ? > > Yeah, I thought about that too, but I don't like it much ... can't > entirely put my finger on why not [...] I think I can ;-) "Operator class group", unwieldy as it is, conveys the meaning that we are talking about _sets of operator classes_. The nicer terms I have seen all lose a bit of that ring to me. Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFFg4aYBcgs9XrR2kYRAp1mAJ9+ISc7Ex1qCBV2dKgNJSUAOSmR/ACeKt6O KKp1DI9OkSrlO4VpJbb2xFM= =KDb3 -----END PGP SIGNATURE-----
tomas@tuxteam.de writes: > "Operator class group", unwieldy as it is, conveys the meaning that we > are talking about _sets of operator classes_. The nicer terms I have > seen all lose a bit of that ring to me. The thing is that in the proposal as it currently stands, we're *not* talking about sets of operator classes, because a group can contain "free standing" operators as well. So the apparent technical accuracy is really a bit misleading. As I'm currently thinking about it, a group is a collection of compatible operators, and the fact that it has some of those operators in common with an opclass is almost incidental --- not from the index AM's point of view maybe, but there will be large chunks of the system that work with groups without ever thinking about opclasses. Another thing that struck me this morning is that we make very heavy use of the shorthand "opclass" in both the code and the docs. If we call these things "operator groups" then we can use the parallel construction "opgroup", which seems to read OK to me. But if they're "operator class groups" we'll be stuck with something like "opclassgroup" ... ugh. regards, tom lane
Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > Perhaps something like > > > Operator Class > > and > > Data Type Class > > > Data type classes happens to involve operator classes but it sounds like > > you're looking for them to specify other behaviours of how data types > > inter-relate than just their operator classes anyways. > > Well, actually I think of this more as something that tells me how a > bunch of *operators* relate. As an example, "=" and friends over the > string datatypes will be one group, and "~=~" and friends will be > another group over the very same types. So to me "data type class" > would really miss the point. > > The alternatives I'd been thinking about were "operator set" and > "operator collection", but I don't really see any advantage over > "operator group" for either ... I guess the problem is finding a term that makes it very obvious that is "bigger than a class", but at the same time conveying the fact that you can have lone operators in there, as well as whole classes. So "collection", "group" and "set" fail because they can be easily seen as being in the same category as a "class": a bunch of things. How would anybody tell which one is the bigger one? The "superclass" term passes that test, but fails because it is somehow expected that you can't put a class inside a superclass. How about "hyperclass" then? :-) Maybe "school" :-D Maybe somebody should ask a philologist about a greek or latin prefix with that meaning.
Alvaro Herrera wrote: > The "superclass" term passes that test, but fails because it is somehow > expected that you can't put a class inside a superclass. > > How about "hyperclass" then? :-) Maybe "school" :-D > > Maybe somebody should ask a philologist about a greek or latin prefix > with that meaning. > I did wonder if "operator metaclass" might fit the bill ... finding a term with meaning that is both adequate and accurate isn't easy. cheers andrew
Alvaro Herrera <alvherre@commandprompt.com> writes: > How about "hyperclass" then? :-) Maybe "school" :-D So I finally went and looked at a thesaurus, and came up with some possibilities: category variety family crowd cluster flock pack troop gang clique circle cabal association society union ensemble "Operator family" seems like it might be a good choice. One small problem with it is that in biology, "family" is a smaller group than "class" ... but if we want to follow the biologists, the only bigger groups than classes are "phylums" and "kingdoms", neither of which seem like good words to use. "Operator cluster" isn't too bad either, except that it might create confusion with the other index-related meaning of "cluster". regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, Dec 16, 2006 at 11:14:02AM -0500, Tom Lane wrote: > tomas@tuxteam.de writes: > > "Operator class group", unwieldy as it is, conveys the meaning that we > > are talking about _sets of operator classes_. The nicer terms I have > > seen all lose a bit of that ring to me. > > The thing is that in the proposal as it currently stands, we're *not* > talking about sets of operator classes, because a group can contain > "free standing" operators as well. So the apparent technical accuracy > is really a bit misleading. Hm. Singleton classes? > As I'm currently thinking about it, a group is a collection of > compatible operators, and the fact that it has some of those operators > in common with an opclass is almost incidental --- not from the index > AM's point of view maybe, but there will be large chunks of the system > that work with groups without ever thinking about opclasses. Can you imagine a class straddling two groups? [...] > "opclassgroup" ... ugh. Indeed. regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFFhCpZBcgs9XrR2kYRArzKAJ46mOwDkfW+bIC+HEKBROCYwHbk7wCfQCu+ yc0pj2yMXf+HUdJiVwq3Q/o= =gA/y -----END PGP SIGNATURE-----
On Sat, Dec 16, 2006 at 11:48:48AM -0500, Tom Lane wrote: > So I finally went and looked at a thesaurus, and came up with some > possibilities: > > category variety family crowd cluster flock pack troop gang clique > circle cabal association society union ensemble > > "Operator family" seems like it might be a good choice. I like this one more than opcluster. opfamily has a good ring to it also. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Sat, 2006-12-16 at 11:48 -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > How about "hyperclass" then? :-) Maybe "school" :-D > > So I finally went and looked at a thesaurus, and came up with some > possibilities: > > category variety family crowd cluster flock pack troop gang clique > circle cabal association society union ensemble > > "Operator family" seems like it might be a good choice. > One small problem with it is that in biology, "family" is a smaller > group than "class" ... but if we want to follow the biologists, > the only bigger groups than classes are "phylums" and "kingdoms", > neither of which seem like good words to use. > > "Operator cluster" isn't too bad either, except that it might create > confusion with the other index-related meaning of "cluster". group? Joshua D. Drake > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Martijn van Oosterhout wrote: > On Sat, Dec 16, 2006 at 11:48:48AM -0500, Tom Lane wrote: > > So I finally went and looked at a thesaurus, and came up with some > > possibilities: > > > > category variety family crowd cluster flock pack troop gang clique > > circle cabal association society union ensemble > > > > "Operator family" seems like it might be a good choice. > > I like this one more than opcluster. opfamily has a good ring to it > also. +1 on opfamily.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Martijn van Oosterhout wrote: >> On Sat, Dec 16, 2006 at 11:48:48AM -0500, Tom Lane wrote: >>> "Operator family" seems like it might be a good choice. >> >> I like this one more than opcluster. opfamily has a good ring to it also. > +1 on opfamily. Done, unless anyone really doesn't like it. regards, tom lane
>>>> "Operator family" seems like it might be a good choice. >>> I like this one more than opcluster. opfamily has a good ring to it also. >> +1 on opfamily. > Done, unless anyone really doesn't like it. So I started revising my notes on what to change the catalog structure to, and immediately came up with pg_opfamily OIDopfamid access method OIDopfname family nameopfnamespace family namespaceopfowner owner's OID But "opfamid" is a really unfortunate name, because it looks way too much like it's supposed to be the ID of the family itself, rather than of the index AM it's associated with. I'm thinking of changing it to "opfmethod", and similarly rename pg_opclass.opcamid to opcmethod. That might break a few bits of code that needn't be broken, but on the whole this patch will whack the opclass-related catalogs around enough that most code looking at them will need fixes anyway. Comments? regards, tom lane
> pg_opfamily > > OID > opfamid access method OID > opfname family name > opfnamespace family namespace > opfowner owner's OID > > But "opfamid" is a really unfortunate name, because it looks way too > much like it's supposed to be the ID of the family itself, rather than > of the index AM it's associated with. I think by convention it is clear that we are talking about opf -- amid, opfmethod imho removes the connex to amid so I wouldn't do it. Andreas
"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes: >> But "opfamid" is a really unfortunate name, because it looks way too >> much like it's supposed to be the ID of the family itself, rather than >> of the index AM it's associated with. > I think by convention it is clear that we are talking about opf -- amid, > opfmethod imho removes the connex to amid so I wouldn't do it. I think that's fairly clear when you look at the catalog definition by itself, but when you come across the name in running code it's likely to be confusing. And it's not like "amid" is widely used to refer to index access method OIDs. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > A class group is associated with a specific index AM and can contain only > opclasses for that AM. We might for instance invent "numeric" and > "numeric_reverse" groups for btree, to contain the default opclasses and > reverse-sort opclasses for the standard arithmetic types. I thought that would just be formalizing what we currently have. But I just discovered to my surprise tat it's not. I don't see any cross-data-type operators between any of the integer types and numeric, or between any of the floating point types and numeric, or between any of the integers and the floating point types. So does that mean we currently have three separate arithmetic "operator class groups" such as they currently exist and you can't currently do merge joins between some combinations of these arithmetic types? What puzzles me is that we used to have problems with bigint columns where people just did "WHERE bigint_col = 1". But my testing shows similar constructs between integer and numeric or other types with no cross-data-type comparator don't lead to similar problems. The system happily introduces casts now and uses the btree operator. So I must have missed another change that was also relevant to this in addition to the cross datatype operators. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > I thought that would just be formalizing what we currently have. But I just > discovered to my surprise tat it's not. I don't see any cross-data-type > operators between any of the integer types and numeric, or between any of the > floating point types and numeric, or between any of the integers and the > floating point types. Correct. > So does that mean we currently have three separate arithmetic "operator class > groups" such as they currently exist and you can't currently do merge joins > between some combinations of these arithmetic types? No, what you'll get is something like int4var::float8 float8eq float8var which is perfectly mergejoinable ... however, it's not clear that the planner will make very good estimates about the value of the cast expression. I'm not sure if it's worth introducing a pile more crosstype operators to change that situation --- improving the selectivity functions to handle casts better might be a wiser approach. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > No, what you'll get is something like > > int4var::float8 float8eq float8var > > which is perfectly mergejoinable ... however, it's not clear that the > planner will make very good estimates about the value of the cast > expression. I'm not sure if it's worth introducing a pile more > crosstype operators to change that situation --- improving > the selectivity functions to handle casts better might be a wiser > approach. So the only reason we needed the cross-data-type operators was to get better estimates? I thought without them you couldn't get an index-based plan at all. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <gsstark@mit.edu> writes: > So the only reason we needed the cross-data-type operators was to get better > estimates? I thought without them you couldn't get an index-based plan at all. Oh, hm, there is that --- you won't get a nestloop with inner indexscan unless the join expression uses the unmodified inner variable (unless you do something weird like provide an index on the casted value...) However, we need to be pretty wary about widening the families unless we're sure that the semantics are right. In particular, I think that numeric-vs-float crosstype operators would violate the transitive law: you could have values for which A=B and B=C but A!=C. This is because we smash numerics to float for comparison, and so there are distinct numeric values that can compare equal to the same float. bigint against float same problem. It'd be OK to integrate integers and numeric into one class, but how much real value is there in that? regards, tom lane
Hope you had a nice holiday. "Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <gsstark@mit.edu> writes: >> So the only reason we needed the cross-data-type operators was to get better >> estimates? I thought without them you couldn't get an index-based plan at all. > > Oh, hm, there is that --- you won't get a nestloop with inner indexscan > unless the join expression uses the unmodified inner variable (unless > you do something weird like provide an index on the casted value...) Hm, so you're saying this case: postgres=# create table a (a numeric); CREATE TABLE postgres=# create table b (b integer); CREATE TABLE postgres=# create index aa on a(a); CREATE INDEX postgres=# create index bb on b(b); CREATE INDEX postgres=# set enable_seqscan = off; SET So this case works (though I find the actual plan chosen to be a bit surprising here): postgres=# explain select * from a where a = 1; QUERY PLAN -----------------------------------------------------------------Bitmap Heap Scan on a (cost=4.05..13.51 rows=6 width=32) Recheck Cond: (a = 1::numeric) -> Bitmap Index Scan on aa (cost=0.00..4.05 rows=6 width=0) Index Cond:(a = 1::numeric) (4 rows) But this case doesn't work: postgres=# explain select * from a join b on (a=b); QUERY PLAN --------------------------------------------------------------------------------Merge Join (cost=100000149.78..100000448.70rows=13161 width=36) Merge Cond: (a.a = "inner"."?column2?") -> Index Scan using aa ona (cost=0.00..62.45 rows=1230 width=32) -> Sort (cost=100000149.78..100000155.13 rows=2140 width=4) Sort Key:(b.b)::numeric -> Seq Scan on b (cost=100000000.00..100000031.40 rows=2140 width=4) (6 rows) That doesn't seem to even fit your description; the join is in fact on the unmodified inner variable here, no? Is this a bug? > However, we need to be pretty wary about widening the families unless > we're sure that the semantics are right. In particular, I think that > numeric-vs-float crosstype operators would violate the transitive law: > you could have values for which A=B and B=C but A!=C. This is because > we smash numerics to float for comparison, and so there are distinct > numeric values that can compare equal to the same float. bigint against > float same problem. It'd be OK to integrate integers and numeric into > one class, but how much real value is there in that? I'm not sure. In all of my schemas I've rarely used NUMERIC and never for an indexed column. But I'm sure there are users out there with databases (probably existing databases from other database products) which use NUMERIC extensively. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Merge Join (cost=100000149.78..100000448.70 rows=13161 width=36) > Merge Cond: (a.a = "inner"."?column2?") > -> Index Scan using aa on a (cost=0.00..62.45 rows=1230 width=32) > -> Sort (cost=100000149.78..100000155.13 rows=2140 width=4) > Sort Key: (b.b)::numeric > -> Seq Scan on b (cost=100000000.00..100000031.40 rows=2140 width=4) > That doesn't seem to even fit your description; the join is in fact on the > unmodified inner variable here, no? Is this a bug? No, the join condition is "a.a = b.b::numeric", because the only usable = operator is numeric-eq-numeric. If we were to create a numeric-eq-int operator then use of an indexscan on b would be possible. Whether this is an issue in real-world usage is debatable, though. Certainly my advice to anyone worried about the performance of such a join would be to change the numeric column to an integer type --- without that you're going to have performance problems anyway, just because numeric arithmetic is slow. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> Merge Join (cost=100000149.78..100000448.70 rows=13161 width=36) >> Merge Cond: (a.a = "inner"."?column2?") >> -> Index Scan using aa on a (cost=0.00..62.45 rows=1230 width=32) >> -> Sort (cost=100000149.78..100000155.13 rows=2140 width=4) >> Sort Key: (b.b)::numeric >> -> Seq Scan on b (cost=100000000.00..100000031.40 rows=2140 width=4) > >> That doesn't seem to even fit your description; the join is in fact on the >> unmodified inner variable here, no? Is this a bug? > > No, the join condition is "a.a = b.b::numeric", because the only usable = > operator is numeric-eq-numeric. If we were to create a numeric-eq-int > operator then use of an indexscan on b would be possible. Hm, that's a thought. We could add some choice operators such as = and < even if we didn't complete the set of operators to make a full transitively merge-joinable set of operators. Not sure that buys much except some brain cells dealing with the pg_operator columns though. It does seem like those merge-joinable operator columns ought to be automatically filled in based on the btree strategy numbers. > is an issue in real-world usage is debatable, though. Certainly my > advice to anyone worried about the performance of such a join would be > to change the numeric column to an integer type --- without that you're > going to have performance problems anyway, just because numeric > arithmetic is slow. Well I don't think those performance problems are comparable. Numeric arithmetic is slow, and numeric storage is inefficient but both of those costs are proportional. A user may well find it entirely tolerable that his database runs 50% slower than his DBA tells him it can. But if his application runs 1000x slower or worse then effectively Postgres just doesn't work for his application at all. To be honest I'm just as happy to be told it's something we're not too concerned about. pg_operator entries are a pain to maintain. We haven't seen many of the "why is my query using a sequential scan" type queries since the cross-data-type operators went in so perhaps my concern is misplaced. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com