Thread: pre-proposal: type interfaces
I am starting to plan a few features that are important for temporal data, and one prerequisite for several of them is the ability to find an operator that fills a certain role. For instance, one feature that I'm considering now is a "temporal join" which is a join on "overlaps" rather than "equals", e.g.: SELECT * FROM a, b WHERE a.x && b.x; I might try to provide a modified merge join algorithm to implement that more efficiently in some cases. I don't mean to discuss the feature in detail here (I will make a separate proposal) but the algorithm requires that I find the "strictly left of" operator. So, after I recognize that a temporal join is required, I need to be able to identify the << operator. But how? In other languages, it would probably be done with something like an "interface", but we don't have that concept here. The internals generally use operators attached to the default btree opclass, but I don't think that works very well here. The way I see it, we have two approaches:1. Try to make the current system work by standardizing the strategy numbers for GiST somehow, and then use the default GiST operator class, if available.2. Invent a new system, perhaps interfaces, perhaps something else.3. Use extra flags in CREATE OPERATOR somehow Thoughts? Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > For instance, one feature that I'm considering now is a "temporal join" > which is a join on "overlaps" rather than "equals", e.g.: > SELECT * FROM a, b WHERE a.x && b.x; > I might try to provide a modified merge join algorithm to implement that > more efficiently in some cases. I don't mean to discuss the feature in > detail here (I will make a separate proposal) but the algorithm requires > that I find the "strictly left of" operator. Well, actually you need *two* things. The first prerequisite is to know that the operator named && has the semantics of "overlaps" in some generalized sense. The second prerequisite is to identify the associated "strictly left of" operator. As you say, we've done this in the past using operator classes. That's worked reasonably well because what the existing code needs to know about is equality, ordering, and hashing, and that all matches up nicely with btree and hash opclasses. > The way I see it, we have two approaches: > 1. Try to make the current system work by standardizing the strategy > numbers for GiST somehow, and then use the default GiST operator class, > if available. This proposal is a non-starter, because one of the whole points of GIST is that it can support multiple sets of semantics. There is no reason at all to assume that every GIST opclass must include operators having the semantics of overlaps and to-left-of. > 2. Invent a new system, perhaps interfaces, perhaps something else. > 3. Use extra flags in CREATE OPERATOR somehow The thing that would require the least amount of invention is to use the operator class/family machinery with a new index type. Perhaps a dummy entry in pg_am would be acceptable. (You would probably create just operator families, with no contained classes, since the only point of a class is to identify the minimum support set for an index and there won't be any indexes.) An alternative is to somehow mark those GIST opclasses that do meet the expectation about operator semantics. regards, tom lane
On Fri, Oct 23, 2009 at 7:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The way I see it, we have two approaches: >> 1. Try to make the current system work by standardizing the strategy >> numbers for GiST somehow, and then use the default GiST operator class, >> if available. > > This proposal is a non-starter, because one of the whole points of GIST > is that it can support multiple sets of semantics. There is no reason > at all to assume that every GIST opclass must include operators having > the semantics of overlaps and to-left-of. I always thought it was strange that the GIST strategy numbers were completely meaningless. It does seem like assigning meaning to strategy numbers gradually as we learn new interrelated indexable strategies. We would still have a range of values for new non-standard semantics, but at least the common ones would be nailed down. I'm not sure that solves the problem because the "default" gist operator class isn't necessarily going to be the one with the strategies this needs, though I suppose normally people would want to make the default operator class for each data type one which supports the most strategies. -- greg
Greg Stark <gsstark@mit.edu> writes: > I always thought it was strange that the GIST strategy numbers were > completely meaningless. It does seem like assigning meaning to > strategy numbers gradually as we learn new interrelated indexable > strategies. We would still have a range of values for new non-standard > semantics, but at least the common ones would be nailed down. Well, the problem with that is that GIST strategy numbers are historically an internal implementation detail for any particular opclass, and so trying to standardize them now is going to mean lots of incompatibility with third-party opclasses. I'd feel more comfortable with being able to add some flags to an opclass (or more likely an opfamily) that assert that its strategy numbers agree with some convention or other. Maybe a bitmap so that there's room for multiple future conventions of this kind? For instance it's certainly conceivable that a GIST class could support both btree-compatible and overlaps operators. regards, tom lane
Greg Stark <gsstark@mit.edu> writes: > I'm not sure that solves the problem because the "default" gist > operator class isn't necessarily going to be the one with the > strategies this needs, Forgot to mention: I do not think default-ness of opclasses enters into it at all. The meaning of the query is fully defined by the operator that is in it. All we need to know is what are the semantics of that operator. If we can find it in the "overlaps" position of *any* opclass, we are entitled to suppose that it behaves like overlaps and the associated left-of operator can be used to optimize it. Conceivably we could get different left-of operators out of different opclasses, but if they don't behave effectively the same, the user has messed up the opclasses. As an example, suppose we are trying to implement DISTINCT via sort-and-unique, and we've chosen an appropriate '=' operator that defines what distinct-ness means. That operator might appear in more than one opclass having more than one sort order, but it doesn't matter which one we choose. As long as the opclasses are self-consistent, we should get correct answers with any one. The case where default-ness of opclasses matters is where we are trying to assign specific meaning to some generic construct like DISTINCT or ORDER BY. For instance, it makes sense to require that ORDER BY be interpreted by reference to a default opclass, because otherwise you don't have a way to know which sort ordering the user wants. regards, tom lane
On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote: > Forgot to mention: I do not think default-ness of opclasses enters > into it at all. The meaning of the query is fully defined by the > operator that is in it. All we need to know is what are the > semantics of that operator. If we can find it in the "overlaps" > position of *any* opclass, we are entitled to suppose that it > behaves like overlaps and the associated left-of operator can be > used to optimize it. Interesting, that sounds we've got a good approach to the problem now. This thread has been useful. > Conceivably we could get different left-of > operators out of different opclasses, but if they don't behave > effectively the same, the user has messed up the opclasses. It would probably be worthwhile to make an attempt to throw a useful error there, but I agree it's not really a problem. > The case where default-ness of opclasses matters is where we are > trying to assign specific meaning to some generic construct like > DISTINCT or ORDER BY. For instance, it makes sense to require > that ORDER BY be interpreted by reference to a default opclass, > because otherwise you don't have a way to know which sort ordering > the user wants. That makes sense. Thanks,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote: >> Conceivably we could get different left-of >> operators out of different opclasses, but if they don't behave >> effectively the same, the user has messed up the opclasses. > It would probably be worthwhile to make an attempt to throw a useful > error there, but I agree it's not really a problem. Sure, right after we solve the halting problem ;-). The point I was trying to make is that getting different operators is not wrong. It's only wrong if their behavior isn't consistent with what the opclass asserts, and there's no practical way to determine that. We have to trust the opclass maker. (This is one of the main reasons why CREATE OPERATOR CLASS is superuser-only.) regards, tom lane
Still on the phone... -- dim Le 23 oct. 2009 à 21:16, Tom Lane <tgl@sss.pgh.pa.us> a écrit : > I'd feel more comfortable with being able to add some flags to an > opclass (or more likely an opfamily) that assert that its strategy > numbers agree with some convention or other. The overlap operator only concern multi dimensional data types I think, it seems to me we can use the term "range". So whart about applying semantics to strategy numbers for RANGE OPERATOR CLASS ... GIST ... I'm not sure how scalable we want to be there, so maybe those (options, range, ...) would be better, but the flexibility Will certainly not be that good as each option Will require unique strategy numbers... Regards,