Thread: Re: type conversion discussion
Tom Lane writes:

> If there are multiple possibilities, we choose the one which is the
> "least promoted" by some yet-to-be-determined metric.

A "metric" is often one step away from "unpredictable behaviour", at least
for users. But in practice there are not really any existing functions
that could benefit from this, at least not if the other proposals go
through and/or a few function entries are added or removed.

Let's say you have a function foo(float8, int2) and one foo(float4, int8)
and you call it with (you guessed it) float4 and int2. Which do you take?
If there is a good reason that these two functions exist separately in the
way they do, then the decision should probably not be made by some
casting-cost metric but by the user. If there is no good reason that the
functions are like this, then perhaps the implementor should be made aware
of the (useless?) ambiguity and maybe provide a grand unified
(float8, int8) version.

If you do want to support this sort of ambiguity resolution, then the
metric should IMHO be how many arguments had to be cast away from their
input type. That *might* give reasonable behaviour for users, though I'm
not completely sure.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
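[To see how the proposed cast-count metric behaves on exactly this example,
here is a minimal sketch in Python. The promotion order
(int2 < int8 < float4 < float8) and all identifiers are illustrative
assumptions for this thread, not Postgres's actual catalog or rules.]

    # Assumed promotion order; stands in for whatever scheme is adopted.
    RANK = {"int2": 0, "int8": 1, "float4": 2, "float8": 3}

    def promotes(frm, to):
        # True if 'frm' may be implicitly cast (promoted) to 'to'.
        return RANK[frm] <= RANK[to]

    def cast_count(call, sig):
        # How many arguments had to be cast away from their input type.
        return sum(1 for c, s in zip(call, sig) if c != s)

    def resolve(call, candidates):
        # Keep applicable signatures, prefer the fewest casts, punt on ties.
        usable = [s for s in candidates
                  if all(promotes(c, t) for c, t in zip(call, s))]
        usable.sort(key=lambda s: cast_count(call, s))
        if (len(usable) > 1
                and cast_count(call, usable[0]) == cast_count(call, usable[1])):
            return None  # still ambiguous under this metric
        return usable[0] if usable else None

    # foo(float8, int2) and foo(float4, int8), called with (float4, int2):
    # both candidates need exactly one cast, so the metric alone can't decide.
    print(resolve(("float4", "int2"),
                  [("float8", "int2"), ("float4", "int8")]))  # -> None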
Peter Eisentraut <peter_e@gmx.net> writes:

> But in practice there are not really any existing functions that could
> benefit from this,

While looking at existing functions can help detect problems in a proposal
like this, the standard set of functions is *not* the be-all and end-all;
the whole point is that we are trying to build an extensible system. So
I'm pretty suspicious of any argument that begins in the above fashion ;-)

> Let's say you have a function foo(float8, int2) and one foo(float4, int8)
> and you call it with (you guessed it) float4 and int2. Which do you take?

A good point; I wouldn't object to returning an error if we determine that
there are multiple equally-good possibilities. But, again, the sticky
question is: equally good according to what metric?

> If you do want to support this sort of ambiguity resolution, then the
> metric should IMHO be how many arguments had to be cast away from their
> input type.

Most of the cases we will be dealing with in practice are operators of one
or two arguments, so a metric that has only two or three possible values
(respectively) is not going to be very useful ... especially since the
exact-match case is not interesting. With the above metric we'd never be
able to resolve any ambiguous unary-operator cases at all!

			regards, tom lane
Tom Lane writes:

> > Let's say you have a function foo(float8, int2) and one foo(float4, int8)
> > and you call it with (you guessed it) float4 and int2. Which do you take?
>
> A good point; I wouldn't object to returning an error if we determine
> that there are multiple equally-good possibilities. But, again, the
> sticky question is: equally good according to what metric?

IMO that metric should be "existence". Anything else is bound to be
non-obvious. Surely some breadth-first or depth-first search through the
imaginary casting tree would yield reasonable results, but it's still
confusing.

I don't see any good reason why one would have such ambiguously overloaded
functions, though I'd be very interested to see one. In fact, one might
consider preventing the *creation* of such setups. That's what programming
languages would do.

If I'm not mistaken, then what I'm saying is that overloaded functions
must form a lattice when ordered according to the elsewhere-proposed
promotion hierarchy of their arguments. That ought to be a doable thing to
check for, and then we could also use lattice concepts to find the
best-fitting function. Gotta work this out in detail, though.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
I wrote:

> If I'm not mistaken then what I'm saying is that overloaded functions
> must form a lattice [...]

Indeed, I was. This attacks the problem from a different side, namely by
not allowing setups that are ambiguous in the first place. In detail:

If T1 and T2 are datatypes, then T1 < T2 iff T1 is allowed to be
implicitly converted to T2, according to the promotion scheme du jour. If
neither T1 promotes to T2 nor vice versa, then T1 and T2 are not
comparable.

Let A and B be functions with argument types (a1, a2, ..., an) and
(b1, b2, ..., bn) respectively. Then we say that A < B iff ai < bi for
all i = 1..n. If neither A < B nor B < A, then A and B are not
comparable. (That would include such cases as foo(int2, float8) and
foo(int8, float4).)

Define a partition on the set of all functions: two functions
F(f1, f2, ..., fm) and G(g1, g2, ..., gn) are in the same equivalence
class iff "F" = "G" (same name), m = n, and fi and gi are comparable for
all i = 1..m.

Now I propose that you always ensure (preferably at function creation
time) that each equivalence class is a lattice under the partial order
described in the previous paragraph. This helps because...

Given a function call Q that you want to resolve, you first identify its
equivalence class. Then there are three possibilities:

1) Q is not comparable to any other function in the equivalence class.
That means that the provided set of functions is incomplete. Proof:
assume there was a function P which could handle call Q. Then, since you
can only cast "up", the arguments of P must at each position be >= those
of Q. But then P >= Q, so Q would be comparable to P after all.

2) Q is greater than every element in the equivalence class. This also
means that the provided set of functions is unable to handle Q.

3) There exists at least one function P in the equivalence class such
that Q <= P. Consider the set A of all functions in the equivalence class
that are >= Q. Since the class is a lattice, A has a (unique) greatest
lower bound. That's the function you call.

Raise your hand if you are lost...

An example: say you define foo(int2, float8) and now want to add
foo(int8, float4). The system would reject that because of potential
ambiguity and require adding foo(int8, float8) and foo(int2, float4)
before creating foo(int8, float4); or you would perhaps realize what you
are doing and instead just provide foo(int8, float8), or whatever you
really wanted in the first place. (Actually, the requirement that you add
foo(int8, float8) is useless, since there is no down-casting and we don't
use the least-upper-bound property of lattices, so in practice one could
probably settle for a "half-lattice" requirement.)

Now I realize that this is a pretty deep proposal, but at least it's a
solidly founded one (I think), it prevents the programmer-user from doing
unwise things, and it avoids any unpredictable "metrics".

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
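[As a way to make the three cases above concrete, here is a minimal sketch
of the proposed resolution rule in Python. The total promotion order
int2 < int8 < float4 < float8 is an assumption standing in for "the
promotion scheme du jour", and all identifiers (sig_le, glb_in, resolve)
are made up for illustration. With a total type order, an equivalence
class is simply all same-name, same-arity functions.]

    # Assumed promotion order.
    RANK = {"int2": 0, "int8": 1, "float4": 2, "float8": 3}

    def sig_le(a, b):
        # Partial order on signatures: a <= b iff every argument promotes.
        return all(RANK[x] <= RANK[y] for x, y in zip(a, b))

    def glb_in(subset, members):
        # Greatest lower bound of 'subset' among 'members', or None.
        lower = [m for m in members if all(sig_le(m, s) for s in subset)]
        greatest = [m for m in lower if all(sig_le(x, m) for x in lower)]
        return greatest[0] if greatest else None

    def resolve(call, members):
        # Case 3 of the proposal: A = {P in class : P >= Q}, call glb(A).
        a = [m for m in members if sig_le(call, m)]
        if not a:
            return None  # cases 1 and 2: the class cannot handle the call
        return glb_in(a, members)

    # The completed class from Peter's example above: a full lattice.
    members = [("int2", "float4"), ("int2", "float8"),
               ("int8", "float4"), ("int8", "float8")]
    print(resolve(("int2", "int8"), members))  # -> ('int2', 'float4')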
Peter Eisentraut <peter_e@gmx.net> writes:

> I propose that you always ensure (preferably at function creation time)
> that each equivalence class is a lattice under the partial order
> described in the previous paragraph.

Er ... weren't you just objecting to metric-based resolution on the
grounds that it'd be unintelligible to users? This has got that problem
ten times worse. In fact, I'm not sure *you* understand it.

> 3) There exists at least one function P in the equivalence class such
> that Q <= P. Consider the set A of all functions in the equivalence
> class that are >= Q. Since the class is a lattice, A has a (unique)
> greatest lower bound. That's the function you call.

I don't think so. The lattice property only says that the set A has a glb
within the equivalence class. AFAICT it doesn't promise that the glb will
be >= Q, so you can't necessarily use the glb as the function to call.
The reason why is that Q isn't necessarily part of the set of given
functions, so it's not necessarily one of the lower bounds that the
lattice property says there is a greatest of.

If that wasn't what you had in mind, then you are confusing a lattice on
the set of *all possible* functions with a lattice over the set of
functions that actually are present ... and it looks to me like different
parts of your argument assume different things.

> An example: say you define foo(int2, float8) and now want to add
> foo(int8, float4). The system would reject that because of potential
> ambiguity

If your equivalence set contains just the four functions (abbreviating in
the obvious way)

	f(i2, i2)
	f(i2, f8)
	f(i8, f4)
	f(f8, f8)

then each two-element subset of this set has a unique glb and lub within
the set, which makes it a lattice if I'm reading my dusty old abstract
algebra textbook correctly. There may be a property that makes things
work the way you suggest, but it's not the lattice property.

A more general comment is that mathematical purity is only one of the
considerations here, and not necessarily even one of the most important
considerations ;-)

			regards, tom lane
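[A brute-force check of Tom's four-function set, under the same assumed
promotion order as in the earlier sketch (i2 < i8 < f4 < f8; all names
illustrative): every two-element subset does have a unique glb and lub
within the set, so the lattice test alone rejects nothing here.]

    from itertools import combinations

    # Same assumed promotion order as before.
    RANK = {"i2": 0, "i8": 1, "f4": 2, "f8": 3}

    def sig_le(a, b):
        return all(RANK[x] <= RANK[y] for x, y in zip(a, b))

    def bound(pair, members, below):
        # Unique glb (below=True) or lub (below=False) of 'pair' in 'members'.
        le = sig_le if below else (lambda a, b: sig_le(b, a))
        bounds = [m for m in members if all(le(m, p) for p in pair)]
        best = [m for m in bounds if all(le(x, m) for x in bounds)]
        return best[0] if len(best) == 1 else None

    funcs = [("i2", "i2"), ("i2", "f8"), ("i8", "f4"), ("f8", "f8")]
    for pair in combinations(funcs, 2):
        print(pair, "glb:", bound(pair, funcs, True),
              "lub:", bound(pair, funcs, False))
    # Every pair prints a glb and a lub: the set is indeed a lattice, yet
    # (as shown two messages below) call resolution can still go wrong.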
Tom Lane writes:

> Er ... weren't you just objecting to metric-based resolution on the
> grounds that it'd be unintelligible to users?

Well, since punting in the face of multiple possibilities isn't really an
improvement over the status quo, I kept looking.

The key difference between this (or similarly-minded) proposals and the
context in which you originally mentioned the metrics is that mine works
at function creation time, not at call resolution time. A function
creation is usually a well-controlled event, so an error message along
the lines of "if you create this function then you create an ambiguity
with function x because a call y could not be resolved deterministically"
is helpful. "Cannot resolve call x between y and z" at call time is
annoying (get the programmer on the phone and ask him to clean up his
mess). Part of my proposal was a method for statically determining
ambiguities before it's too late.

Secondly, yes, lattices and related business are more complicated than,
say, a breadth-first search. But that need only be so internally.
Externally, it tends to give intuitive behaviour because there is never
an ambiguity, whereas a BFS would presumably rely on the order of the
arguments or other such incidental things.

> > 3) There exists at least one function P in the equivalence class such
> > that Q <= P. Consider the set A of all functions in the equivalence
> > class that are >= Q. Since the class is a lattice, A has a (unique)
> > greatest lower bound. That's the function you call.
>
> I don't think so. The lattice property only says that the set A has a
> glb within the equivalence class. AFAICT it doesn't promise that the
> glb will be >= Q, so you can't necessarily use the glb as the function
> to call.

Since all functions in A are >= Q by definition, Q is at least _a_ lower
bound on A. The glb(A) is also a lower bound on A, and since it's the
greatest it must also be >= Q. The case glb(A) = Q is when the call Q
matches one existing signature exactly; in that case you call that
function. Otherwise the glb represents the "next function up", and if
there is an algorithm to find the glb of a poset then there is also an
algorithm to resolve function calls.

> A more general comment is that mathematical purity is only one of the
> considerations here, and not necessarily even one of the most important
> considerations ;-)

Well, at least it was fun for me to work it out. :-)

No, seriously, what are the considerations?

* ease of implementation
* efficiency
* predictability
* intuitiveness
* verifiability
* scalability
* simplicity

I think I've got at least half of that covered, with no contenders having
surfaced yet.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
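[A minimal sketch of the creation-time check described here, reusing the
assumed promotion order from the earlier sketches; may_create and all
other names are hypothetical. Before accepting a new signature, verify
that every pair in its equivalence class still has a unique glb and lub
within the class.]

    from itertools import combinations

    # Assumed promotion order, as in the earlier sketches.
    RANK = {"int2": 0, "int8": 1, "float4": 2, "float8": 3}

    def sig_le(a, b):
        return all(RANK[x] <= RANK[y] for x, y in zip(a, b))

    def has_unique_bound(pair, members, below):
        le = sig_le if below else (lambda a, b: sig_le(b, a))
        bounds = [m for m in members if all(le(m, p) for p in pair)]
        return sum(all(le(x, m) for x in bounds) for m in bounds) == 1

    def may_create(new_sig, existing):
        # Reject a new function that would break the lattice property.
        members = existing + [new_sig]
        return all(has_unique_bound(p, members, below)
                   for p in combinations(members, 2)
                   for below in (True, False))

    # Peter's earlier example: adding foo(int8, float4) next to
    # foo(int2, float8) is refused at creation time, not at call time.
    print(may_create(("int8", "float4"), [("int2", "float8")]))  # -> False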
Peter Eisentraut <peter_e@gmx.net> writes:

>> I don't think so. The lattice property only says that the set A has a
>> glb within the equivalence class. AFAICT it doesn't promise that the
>> glb will be >= Q, so you can't necessarily use the glb as the function
>> to call.

> Since all functions in A are >= Q by definition, Q is at least _a_ lower
> bound on A. The glb(A) is also a lower bound on A, and since it's the
> greatest it must also be >= Q.

No, you're not catching my point. glb(A) is the greatest lower bound
*within the set of available functions*. Q, the requested call signature,
is *not* in that set (if it were then we'd not have any ambiguity to
resolve, because there's an exact match). The fact that the set of
available functions forms a lattice gives you no guarantee whatever that
glb(A) >= Q, because Q is not constrained by the lattice property.

			regards, tom lane
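[The gap Tom describes can be exhibited on his own four-function set,
again under the assumed promotion order i2 < i8 < f4 < f8 (names
illustrative). For the call Q = (i2, f4), the only lower bound of A that
is actually *in* the set sits below Q:]

    # Same assumed promotion order as in the earlier sketches.
    RANK = {"i2": 0, "i8": 1, "f4": 2, "f8": 3}

    def sig_le(a, b):
        return all(RANK[x] <= RANK[y] for x, y in zip(a, b))

    funcs = [("i2", "i2"), ("i2", "f8"), ("i8", "f4"), ("f8", "f8")]
    q = ("i2", "f4")  # a call signature with no exact match

    # A = functions the call could be promoted to.
    a_set = [f for f in funcs if sig_le(q, f)]
    # glb(A) *within the set of available functions*:
    lower = [f for f in funcs if all(sig_le(f, p) for p in a_set)]
    glb = [f for f in lower if all(sig_le(x, f) for x in lower)][0]

    print("A       =", a_set)           # [('i2','f8'), ('i8','f4'), ('f8','f8')]
    print("glb(A)  =", glb)             # ('i2', 'i2')
    print("glb >= Q?", sig_le(q, glb))  # False: glb(A) cannot handle the call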
One thing that has gotten lost here: is there any market at all for
putting in some line of defence against (a to-be-determined degree of)
ambiguity at function creation time, to reduce the possible problems
(implementation- and user-side) at call time? What do you think?

Tom Lane writes:

> glb(A) is the greatest lower bound *within the set of available
> functions*.

Correct.

> Q, the requested call signature, is *not* in that set

Correct.

> The fact that the set of available functions forms a lattice gives you
> no guarantee whatever that glb(A) >= Q, because Q is not constrained by
> the lattice property.

I know. I don't use the lattice property to deduce the fact that
glb(A) >= Q. I use the lattice property to derive the existence of
glb(A). The result glb(A) >= Q comes from:

1. Q is a lower bound on A (by definition of A)
2. glb(A) is a lower bound on A (by definition of glb)
3. glb(A) >= Q (by definition of "greatest")

Recall that A was defined as the set of functions >= Q in Q's equivalence
class, and was guaranteed to be non-empty by treating the other cases
separately. I think it works. :)

In all but the most complicated cases this really reduces to the obvious
behaviour, but on the other hand it scales infinitely.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden