Thread: Re: type conversion discussion

Re: type conversion discussion

From: Peter Eisentraut

Tom Lane writes:

> If there are multiple possibilities, we choose the one which is the
> "least promoted" by some yet-to-be-determined metric.

A "metric" is often one step away from "unpredictable behaviour", at least
for users.

But in practice there are not really any existing functions that could
benefit from this, at least not if the other proposals go through and/or
a few function entries are added or removed.

Let's say you have a function foo(float8, int2) and one foo(float4, int8)
and you call it with (you guessed it) float4 and int2. Which do you take?
If there is a good reason that these two functions exist separately as
they do, then the decision should probably not be made by some
casting-cost metric but by the user. If there is no good reason for the
functions to be like this, then perhaps the implementer should be made
aware of the (useless?) ambiguity and maybe provide a grand unified
(float8, int8) version.

If you do want to support this sort of ambiguity resolution then the
metric should IMHO be how many arguments had to be cast away from their
input type at all. That *might* give reasonable behaviour for users,
though I'm not completely sure.
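
For illustration, a rough Python sketch of that metric (the linear
promotion chain int2 < int8 < float4 < float8 is only an assumed
example, as is all of the code). Note that on the foo() example above
even this metric ends in a tie, which would mean punting to the user:

    # Hypothetical sketch of the "count the casts" metric; the linear
    # promotion chain below is an assumption, not the actual scheme.
    RANK = {"int2": 0, "int8": 1, "float4": 2, "float8": 3}

    def casts_needed(call, sig):
        """How many arguments must be cast away from their input type,
        or None if the signature cannot accept the call at all."""
        if any(RANK[c] > RANK[s] for c, s in zip(call, sig)):
            return None  # would require casting "down", which we never do
        return sum(c != s for c, s in zip(call, sig))

    call = ("float4", "int2")
    print(casts_needed(call, ("float8", "int2")))  # 1 (float4 -> float8)
    print(casts_needed(call, ("float4", "int8")))  # 1 (int2 -> int8)
    # Both candidates cost 1, so even this metric leaves the case ambiguous.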

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: type conversion discussion

From: Tom Lane

Peter Eisentraut <peter_e@gmx.net> writes:
> But in practice there are not really any existing functions that
> could benefit from this,

While looking at existing functions can help detect problems in a
proposal like this, the standard set of functions is *not* the be-all
and end-all; the whole point is that we are trying to build an
extensible system.  So I'm pretty suspicious of any argument that
begins in the above fashion ;-)

> Let's say you have a function foo(float8, int2) and one foo(float4, int8)
> and you call it with (you guessed it) float4 and int2. Which do you take?

A good point; I wouldn't object to returning an error if we determine
that there are multiple equally-good possibilities.  But, again, the
sticky question is equally good according to what metric?

> If you do want to support this sort of ambiguity resolution then the
> metric should IMHO be how many arguments you had to cast away from the
> input type at all.

Most of the cases we will be dealing with in practice are operators of
one or two arguments, so a metric that only has two or three possible
values (respectively) is not going to be very useful... especially
since the exact-match case is not interesting.  With the above metric
we'd never be able to resolve any ambiguous unary-operator cases at all!
        regards, tom lane


Re: type conversion discussion

From: Peter Eisentraut

Tom Lane writes:

> > Let's say you have a function foo(float8, int2) and one foo(float4, int8)
> > and you call it with (you guessed it) float4 and int2. Which do you take?
> 
> A good point; I wouldn't object to returning an error if we determine
> that there are multiple equally-good possibilities.  But, again, the
> sticky question is equally good according to what metric?

IMO that metric should be "existence". Anything else is bound to be
non-obvious. Surely some breadth-first or depth-first search through the
imaginary casting tree would yield reasonable results, but it's still
confusing. I don't see any good reason why one would have such
ambiguously overloaded functions, though I'd be very interested to see
one.

In fact one might consider preventing *creation* of such setups. That's
what programming languages would do. If I'm not mistaken then what I'm
saying is that overloaded functions must form a lattice when ordered
according to the promotion hierarchy (proposed elsewhere) of their
arguments. That ought to be a doable thing to check for, and then we
could also use lattice concepts to find the best-fitting function. Gotta
work this out in detail though.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: type conversion discussion

From: Peter Eisentraut

I wrote:

> If I'm not mistaken then what I'm saying is that overloaded functions
> must form a lattice [...]

Indeed, I was. This attacks the problem from a different side, namely not
allowing setups that are ambiguous in the first place.

In detail:

If T1 and T2 are datatypes then T1 < T2 iff T1 is allowed to be implicitly
converted to T2, according to the promotion scheme du jour. If neither T1
promotes to T2 nor vice versa then T1 and T2 are not comparable.

Let A and B be functions with argument types (a1, a2, ... an) and (b1, b2,
... bn) respectively. Then we say that A < B iff ai < bi for all i=1..n.
If neither A < B nor B < A then A and B are not comparable. (That would
include such cases as foo(int2, float8) and foo(int8, float4).)
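
In Python-ish code, these two orders could look roughly like this (a
sketch only; the linear chain int2 < int8 < float4 < float8 stands in
for whatever promotion scheme we settle on):

    # Sketch of the two orders just defined; the linear promotion chain
    # is an assumed stand-in for the eventual scheme.
    RANK = {"int2": 0, "int8": 1, "float4": 2, "float8": 3}

    def type_le(t1, t2):
        """T1 <= T2: T1 is T2 or promotes implicitly to T2."""
        return RANK[t1] <= RANK[t2]

    def sig_le(a, b):
        """A <= B iff ai <= bi at every argument position."""
        return len(a) == len(b) and all(type_le(x, y) for x, y in zip(a, b))

    # foo(int2, float8) vs foo(int8, float4): neither is <= the other.
    print(sig_le(("int2", "float8"), ("int8", "float4")))  # False
    print(sig_le(("int8", "float4"), ("int2", "float8")))  # False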

Define a partition on the set of all functions. Two functions F(f1, f2,
... fm) and G(g1, g2, ... gn) are in the same equivalence class iff "F" =
"G" (same name), m = n, and fi and gi are comparable for all i = 1..m. Now
I propose that you always ensure (preferably at function creation time)
that each equivalence class is a lattice under the partial order described
in the previous paragraph.
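
Continuing the sketch, the creation-time check could go roughly like so
(brute force over all pairs; a real implementation would be smarter,
and this reuses sig_le from above):

    # Check that a class of same-name, same-arity signatures is a
    # lattice under sig_le: every pair needs a lub and a glb in the class.
    from itertools import combinations

    def least(sigs):
        """The unique least element of sigs, or None if there isn't one."""
        for s in sigs:
            if all(sig_le(s, t) for t in sigs):
                return s
        return None

    def greatest(sigs):
        """The unique greatest element of sigs, or None."""
        for s in sigs:
            if all(sig_le(t, s) for t in sigs):
                return s
        return None

    def is_lattice(cls):
        for a, b in combinations(cls, 2):
            ubs = [s for s in cls if sig_le(a, s) and sig_le(b, s)]
            lbs = [s for s in cls if sig_le(s, a) and sig_le(s, b)]
            if least(ubs) is None or greatest(lbs) is None:
                return False
        return True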

This helps because... Given a function call Q that you want to resolve,
you first identify its equivalence class. Then there are three
possibilities (a code sketch follows the list):

1) Q is not comparable to any function in the equivalence class. That
means that the provided set of functions is incomplete. Proof: Assume
there were a function P which could handle call Q. Then, since you can
only cast "up", the arguments of P must at each position be >= those of
Q. But then P >= Q, so P would be comparable to Q.

2) Q is comparable to some functions but greater than all of them, i.e.
no function in the class is >= Q. This also means that the provided set
of functions is unable to handle Q.

3) There exists at least one function P in the equivalence class such that
Q <= P. Consider the set A of all functions in the equivalence class that
are >= Q. Since the class is a lattice, A has a (unique) greatest lower
bound. That's the function you call.
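
In the sketch, resolution would then go roughly like this (cases 1 and 2
both surface as an empty candidate set; and rather than assume the glb
claim, the code re-checks that the candidate is unique):

    def resolve(call, cls):
        # A is the set of functions in the class that are >= the call Q.
        A = [s for s in cls if sig_le(call, s)]
        if not A:
            raise LookupError("no function can handle this call")
        # If the glb claim holds, A has a unique minimal element.
        mins = [s for s in A if not any(t != s and sig_le(t, s) for t in A)]
        if len(mins) != 1:
            raise LookupError("ambiguous call")
        return mins[0]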

Raise your hand if you are lost...

An example: say you define foo(int2, float8) and now want to add foo(int8,
float4). Then the system would reject that because of potential ambiguity
and require adding foo(int8, float8) and foo(int2, float4) before creating
foo(int8, float4); or you would perhaps realize what you are doing and
instead just provide foo(int8, float8), or whatever you really wanted in
the first place.
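
Run through the sketch above, that is exactly what happens:

    # Adding foo(int8, float4) next to foo(int2, float8) fails the
    # lattice check; the completed "diamond" passes.
    print(is_lattice([("int2", "float8"), ("int8", "float4")]))  # False
    diamond = [("int2", "float4"), ("int2", "float8"),
               ("int8", "float4"), ("int8", "float8")]
    print(is_lattice(diamond))  # True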

(Actually, the requirement that you add foo(int8, float8) is useless,
since there is no down-casting and we don't use the least upper bound
property of lattices, so probably in practice one could settle for a
"half-lattice" requirement.)


Now I realize that this is a pretty deep proposal, but at least it's a
solidly founded one (I think), it prevents the programmer-user from doing
unwise things, and it avoids any unpredictable "metrics".


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: type conversion discussion

From: Tom Lane

Peter Eisentraut <peter_e@gmx.net> writes:
> I propose that you always ensure (preferably at function creation time)
> that each equivalence class is a lattice under the partial order described
> in the previous paragraph.

Er ... weren't you just objecting to metric-based resolution on the
grounds that it'd be unintelligible to users?  This has got that
problem ten times worse.  In fact, I'm not sure *you* understand it.

> 3) There exists at least one function P in the equivalence class such that
> Q <= P. Consider the set A of all functions in the equivalence class that
> are >= Q. Since the class is a lattice, A has a (unique) greatest lower
> bound. That's the function you call.

I don't think so.  The lattice property only says that the set A has a
glb within the equivalence class.  AFAICT it doesn't promise that the
glb will be >= Q, so you can't necessarily use the glb as the function
to call.  The reason why is that Q isn't necessarily part of the set of
given functions, so it's not necessarily one of the lower bounds that
the lattice property says there is a greatest of.

If that wasn't what you had in mind, then you are confusing a lattice
on the set of *all possible* functions with a lattice over the set of
functions that actually are present ... and it looks to me like
different parts of your argument assume different things.

> An example: say you define foo(int2, float8) and now want to add foo(int8,
> float4). Then the system would reject that because of potential ambiguity

If your equivalence set contains just the four functions (abbreviating
in the obvious way)

    f(i2, i2)
    f(i2, f8)
    f(i8, f4)
    f(f8, f8)

then each two-element subset of this set has a unique glb and lub within
the set, which makes it a lattice if I'm reading my dusty old abstract
algebra textbook correctly.  There may be a property that makes things
work the way you suggest, but it's not the lattice property.

A more general comment is that mathematical purity is only one of the
considerations here, and not necessarily even one of the most important
considerations ;-)
        regards, tom lane


Re: type conversion discussion

From: Peter Eisentraut

Tom Lane writes:

> Er ... weren't you just objecting to metric-based resolution on the
> grounds that it'd be unintelligible to users?

Well, since punting in the face of multiple possibilities isn't really an
improvement over the status quo, I kept looking. The key difference
between this proposal (or similarly minded ones) and the context in which
you originally mentioned the metrics is that mine works at function
creation time, not at call resolution time.

Function creation is usually a well-controlled event, so an error
message along the lines of "if you create this function then you create an
ambiguity with function x because a call y could not be resolved
deterministically" is helpful. "Cannot resolve call x between y and z" at
call time is annoying (get the programmer on the phone and ask him to
clean up his mess). Part of my proposal was a method for statically
determining ambiguities before it's too late.

Secondly, yes, lattices and related business are more complicated than,
say, a breadth-first search. But that complexity need only be internal.
Externally, it tends to give intuitive behaviour because there is never
an ambiguity, whereas a BFS would presumably rely on the order of the
arguments or other such incidental things.

> > 3) There exists at least one function P in the equivalence class such that
> > Q <= P. Consider the set A of all functions in the equivalence class that
> > are >= Q. Since the class is a lattice, A has a (unique) greatest lower
> > bound. That's the function you call.
> 
> I don't think so.  The lattice property only says that the set A has a
> glb within the equivalence class.  AFAICT it doesn't promise that the
> glb will be >= Q, so you can't necessarily use the glb as the function
> to call.

Since all functions in A are >=Q by definition, Q is at least _a_ lower
bound on A. The glb(A) is also a lower bound on A, and since it's the
greatest it must also be >=Q.

The case where glb(A)=Q is when the call Q matches one existing signature
exactly; in that case you call that function. Otherwise the glb represents
the "next function up", and if there is an algorithm to find the glb of a
poset then there is also an algorithm to resolve function calls.
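
For a finite class that algorithm is straightforward. A self-contained
sketch (same assumed promotion chain as in my earlier message; none of
these names are real code):

    # glb(A) within a class: the greatest signature that is <= every
    # member of A.  The chain below is still just an assumed example.
    RANK = {"int2": 0, "int8": 1, "float4": 2, "float8": 3}

    def sig_le(a, b):
        return all(RANK[x] <= RANK[y] for x, y in zip(a, b))

    def glb(A, cls):
        lbs = [s for s in cls if all(sig_le(s, a) for a in A)]
        best = [s for s in lbs if all(sig_le(t, s) for t in lbs)]
        return best[0] if best else None  # None: not actually a lattice

    cls = [("int2", "float4"), ("int2", "float8"),
           ("int8", "float4"), ("int8", "float8")]
    Q = ("int2", "int2")                  # no exact match in cls
    A = [s for s in cls if sig_le(Q, s)]  # here: all four functions
    print(glb(A, cls))                    # ('int2', 'float4'), the next one up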


> A more general comment is that mathematical purity is only one of the
> considerations here, and not necessarily even one of the most important
> considerations ;-)

Well, at least it was fun for me to work it out. :-)

No, seriously, what are the considerations?

* ease of implementation
* efficiency
* predictability
* intuitiveness
* verifiability
* scalability
* simplicity

I think I got at least half of that covered, with no contenders yet in
sight.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: type conversion discussion

From: Tom Lane

Peter Eisentraut <peter_e@gmx.net> writes:
>> I don't think so.  The lattice property only says that the set A has a
>> glb within the equivalence class.  AFAICT it doesn't promise that the
>> glb will be >= Q, so you can't necessarily use the glb as the function
>> to call.

> Since all functions in A are >=Q by definition, Q is at least _a_ lower
> bound on A. The glb(A) is also a lower bound on A, and since it's the
> greatest it must also be >=Q.

No, you're not catching my point.  glb(A) is the greatest lower bound
*within the set of available functions*.  Q, the requested call
signature, is *not* in that set (if it were then we'd not have any
ambiguity to resolve, because there's an exact match).  The fact that
the set of available functions forms a lattice gives you no guarantee
whatever that glb(A) >= Q, because Q is not constrained by the lattice
property.
        regards, tom lane


Re: type conversion discussion

From: Peter Eisentraut

One thing that has gotten lost here: is there any market at all for
putting in some line of defence against (some to-be-determined degree of)
ambiguity at function creation time, to reduce the possible problems
(implementation- and user-side) at call time? What do you think?

Tom Lane writes:

> glb(A) is the greatest lower bound *within the set of available
> functions*.

Correct.

> Q, the requested call signature, is *not* in that set

Correct.

> The fact that the set of available functions forms a lattice gives you
> no guarantee whatever that glb(A) >= Q, because Q is not constrained
> by the lattice property.

I know. I don't use the lattice property to deduce the fact that
glb(A)>=Q. I use the lattice property to derive the existence of glb(A).
The result glb(A)>=Q comes from

1. Q is a lower bound on A (by definition of A)
2. glb(A) is a lower bound on A (by definition of glb)
3. glb(A)>=Q (by definition of "greatest")

Recall that A was defined as the set of functions >=Q in Q's equivalence
class, and was guaranteed to be non-empty by treating the other cases
separately.

I think it works. :) In all but the most complicated cases this really
reduces to the obvious behaviour, but on the other hand it scales
indefinitely.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden