Thread: Unknown-type resolution rules, redux

Unknown-type resolution rules, redux

From
Tom Lane
Date:
parse_coerce.c contains the following conversation --- I believe the
first XXX comment is from me and the second from you:
   /*    * Still too many candidates? Try assigning types for the unknown    * columns.    *    * We do this by
examiningeach unknown argument position to see if all    * the candidates agree on the type category of that slot.  If
so,and    * if some candidates accept the preferred type in that category,    * eliminate the candidates with other
inputtypes.  If we are down to    * one candidate at the end, we win.    *    * XXX It's kinda bogus to do this
left-to-right,isn't it?  If we    * eliminate some candidates because they are non-preferred at the    * first slot, we
won'tnotice that they didn't have the same type    * category for a later slot.    * XXX Hmm. How else would you do
this?These candidates are here because    * they all have the same number of matches on arguments with explicit    *
types,so from here on left-to-right resolution is as good as any.    * Need a counterexample to see otherwise...    */
 

The comment is out of date anyway because it fails to mention the new
rule about preferring STRING category.  But to answer your request for
a counterexample: consider
SELECT foo('bar', 'baz')

First, suppose the available candidates are
foo(float8, int4)foo(float8, point)

In this case, we examine the first argument position, see that all the
candidates agree on NUMERIC category, so we consider resolving the first
unknown input to float8.  That eliminates neither candidate so we move
on to the second argument position.  Here there is a conflict of
categories so we can't eliminate anything, and we decide the call is
ambiguous.  That's correct (or at least Operating As Designed ;-)).

But now suppose we have
foo(float8, int4)foo(float4, point)

Here, at the first position we will still see that all candidates agree
on NUMERIC category, and then we will eliminate candidate 2 because it
isn't the preferred type in that category.  Now when we come to the
second argument position, there's only one candidate left so there's
no category conflict.  Result: this call is considered non-ambiguous.

This means there is a left-to-right bias in the algorithm.  For example,
the exact same call *would* be considered ambiguous if the candidates'
argument orders were reversed:
foo(int4, float8)foo(point, float4)

I do not like that.  You could maybe argue that earlier arguments are
more important than later ones for functions, but it's harder to make
that case for binary operators --- and in any case this behavior is
extremely difficult to explain in prose.

To fix this, I think we need to split the loop into two passes.
The first pass does *not* remove any candidates.  What it does is to
look separately at each UNKNOWN-argument position and attempt to deduce
a probable category for it, using the following rules:

* If any candidate has an input type of STRING category, use STRING
category; else if all candidates agree on the category, use that
category; else fail because no resolution can be made.

* The first pass must also remember whether any candidates are of a
preferred type within the selected category.

The probable categories and exists-preferred-type booleans are saved in
local arrays.  (Note this has to be done this way because
IsPreferredType currently allows more than one type to be considered
preferred in a category ... so the first pass cannot try to determine a
unique type, only a category.)

If we find a category for every UNKNOWN arg, then we enter a second loop
in which we discard candidates.  In this pass we discard a candidate if
(a) it is of the wrong category, or (b) it is of the right category but
is not of preferred type in that category, *and* we found candidate(s)
of preferred type at this slot.

If we end with exactly one candidate then we win.

It is clear in this algorithm that there is no order dependency: the
conditions for keeping or discarding a candidate are fixed before we
start the second pass, and do not vary depending on which other
candidates were discarded before it.

Comments?
        regards, tom lane


Re: Unknown-type resolution rules, redux

From
Thomas Lockhart
Date:
> It is clear in this algorithm that there is no order dependency: the
> conditions for keeping or discarding a candidate are fixed before we
> start the second pass, and do not vary depending on which other
> candidates were discarded before it.

I won't argue strongly for either solution, but have the deep-seating
(but vague) feeling that a left to right resolution algorithm is easier
to explain, hence to understand, hence to predict, hence to use. An
extra pass will solve the edge case you describe in perhaps a "better"
order.

I do think that the two algorithms under discussion are better than what
we've had in the past. Comments from others?
                   - Thomas


Re: Unknown-type resolution rules, redux

From
Bruce Momjian
Date:
Added to TODO.detail.

> parse_coerce.c contains the following conversation --- I believe the
> first XXX comment is from me and the second from you:
> 
>     /*
>      * Still too many candidates? Try assigning types for the unknown
>      * columns.
>      *
>      * We do this by examining each unknown argument position to see if all
>      * the candidates agree on the type category of that slot.  If so, and
>      * if some candidates accept the preferred type in that category,
>      * eliminate the candidates with other input types.  If we are down to
>      * one candidate at the end, we win.
>      *
>      * XXX It's kinda bogus to do this left-to-right, isn't it?  If we
>      * eliminate some candidates because they are non-preferred at the
>      * first slot, we won't notice that they didn't have the same type
>      * category for a later slot.
>      * XXX Hmm. How else would you do this? These candidates are here because
>      * they all have the same number of matches on arguments with explicit
>      * types, so from here on left-to-right resolution is as good as any.
>      * Need a counterexample to see otherwise...
>      */
> 
> The comment is out of date anyway because it fails to mention the new
> rule about preferring STRING category.  But to answer your request for
> a counterexample: consider
> 
>     SELECT foo('bar', 'baz')
> 
> First, suppose the available candidates are
> 
>     foo(float8, int4)
>     foo(float8, point)
> 
> In this case, we examine the first argument position, see that all the
> candidates agree on NUMERIC category, so we consider resolving the first
> unknown input to float8.  That eliminates neither candidate so we move
> on to the second argument position.  Here there is a conflict of
> categories so we can't eliminate anything, and we decide the call is
> ambiguous.  That's correct (or at least Operating As Designed ;-)).
> 
> But now suppose we have
> 
>     foo(float8, int4)
>     foo(float4, point)
> 
> Here, at the first position we will still see that all candidates agree
> on NUMERIC category, and then we will eliminate candidate 2 because it
> isn't the preferred type in that category.  Now when we come to the
> second argument position, there's only one candidate left so there's
> no category conflict.  Result: this call is considered non-ambiguous.
> 
> This means there is a left-to-right bias in the algorithm.  For example,
> the exact same call *would* be considered ambiguous if the candidates'
> argument orders were reversed:
> 
>     foo(int4, float8)
>     foo(point, float4)
> 
> I do not like that.  You could maybe argue that earlier arguments are
> more important than later ones for functions, but it's harder to make
> that case for binary operators --- and in any case this behavior is
> extremely difficult to explain in prose.
> 
> To fix this, I think we need to split the loop into two passes.
> The first pass does *not* remove any candidates.  What it does is to
> look separately at each UNKNOWN-argument position and attempt to deduce
> a probable category for it, using the following rules:
> 
> * If any candidate has an input type of STRING category, use STRING
> category; else if all candidates agree on the category, use that
> category; else fail because no resolution can be made.
> 
> * The first pass must also remember whether any candidates are of a
> preferred type within the selected category.
> 
> The probable categories and exists-preferred-type booleans are saved in
> local arrays.  (Note this has to be done this way because
> IsPreferredType currently allows more than one type to be considered
> preferred in a category ... so the first pass cannot try to determine a
> unique type, only a category.)
> 
> If we find a category for every UNKNOWN arg, then we enter a second loop
> in which we discard candidates.  In this pass we discard a candidate if
> (a) it is of the wrong category, or (b) it is of the right category but
> is not of preferred type in that category, *and* we found candidate(s)
> of preferred type at this slot.
> 
> If we end with exactly one candidate then we win.
> 
> It is clear in this algorithm that there is no order dependency: the
> conditions for keeping or discarding a candidate are fixed before we
> start the second pass, and do not vary depending on which other
> candidates were discarded before it.
> 
> Comments?
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Unknown-type resolution rules, redux

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Added to TODO.detail.

Might as well take it out again; that's already done.
        regards, tom lane


Re: Unknown-type resolution rules, redux

From
Bruce Momjian
Date:
Removed.  Thanks.

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Added to TODO.detail.
> 
> Might as well take it out again; that's already done.
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026