Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date
Msg-id CAM3SWZRS=9vK01UHvz1b9_y2r5-AeB-dkyR8JQJFf_Z9-F6FXA@mail.gmail.com
Whole thread Raw
In response to Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Nov 18, 2014 at 3:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> postgres=# select qty from orderlines ;
>> ERROR:  42703: column "qty" does not exist
>> LINE 1: select qty from orderlines ;
>>                ^
>> HINT:  Perhaps you meant to reference the column "orderlines"."quantity".
>> """
>
> I don't buy this example, because it would give you the same hint if
> you told it you wanted to access a column called ant, or uay, or tit.
> And that's clearly ridiculous.  The reason why quantity looks like a
> reasonable suggestion for qty is because it's a conventional
> abbreviation, but an extremely high percentage of comparable cases
> won't be.

Is that so terrible? Yes, if those *exact* strings are tried, that'll
happen. But the vast majority of 3 letter strings will not do that
(including many 3 letter strings that include one of the letters 'q',
't' and 'y', such as "qqq", "ttt", and "yyy"). Why, in practice, would
someone even attempt those strings? I'm worried about Murphy, not
Machiavelli. That seems like a pretty important distinction here.

I maintain that omission of part of the correct spelling should be
weighed less. I am optimizing for the case where the user has a rough
idea of the structure and spelling of things - if they're typing in
random strings, or totally distinct synonyms, there is little we can
do about that. As I said, there will always be the most marginal case
that still gets a suggestion. I see no reason to hurt the common case
where we help in order to save the user from seeing a "ridiculous"
suggestion. I have a final test for the absolute quality of a
suggestion, but I think we could easily be too conservative about
that. At worst, our "ridiculous" suggestion makes apparent that the
user's incorrect spelling was itself ridiculous. With larger strings,
we can afford to be more conservative, and we are, because we have
more information to go on. Terse column names are not uncommon,
though.

>>> +            /*
>>> +             * Charge extra (for inexact matches only) when an alias was
>>> +             * specified that differs from what might have been used to
>>> +             * correctly qualify this RTE's closest column
>>> +             */
>>> +            if (wrongalias)
>>> +                rtestate.distance += 3;
>>>
>>> I don't understand what situation this is catering to.  Can you
>>> explain?  It seems to account for a good deal of complexity.

> I guess I'm confused at a broader level.  If the alias is wrong, why
> are we considering names in this RTE *at all*?

Because it's a common mistake when writing ad-hoc queries. People may
forget which exact table their column comes from. We certainly want to
weigh the fact that an alias was specified, but it shouldn't totally
limit our guess to that RTE. If nothing else, the fact that there was
a much closer match from another RTE ought to result in forcing there
to be no suggestion (due to there being too many equally good
suggestions). That's because, as I said, an *absolute* test for the
quality of a match is problematic (which, again, is why I err on the
side of letting the final, "absolute quality" test not limit
suggestions, particularly with short strings).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Next
From: Tom Lane
Date:
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()