Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date
Msg-id 362.1402977399@sss.pgh.pa.us
In response to Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
Peter Geoghegan <pg@heroku.com> writes:
> On Mon, Jun 16, 2014 at 7:09 PM, Ian Barwick <ian@2ndquadrant.com> wrote:
>> However, in this particular use case, as long as it doesn't produce false
>> positives (I haven't looked at the patch) I don't think it would cause
>> any problems (of the kind which would require actively excluding certain
>> languages/character sets), it just wouldn't be quite as useful.

> I'm not sure what you mean by false positives. The patch just shows a
> HINT, where before there was none. It's possible for any number of
> reasons that it isn't the most useful possible suggestion, since
> Levenshtein distance is used as opposed to any other scheme that might
> be better sometimes. I think that the hint given is a generally useful
> piece of information in the event of an ERRCODE_UNDEFINED_COLUMN
> error. Obviously I think the patch is worthwhile, but fundamentally
> the HINT given is just a guess, as with the existing HINTs.

Not having looked at the patch, but: I think the probability of
useless-noise HINTs could be substantially reduced if the code prints a
HINT only when there is a single available alternative that is clearly
better than the others in Levenshtein distance.  I'm not sure how much
better is "clearly better", but I exclude "zero" from that.  I see that
the original description of the patch says that it will arbitrarily
choose one alternative when there are several with equal Levenshtein
distance, and I'd say that's a bad idea.

You could possibly answer this objection by making the HINT list *all*
the alternatives meeting the minimum Levenshtein distance.  But I think
that's probably overcomplicated and of uncertain value anyhow.  I'd rather
have a rule that "we print only the choice that is at least K units better
than any other choice", where K remains to be determined exactly.
        regards, tom lane
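
The "at least K units better" rule proposed above can be illustrated with a small sketch. This is not code from the patch, which this message had not reviewed; it is a minimal Python illustration under stated assumptions. The names levenshtein, pick_hint, and the parameter k are hypothetical, the default k=2 is arbitrary (the message leaves K undetermined), and k must be at least 1 because the message excludes a margin of zero. When only one candidate exists, this sketch returns it regardless of distance, which is a simplification.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insert, delete, substitute, each costing 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pick_hint(missing: str, candidates: Sequence[str], k: int = 2) -> Optional[str]:
    """Return a candidate only if it beats every other one by at least k.

    Returns None when there are no candidates, when the best candidates
    tie, or when the runner-up is within k - 1 of the best.
    """
    if k < 1:
        # A margin of zero would allow arbitrary tie-breaking.
        raise ValueError("k must be at least 1")
    scored = sorted((levenshtein(missing, c), c) for c in candidates)
    if not scored:
        return None
    best_dist, best = scored[0]
    if len(scored) > 1 and scored[1][0] - best_dist < k:
        return None  # no single clearly better alternative
    return best


if __name__ == "__main__":
    print(pick_hint("nmae", ["name", "id", "created_at"]))  # clear winner
    print(pick_hint("colr", ["color", "cold"]))             # tie, so no hint
```

With candidates "color" and "cold" for the misspelling "colr", both are at distance 1, so no hint is produced instead of one being picked arbitrarily. Raising k trades fewer hints for fewer noisy ones.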


