Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date
Msg-id CA+TgmoaxzagWBzO1ybTi8VTYW6Q0uBpN7ea05xW5p5N3hNX-Sg@mail.gmail.com
Whole thread Raw
In response to Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Tue, Nov 25, 2014 at 7:13 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Alright, so let me summarize what I think are the next steps in
> working towards getting this patch committed. I should produce a new
> revision which:
>
> * Uses default costings.
>
> * Applies a generic final quality check that enforces a distance of no
> greater than 50% of the total string size. (The use of default
> costings removes any reason to continue to do this)
>
> * Work through Robert's suggestions on other aspects that need work
> [1], most of which I already agreed to.

Sounds good so far.

> What is unclear is whether or not I should continue to charge extra
> for non-matching user supplied alias (and, I think more broadly,
> consider multiple RTEs iff the user did use an alias) - Robert was
> skeptical, but didn't seem to have made his mind up. I still think I
> should cost things based on aliases, and consider multiple RTEs even
> when the user supplied an alias (the penalty should just be a distance
> of 1 and not 3, though, in light of other changes to the
> weighing/costing). If I don't hear anything in the next day or two,
> I'll more or less preserve aliases-related aspects of the patch.

Basically, the case in which I think it's helpful to issue a
suggestion here is when the user has used the table name rather than
the alias name.  I wonder if it's worth checking for that case
specifically, in lieu of what you've done here, and issuing a totally
different hint in that case ("HINT: You must refer to this as column
as "prime_minister.id" rather than "cameron.id").

Another idea, which I think I like less well, is to check the
Levenshtein distance between the allowed alias and the entered alias
and, if that's within the half-the-shorter-length threshold, consider
possible matches from that RTE, charge the distance between the
correct alias and the entered alias as a penalty to each potential
column match.

What I think won't do is to look at a situation where the user has
entered automobile.id and suggest that maybe they meant student.iq, or
even student.id. The amount of difference between the names has got to
matter for the RTE names, just as it does for the column names.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Emre Hasegeli
Date:
Subject: Re: Selectivity estimation for inet operators
Next
From: Stephen Frost
Date:
Subject: Re: superuser() shortcuts