Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date
Msg-id CAF4Au4w2f3_kRt50yTzaBNLikxFRmNkrCV78-qtdUe7wox-hxQ@mail.gmail.com
Whole thread Raw
In response to Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
<p dir="ltr">Very interesting idea,  I'd think about optionally add similarity hinting support to psql tab.  With, 
say, 80% of similarity matching,  it shouldn't be  very annoying.  For interactive usage there is no risk of slowdown.
<divclass="gmail_quote">On Mar 27, 2014 11:11 PM, "Peter Geoghegan" <<a
href="mailto:pg@heroku.com">pg@heroku.com</a>>wrote:<br type="attribution" /><blockquote class="gmail_quote"
style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> With the addition of LATERAL subqueries, Tom
fixedup the mechanism<br /> for keeping track of which relations are visible for column references<br /> while the FROM
clauseis being scanned. That allowed<br /> errorMissingColumn() to give a more useful error to the one produced<br />
bythe prior coding of that mechanism, with an errhint sometimes<br /> proffering: 'There is a column named "foo" in
table"bar", but it<br /> cannot be referenced from this part of the query'.<br /><br /> I wondered how much further
thiscould be taken. Attached patch<br /> modifies contrib/fuzzystrmatch, moving its Levenshtein distance code<br />
intocore without actually moving the relevant SQL functions too. That<br /> change allowed me to modify
errorMissingColumn()to make more useful<br /> suggestions as to what might have been intended under other<br />
circumstances,like when someone fat-fingers a column name. psql tab<br /> completion is good, but not so good that this
doesn'thappen all the<br /> time. It's good practice to consistently name columns and tables such<br /> that it's
possibleto intuit the names of columns from the names of<br /> tables and so on, but it's still pretty common to forget
ifa column<br /> name from the table "orders" is "order_id", "orderid", or "ordersid",<br /> particularly if you're
someonewho regularly interacts with many<br /> databases. This problem is annoying in a low intensity kind of way.<br
/><br/> Consider the following sample sessions of mine, made with the<br /> dellstore2 sample database:<br /><br />
[local]/postgres=#select * from orders o join orderlines ol on<br /> o.orderid = ol.orderids limit 1;<br /> ERROR:
 42703:column ol.orderids does not exist<br /> LINE 1: ...* from orders o join orderlines ol on o.orderid =
ol.orderid...<br/>                                                              ^<br /> HINT:  Perhaps you meant to
referencethe column "ol"."orderid".<br /> LOCATION:  errorMissingColumn, parse_relation.c:2989<br /> [local]/postgres=#
select* from orders o join orderlines ol on<br /> o.orderid = ol.orderid limit 1;<br />  orderid | orderdate  |
customerid| netamount |  tax  | totalamount |<br /> orderlineid | orderid | prod_id | quantity | orderdate<br />
---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------<br
/>       1 | 2004-01-27 |       7888 |    313.24 | 25.84 |      339.08 |<br />           1 |       1 |    9117 |      
 1| 2004-01-27<br /> (1 row)<br /><br /> [local]/postgres=# select ordersid from orders o join orderlines ol on<br />
o.orderid= ol.orderid limit 1;<br /> ERROR:  42703: column "ordersid" does not exist<br /> LINE 1: select ordersid from
orderso join orderlines ol on o.orderi...<br />                ^<br /> HINT:  Perhaps you meant to reference the column
"o"."orderid".<br/> LOCATION:  errorMissingColumn, parse_relation.c:2999<br /> [local]/postgres=# select ol.ordersid
fromorders o join orderlines ol<br /> on o.orderid = ol.orderid limit 1;<br /> ERROR:  42703: column ol.ordersid does
notexist<br /> LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...<br />                ^<br />
HINT: Perhaps you meant to reference the column "ol"."orderid".<br /> LOCATION:  errorMissingColumn,
parse_relation.c:2989<br/><br /> We try to give the most useful possible HINT here, charging extra for<br /> a
non-matchingalias, and going through the range table in order and<br /> preferring the first column observed to any
subsequentcolumn whose<br /> name is of the same distance as an earlier Var. The fuzzy string<br /> matching works well
enoughthat it seems possible in practice to<br /> successfully have the parser make the right suggestion, even when
the<br/> user's original guess was fairly far off. I've found it works best to<br /> charge half as much for a
characterdeletion, so that's what is<br /> charged.<br /><br /> I have some outstanding concerns about the proposed
patch:<br/><br /> * It may be the case that dense logosyllabic or morphographic writing<br /> systems, for example
Kanjimight consistently present, say, Japanese<br /> users with a suggestion that just isn't very useful, to the point
of<br/> being annoying. Perhaps some Japanese hackers can comment on the<br /> actual risks here.<br /><br /> * Perhaps
Ishould have moved the Levenshtein distance functions into<br /> core and be done with it. I thought that given the
presentrestriction<br /> that the implementation imposes on source and target string lengths,<br /> it would be best to
leavethe user-facing SQL functions in contrib.<br /> That restriction is not relevant to the internal use of
Levenshtein<br/> distance added here, though.<br /><br /> Thoughts?<br /> --<br /> Peter Geoghegan<br /><br /><br />
--<br/> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div> 

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: trgm regex index peculiarity
Next
From: Albe Laurenz
Date:
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()