Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Doing better at HINTing an appropriate column within errorMissingColumn() |
Date | |
Msg-id | CAF4Au4w2f3_kRt50yTzaBNLikxFRmNkrCV78-qtdUe7wox-hxQ@mail.gmail.com Whole thread Raw |
In response to | Doing better at HINTing an appropriate column within errorMissingColumn() (Peter Geoghegan <pg@heroku.com>) |
List | pgsql-hackers |
<p dir="ltr">Very interesting idea, I'd think about optionally add similarity hinting support to psql tab. With, say, 80% of similarity matching, it shouldn't be very annoying. For interactive usage there is no risk of slowdown. <divclass="gmail_quote">On Mar 27, 2014 11:11 PM, "Peter Geoghegan" <<a href="mailto:pg@heroku.com">pg@heroku.com</a>>wrote:<br type="attribution" /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> With the addition of LATERAL subqueries, Tom fixedup the mechanism<br /> for keeping track of which relations are visible for column references<br /> while the FROM clauseis being scanned. That allowed<br /> errorMissingColumn() to give a more useful error to the one produced<br /> bythe prior coding of that mechanism, with an errhint sometimes<br /> proffering: 'There is a column named "foo" in table"bar", but it<br /> cannot be referenced from this part of the query'.<br /><br /> I wondered how much further thiscould be taken. Attached patch<br /> modifies contrib/fuzzystrmatch, moving its Levenshtein distance code<br /> intocore without actually moving the relevant SQL functions too. That<br /> change allowed me to modify errorMissingColumn()to make more useful<br /> suggestions as to what might have been intended under other<br /> circumstances,like when someone fat-fingers a column name. psql tab<br /> completion is good, but not so good that this doesn'thappen all the<br /> time. It's good practice to consistently name columns and tables such<br /> that it's possibleto intuit the names of columns from the names of<br /> tables and so on, but it's still pretty common to forget ifa column<br /> name from the table "orders" is "order_id", "orderid", or "ordersid",<br /> particularly if you're someonewho regularly interacts with many<br /> databases. This problem is annoying in a low intensity kind of way.<br /><br/> Consider the following sample sessions of mine, made with the<br /> dellstore2 sample database:<br /><br /> [local]/postgres=#select * from orders o join orderlines ol on<br /> o.orderid = ol.orderids limit 1;<br /> ERROR: 42703:column ol.orderids does not exist<br /> LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...<br/> ^<br /> HINT: Perhaps you meant to referencethe column "ol"."orderid".<br /> LOCATION: errorMissingColumn, parse_relation.c:2989<br /> [local]/postgres=# select* from orders o join orderlines ol on<br /> o.orderid = ol.orderid limit 1;<br /> orderid | orderdate | customerid| netamount | tax | totalamount |<br /> orderlineid | orderid | prod_id | quantity | orderdate<br /> ---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------<br /> 1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |<br /> 1 | 1 | 9117 | 1| 2004-01-27<br /> (1 row)<br /><br /> [local]/postgres=# select ordersid from orders o join orderlines ol on<br /> o.orderid= ol.orderid limit 1;<br /> ERROR: 42703: column "ordersid" does not exist<br /> LINE 1: select ordersid from orderso join orderlines ol on o.orderi...<br /> ^<br /> HINT: Perhaps you meant to reference the column "o"."orderid".<br/> LOCATION: errorMissingColumn, parse_relation.c:2999<br /> [local]/postgres=# select ol.ordersid fromorders o join orderlines ol<br /> on o.orderid = ol.orderid limit 1;<br /> ERROR: 42703: column ol.ordersid does notexist<br /> LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...<br /> ^<br /> HINT: Perhaps you meant to reference the column "ol"."orderid".<br /> LOCATION: errorMissingColumn, parse_relation.c:2989<br/><br /> We try to give the most useful possible HINT here, charging extra for<br /> a non-matchingalias, and going through the range table in order and<br /> preferring the first column observed to any subsequentcolumn whose<br /> name is of the same distance as an earlier Var. The fuzzy string<br /> matching works well enoughthat it seems possible in practice to<br /> successfully have the parser make the right suggestion, even when the<br/> user's original guess was fairly far off. I've found it works best to<br /> charge half as much for a characterdeletion, so that's what is<br /> charged.<br /><br /> I have some outstanding concerns about the proposed patch:<br/><br /> * It may be the case that dense logosyllabic or morphographic writing<br /> systems, for example Kanjimight consistently present, say, Japanese<br /> users with a suggestion that just isn't very useful, to the point of<br/> being annoying. Perhaps some Japanese hackers can comment on the<br /> actual risks here.<br /><br /> * Perhaps Ishould have moved the Levenshtein distance functions into<br /> core and be done with it. I thought that given the presentrestriction<br /> that the implementation imposes on source and target string lengths,<br /> it would be best to leavethe user-facing SQL functions in contrib.<br /> That restriction is not relevant to the internal use of Levenshtein<br/> distance added here, though.<br /><br /> Thoughts?<br /> --<br /> Peter Geoghegan<br /><br /><br /> --<br/> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div>
pgsql-hackers by date: