Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Doing better at HINTing an appropriate column within errorMissingColumn() |
Date | |
Msg-id | CAB7nPqRnjgOnQ=-2R15XPudn51DFXZAyd8U+Sc-91-P=kxJQ3A@mail.gmail.com Whole thread Raw |
In response to | Doing better at HINTing an appropriate column within errorMissingColumn() (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Doing better at HINTing an appropriate column within errorMissingColumn()
|
List | pgsql-hackers |
On Fri, Mar 28, 2014 at 4:10 AM, Peter Geoghegan <pg@heroku.com> wrote: > With the addition of LATERAL subqueries, Tom fixed up the mechanism > for keeping track of which relations are visible for column references > while the FROM clause is being scanned. That allowed > errorMissingColumn() to give a more useful error to the one produced > by the prior coding of that mechanism, with an errhint sometimes > proffering: 'There is a column named "foo" in table "bar", but it > cannot be referenced from this part of the query'. > > I wondered how much further this could be taken. Attached patch > modifies contrib/fuzzystrmatch, moving its Levenshtein distance code > into core without actually moving the relevant SQL functions too. That > change allowed me to modify errorMissingColumn() to make more useful > suggestions as to what might have been intended under other > circumstances, like when someone fat-fingers a column name. psql tab > completion is good, but not so good that this doesn't happen all the > time. It's good practice to consistently name columns and tables such > that it's possible to intuit the names of columns from the names of > tables and so on, but it's still pretty common to forget if a column > name from the table "orders" is "order_id", "orderid", or "ordersid", > particularly if you're someone who regularly interacts with many > databases. This problem is annoying in a low intensity kind of way. > > Consider the following sample sessions of mine, made with the > dellstore2 sample database: > > [local]/postgres=# select * from orders o join orderlines ol on > o.orderid = ol.orderids limit 1; > ERROR: 42703: column ol.orderids does not exist > LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid... > ^ > HINT: Perhaps you meant to reference the column "ol"."orderid". > LOCATION: errorMissingColumn, parse_relation.c:2989 > [local]/postgres=# select * from orders o join orderlines ol on > o.orderid = ol.orderid limit 1; > orderid | orderdate | customerid | netamount | tax | totalamount | > orderlineid | orderid | prod_id | quantity | orderdate > ---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------ > 1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 | > 1 | 1 | 9117 | 1 | 2004-01-27 > (1 row) > > [local]/postgres=# select ordersid from orders o join orderlines ol on > o.orderid = ol.orderid limit 1; > ERROR: 42703: column "ordersid" does not exist > LINE 1: select ordersid from orders o join orderlines ol on o.orderi... > ^ > HINT: Perhaps you meant to reference the column "o"."orderid". > LOCATION: errorMissingColumn, parse_relation.c:2999 > [local]/postgres=# select ol.ordersid from orders o join orderlines ol > on o.orderid = ol.orderid limit 1; > ERROR: 42703: column ol.ordersid does not exist > LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord... > ^ > HINT: Perhaps you meant to reference the column "ol"."orderid". > LOCATION: errorMissingColumn, parse_relation.c:2989 > > We try to give the most useful possible HINT here, charging extra for > a non-matching alias, and going through the range table in order and > preferring the first column observed to any subsequent column whose > name is of the same distance as an earlier Var. The fuzzy string > matching works well enough that it seems possible in practice to > successfully have the parser make the right suggestion, even when the > user's original guess was fairly far off. I've found it works best to > charge half as much for a character deletion, so that's what is > charged. What about the overhead that this processing creates if error processing needs to scan a schema with let's say hundreds of tables? > * It may be the case that dense logosyllabic or morphographic writing > systems, for example Kanji might consistently present, say, Japanese > users with a suggestion that just isn't very useful, to the point of > being annoying. Perhaps some Japanese hackers can comment on the > actual risks here. As long as Hiragana-only words (basic alphabet for Japanese words), and more particularly Katakana only-words (to write phonetically foreign words) are compared (even Kanji-only things compared), Levenstein could play its role pretty well. But once a comparison is made with two words using different alphabet, well Levenstein is not going to work well. A simple example is 'ramen' (Japanese noodles), that you can find written sometimes in Hiragana, or even in Katakana, and here Levenstein performs poorly: =# select levenshtein('ラーメン', 'らあめん');levenshtein ------------- 4 (1 row) Regards, -- Michael
pgsql-hackers by date: