Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Doing better at HINTing an appropriate column within errorMissingColumn() |
Date | |
Msg-id | CAM3SWZS9-Xr2Ud_j9yrKDctT6xxy16h1EugtSWmLU6Or4CtGAA@mail.gmail.com Whole thread Raw |
Responses |
Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Re: Doing better at HINTing an appropriate column within errorMissingColumn() Re: Doing better at HINTing an appropriate column within errorMissingColumn() Re: Doing better at HINTing an appropriate column within errorMissingColumn() |
List | pgsql-hackers |
With the addition of LATERAL subqueries, Tom fixed up the mechanism for keeping track of which relations are visible for column references while the FROM clause is being scanned. That allowed errorMissingColumn() to give a more useful error to the one produced by the prior coding of that mechanism, with an errhint sometimes proffering: 'There is a column named "foo" in table "bar", but it cannot be referenced from this part of the query'. I wondered how much further this could be taken. Attached patch modifies contrib/fuzzystrmatch, moving its Levenshtein distance code into core without actually moving the relevant SQL functions too. That change allowed me to modify errorMissingColumn() to make more useful suggestions as to what might have been intended under other circumstances, like when someone fat-fingers a column name. psql tab completion is good, but not so good that this doesn't happen all the time. It's good practice to consistently name columns and tables such that it's possible to intuit the names of columns from the names of tables and so on, but it's still pretty common to forget if a column name from the table "orders" is "order_id", "orderid", or "ordersid", particularly if you're someone who regularly interacts with many databases. This problem is annoying in a low intensity kind of way. Consider the following sample sessions of mine, made with the dellstore2 sample database: [local]/postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderids limit 1; ERROR: 42703: column ol.orderids does not exist LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid... ^ HINT: Perhaps you meant to reference the column "ol"."orderid". LOCATION: errorMissingColumn, parse_relation.c:2989 [local]/postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderid limit 1; orderid | orderdate | customerid | netamount | tax | totalamount | orderlineid | orderid | prod_id | quantity | orderdate ---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------ 1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 | 1 | 1 | 9117 | 1 | 2004-01-27 (1 row) [local]/postgres=# select ordersid from orders o join orderlines ol on o.orderid = ol.orderid limit 1; ERROR: 42703: column "ordersid" does not exist LINE 1: select ordersid from orders o join orderlines ol on o.orderi... ^ HINT: Perhaps you meant to reference the column "o"."orderid". LOCATION: errorMissingColumn, parse_relation.c:2999 [local]/postgres=# select ol.ordersid from orders o join orderlines ol on o.orderid = ol.orderid limit 1; ERROR: 42703: column ol.ordersid does not exist LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord... ^ HINT: Perhaps you meant to reference the column "ol"."orderid". LOCATION: errorMissingColumn, parse_relation.c:2989 We try to give the most useful possible HINT here, charging extra for a non-matching alias, and going through the range table in order and preferring the first column observed to any subsequent column whose name is of the same distance as an earlier Var. The fuzzy string matching works well enough that it seems possible in practice to successfully have the parser make the right suggestion, even when the user's original guess was fairly far off. I've found it works best to charge half as much for a character deletion, so that's what is charged. I have some outstanding concerns about the proposed patch: * It may be the case that dense logosyllabic or morphographic writing systems, for example Kanji might consistently present, say, Japanese users with a suggestion that just isn't very useful, to the point of being annoying. Perhaps some Japanese hackers can comment on the actual risks here. * Perhaps I should have moved the Levenshtein distance functions into core and be done with it. I thought that given the present restriction that the implementation imposes on source and target string lengths, it would be best to leave the user-facing SQL functions in contrib. That restriction is not relevant to the internal use of Levenshtein distance added here, though. Thoughts? -- Peter Geoghegan
Attachment
pgsql-hackers by date: