Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date
Msg-id CAM3SWZTnFBosC6g2sqBy-XVwJrz0JbZrmhfjDT9nU+Bk9KKXsA@mail.gmail.com
Whole thread Raw
In response to Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Hi,

On Tue, Jul 8, 2014 at 6:58 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> 2) Checking process goes through all the existing columns of a
> relation even a difference of 1 with some other column(s) has already
> been found. As we try to limit the number of hints returned, this
> seems like a waste of resources.

In general it's possible that an exact match will later be found
within the RTE, and exact matches don't have to pay the "wrong alias"
penalty, and are immediately returned. It is therefore not a waste of
resources, but even if it was that would be pretty inconsequential as
your benchmark shows.

> 3) distanceName could be improved, by for example having some checks
> on the string lengths of target and source columns, and immediately
> reject the match if for example the length of the source string is the
> double/half of the length of target.

I don't think it's a good idea to tie distanceName() to the ultimate
behavior of errorMissingColumn() hinting, since there may be other
callers in the future. Besides, that isn't going to help much.

> 4) This is not nice, could it be possible to remove the stuff from varlena.c?
> +/* Expand each Levenshtein distance variant */
> +#include "levenshtein.c"
> +#define LEVENSHTEIN_LESS_EQUAL
> +#include "levenshtein.c"
> +#undef LEVENSHTEIN_LESS_EQUAL
> Part of the same comment: only varstr_leven_less_equal is used to
> calculate the distance, should we really move varstr_leven to core?
> This clearly needs to be reworked as not just a copy-paste of the
> things in fuzzystrmatch.
> The flag LEVENSHTEIN_LESS_EQUAL should be let within fuzzystrmatch I think.

So there'd be one variant within core and one within
contrib/fuzzystrmatch? I don't think that's an improvement.

> 5) Do we want hints on system columns as well?

I think it's obvious that the answer must be no. That's going to
frequently result in suggestions of columns that users will complain
aren't even there. If you know about the system columns, you can just
get it right. They're supposed to be hidden for most purposes.

> 6) Sometimes no hints are returned... Even in simple cases like this one:
> =# create table foo (aa int, bb int);
> CREATE TABLE
> =# select ab from foo;
> ERROR:  42703: column "ab" does not exist
> LINE 1: select ab from foo;
>                ^
> LOCATION:  errorMissingColumn, parse_relation.c:3123

That's because those two candidates come from a single RTE and have an
equal distance -- you'd see both suggestions if you joined two tables
with each candidate, assuming that each table being joined didn't
individually have the same issue. I think that that's probably
considered the correct behavior by most.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Piotr Stefaniak
Date:
Subject: Re: How to use Makefile - pgxs without gcc -O2 ?
Next
From: Robert Haas
Date:
Subject: Re: tweaking NTUP_PER_BUCKET