Re: [PATCH] Add hint for misspelled relations - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: [PATCH] Add hint for misspelled relations
Date
Msg-id CALdSSPgcJ=zueMiR3su7Dh4YE4TSY4dG_F8FDszbbZC2XqPeUg@mail.gmail.com
Whole thread Raw
In response to [PATCH] Add hint for misspelled relations  (Steve Chavez <steve@supabase.io>)
List pgsql-hackers
On Tue, 2 Dec 2025 at 07:46, Steve Chavez <steve@supabase.io> wrote:
>
> Hello hackers,
>
> Currently misspelled columns offer a hint but not misspelled relations.
>
> This patch enables that, the result is like:
>
> -- having this table
> create table clients (id int);
> -- we misspell the table name
> select * from cliants;
> ERROR:  relation "cliants" does not exist
> LINE 1: select * from cliants;
> HINT:  Perhaps you meant to reference the table "public.clients".
>
> The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is
qualified,it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function
similarto how the column matching is done. 
>
> If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If
thatfails, then the lexicographic order is used to break the tie. 
>
> One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is
cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is
appreciated.Personally I've seen  a catalog that contains 125K tables, with mostly auto generated names. For these
casesI don't think the hint helps that much anyway, so it seemed fine to bail here. 
>
> The changes are split into two commits, one refactoring some reusable functions for easier review and another one
implementingthe relation hint. 
>
> Any feedback is welcomed.
>
> Best regards,
> Steve Chavez
>
> [1]:

Hi! I did not reviewed this patch closely, but I stopped a this:

> + reldistance = varstr_levenshtein_less_equal(target->relname, strlen(target->relname),
> + candidate, candidatelen,
> + 1, 1, 1,
> + Min(fuzzystate->distance + 1, MAX_FUZZY_DISTANCE + 1),
> + true);
> +
> + /* The above can return MAX_FUZZY_DISTANCE + 1 results, skip these */
> + if (reldistance > MAX_FUZZY_DISTANCE)
> + continue;

Why do we even do this? Can't we just pass fuzzystate->distance to
varstr_levenshtein_less_equal? It is initialized in outer func to
MAX_FUZZY_DISTANCE + 1

--
Best regards,
Kirill Reshke



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?
Next
From: Chengpeng Yan
Date:
Subject: Add a greedy join search algorithm to handle large join problems