Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+ - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+
Date
Msg-id CA+TgmobkfHmkO1f9A=nGqDMmhe1tnp=nKf6wuPFcEJPyAJ8kkg@mail.gmail.com
Whole thread Raw
In response to Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jan 8, 2016 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ redirecting from -docs, which isn't the best place to discuss code fixes ]
>
> Whoever thought this was a good idea:
>
>     StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN,
>                      "Levenshtein hinting mechanism restricts NAMEDATALEN");
>
> is nuts.
>
> A minimal fix that would not put stumbling blocks in the way of changes
> to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as
> Max(255, NAMEDATALEN).  But I wonder why we have to have an arbitrary limit
> in this code at all.  I trust nobody here believes this is the only O(N^2)
> algorithm in the backend; the others don't have this sort of thing in
> front of them.

The arbitrary limit in that code is not new, but the Assert is.  I
didn't foresee the consequence that it would become harder to change
NAMEDATALEN, and I agree that needs to be fixed.  Ripping the
arbitrary limit out does not seem like a particularly good idea to me,
even if we put CHECK_FOR_INTERRUPTS() into that path.  Just because
there are other ways to make the backend burn insane amounts of CPU
time doesn't mean that we should try to make it easier.

Actually, though, varstr_levenshtein_less_equal() never gets called
from parse_relation.c with a distance bound of more than four, so it
can't actually go quadratic.  There's another call in that file to
varstr_levenshtein(), which in retrospect looks silly, but I'm pretty
sure that could also be changed to use a distance bound of 4 (well,
MAX_FUZZY_DISTNACE + 1) without changing the behavior at all.  Given a
constant distance bound, the algorithm is, I believe, only O(max(m,
n)) not O(mn).  Knowing that, we could let the internal code just
bypass the length check and only enforce the length restriction when
the code is called from SQL via fuzzystrmatch.

(There are of course other ways to solve the problem, too; this is
just one that came to mind.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [COMMITTERS] pgsql: Avoid pin scan for replay of XLOG_BTREE_VACUUM
Next
From: Robert Haas
Date:
Subject: Re: ExecGather() + nworkers