Thread: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

From

Tom Lane

Date:

08 January 2016, 23:05:14

[ redirecting from -docs, which isn't the best place to discuss code fixes ]

Whoever thought this was a good idea:
   StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN,                    "Levenshtein hinting mechanism restricts
NAMEDATALEN");

is nuts.

A minimal fix that would not put stumbling blocks in the way of changes
to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as
Max(255, NAMEDATALEN).  But I wonder why we have to have an arbitrary limit
in this code at all.  I trust nobody here believes this is the only O(N^2)
algorithm in the backend; the others don't have this sort of thing in
front of them.
        regards, tom lane


------- Forwarded Message

Date:    Fri, 08 Jan 2016 14:49:22 -0500
From:    Tom Lane <tgl@sss.pgh.pa.us>
To:      Kevin Day <thekevinday@gmail.com>
cc:      pgsql-docs@postgresql.org
Subject: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

Kevin Day <thekevinday@gmail.com> writes:
> Postgresql 9.5+ may now fail to compile if NAMEDATELEN has been set
> absurdly large (In my case, 384).
> The file src/backend/utils/adt/levenshtein.c does a static assert on
> "NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN" with MAX_LEVENSHTEIN_STRLEN
> currently set to 255.

Hmm.  I'm not sure whether 384 is "absurdly large", but I do wonder
why the levenshtein code gets to dictate limits on NAMEDATALEN at all.

Or to put it even more bluntly, I'm not sure that there is anything
whatever about MAX_LEVENSHTEIN_STRLEN that is well thought out.  Why not
rip it out and put some CHECK_FOR_INTERRUPTS tests into the loops instead?
        regards, tom lane


-- 
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

------- End of Forwarded Message

Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

From

Peter Geoghegan

Date:

09 January 2016, 01:14:27

On Fri, Jan 8, 2016 at 12:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Whoever thought this was a good idea:
>
>     StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN,
>                      "Levenshtein hinting mechanism restricts NAMEDATALEN");
>
> is nuts.

Then I must be nuts.

> A minimal fix that would not put stumbling blocks in the way of changes
> to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as
> Max(255, NAMEDATALEN).  But I wonder why we have to have an arbitrary limit
> in this code at all.  I trust nobody here believes this is the only O(N^2)
> algorithm in the backend; the others don't have this sort of thing in
> front of them.

The default NAMEDATALEN is of course 64. I didn't feel that the static
assertion was especially restrictive, given that it still leaves
significant headroom. I'm slightly surprised that this cropped up. I
didn't want to presume that the algorithm being O(N^2) was the only
reason for the restriction. This comment seems pretty scary to me:
   /*    * For security concerns, restrict excessive CPU+RAM usage. (This    * implementation uses O(m) memory and has
O(mn)complexity.)    */   if (m > MAX_LEVENSHTEIN_STRLEN ||       n > MAX_LEVENSHTEIN_STRLEN)       ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),                errmsg("argument exceeds the maximum length of %d
bytes",                      MAX_LEVENSHTEIN_STRLEN)));
 

-- 
Peter Geoghegan

Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

From

Robert Haas

Date:

10 January 2016, 19:48:39

On Fri, Jan 8, 2016 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ redirecting from -docs, which isn't the best place to discuss code fixes ]
>
> Whoever thought this was a good idea:
>
>     StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN,
>                      "Levenshtein hinting mechanism restricts NAMEDATALEN");
>
> is nuts.
>
> A minimal fix that would not put stumbling blocks in the way of changes
> to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as
> Max(255, NAMEDATALEN).  But I wonder why we have to have an arbitrary limit
> in this code at all.  I trust nobody here believes this is the only O(N^2)
> algorithm in the backend; the others don't have this sort of thing in
> front of them.

The arbitrary limit in that code is not new, but the Assert is.  I
didn't foresee the consequence that it would become harder to change
NAMEDATALEN, and I agree that needs to be fixed.  Ripping the
arbitrary limit out does not seem like a particularly good idea to me,
even if we put CHECK_FOR_INTERRUPTS() into that path.  Just because
there are other ways to make the backend burn insane amounts of CPU
time doesn't mean that we should try to make it easier.

Actually, though, varstr_levenshtein_less_equal() never gets called
from parse_relation.c with a distance bound of more than four, so it
can't actually go quadratic.  There's another call in that file to
varstr_levenshtein(), which in retrospect looks silly, but I'm pretty
sure that could also be changed to use a distance bound of 4 (well,
MAX_FUZZY_DISTNACE + 1) without changing the behavior at all.  Given a
constant distance bound, the algorithm is, I believe, only O(max(m,
n)) not O(mn).  Knowing that, we could let the internal code just
bypass the length check and only enforce the length restriction when
the code is called from SQL via fuzzystrmatch.

(There are of course other ways to solve the problem, too; this is
just one that came to mind.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

From

Tom Lane

Date:

22 January 2016, 19:54:05

Robert Haas <robertmhaas@gmail.com> writes:
> Actually, though, varstr_levenshtein_less_equal() never gets called
> from parse_relation.c with a distance bound of more than four, so it
> can't actually go quadratic.  There's another call in that file to
> varstr_levenshtein(), which in retrospect looks silly, but I'm pretty
> sure that could also be changed to use a distance bound of 4 (well,
> MAX_FUZZY_DISTNACE + 1) without changing the behavior at all.  Given a
> constant distance bound, the algorithm is, I believe, only O(max(m,
> n)) not O(mn).  Knowing that, we could let the internal code just
> bypass the length check and only enforce the length restriction when
> the code is called from SQL via fuzzystrmatch.

Done that way; thanks for the advice.
        regards, tom lane