Thread: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+
[ redirecting from -docs, which isn't the best place to discuss code fixes ] Whoever thought this was a good idea: StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN, "Levenshtein hinting mechanism restricts NAMEDATALEN"); is nuts. A minimal fix that would not put stumbling blocks in the way of changes to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as Max(255, NAMEDATALEN). But I wonder why we have to have an arbitrary limit in this code at all. I trust nobody here believes this is the only O(N^2) algorithm in the backend; the others don't have this sort of thing in front of them. regards, tom lane ------- Forwarded Message Date: Fri, 08 Jan 2016 14:49:22 -0500 From: Tom Lane <tgl@sss.pgh.pa.us> To: Kevin Day <thekevinday@gmail.com> cc: pgsql-docs@postgresql.org Subject: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+ Kevin Day <thekevinday@gmail.com> writes: > Postgresql 9.5+ may now fail to compile if NAMEDATELEN has been set > absurdly large (In my case, 384). > The file src/backend/utils/adt/levenshtein.c does a static assert on > "NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN" with MAX_LEVENSHTEIN_STRLEN > currently set to 255. Hmm. I'm not sure whether 384 is "absurdly large", but I do wonder why the levenshtein code gets to dictate limits on NAMEDATALEN at all. Or to put it even more bluntly, I'm not sure that there is anything whatever about MAX_LEVENSHTEIN_STRLEN that is well thought out. Why not rip it out and put some CHECK_FOR_INTERRUPTS tests into the loops instead? regards, tom lane -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs ------- End of Forwarded Message
On Fri, Jan 8, 2016 at 12:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Whoever thought this was a good idea: > > StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN, > "Levenshtein hinting mechanism restricts NAMEDATALEN"); > > is nuts. Then I must be nuts. > A minimal fix that would not put stumbling blocks in the way of changes > to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as > Max(255, NAMEDATALEN). But I wonder why we have to have an arbitrary limit > in this code at all. I trust nobody here believes this is the only O(N^2) > algorithm in the backend; the others don't have this sort of thing in > front of them. The default NAMEDATALEN is of course 64. I didn't feel that the static assertion was especially restrictive, given that it still leaves significant headroom. I'm slightly surprised that this cropped up. I didn't want to presume that the algorithm being O(N^2) was the only reason for the restriction. This comment seems pretty scary to me: /* * For security concerns, restrict excessive CPU+RAM usage. (This * implementation uses O(m) memory and has O(mn)complexity.) */ if (m > MAX_LEVENSHTEIN_STRLEN || n > MAX_LEVENSHTEIN_STRLEN) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("argument exceeds the maximum length of %d bytes", MAX_LEVENSHTEIN_STRLEN))); -- Peter Geoghegan
On Fri, Jan 8, 2016 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ redirecting from -docs, which isn't the best place to discuss code fixes ] > > Whoever thought this was a good idea: > > StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN, > "Levenshtein hinting mechanism restricts NAMEDATALEN"); > > is nuts. > > A minimal fix that would not put stumbling blocks in the way of changes > to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as > Max(255, NAMEDATALEN). But I wonder why we have to have an arbitrary limit > in this code at all. I trust nobody here believes this is the only O(N^2) > algorithm in the backend; the others don't have this sort of thing in > front of them. The arbitrary limit in that code is not new, but the Assert is. I didn't foresee the consequence that it would become harder to change NAMEDATALEN, and I agree that needs to be fixed. Ripping the arbitrary limit out does not seem like a particularly good idea to me, even if we put CHECK_FOR_INTERRUPTS() into that path. Just because there are other ways to make the backend burn insane amounts of CPU time doesn't mean that we should try to make it easier. Actually, though, varstr_levenshtein_less_equal() never gets called from parse_relation.c with a distance bound of more than four, so it can't actually go quadratic. There's another call in that file to varstr_levenshtein(), which in retrospect looks silly, but I'm pretty sure that could also be changed to use a distance bound of 4 (well, MAX_FUZZY_DISTNACE + 1) without changing the behavior at all. Given a constant distance bound, the algorithm is, I believe, only O(max(m, n)) not O(mn). Knowing that, we could let the internal code just bypass the length check and only enforce the length restriction when the code is called from SQL via fuzzystrmatch. (There are of course other ways to solve the problem, too; this is just one that came to mind.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Actually, though, varstr_levenshtein_less_equal() never gets called > from parse_relation.c with a distance bound of more than four, so it > can't actually go quadratic. There's another call in that file to > varstr_levenshtein(), which in retrospect looks silly, but I'm pretty > sure that could also be changed to use a distance bound of 4 (well, > MAX_FUZZY_DISTNACE + 1) without changing the behavior at all. Given a > constant distance bound, the algorithm is, I believe, only O(max(m, > n)) not O(mn). Knowing that, we could let the internal code just > bypass the length check and only enforce the length restriction when > the code is called from SQL via fuzzystrmatch. Done that way; thanks for the advice. regards, tom lane