Re: Stack overflow issue - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Stack overflow issue
Msg-id 3661156.1661871758@sss.pgh.pa.us
In response to Re: Stack overflow issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stack overflow issue
List pgsql-hackers
I wrote:
>> I think most likely we should report this to Snowball upstream
>> and see what they think is an appropriate fix.

> Done at [1], and I pushed the other fixes.  Thanks again for the report!

The upstream recommendation, which seems pretty sane to me, is to
simply reject any string exceeding some threshold length as not
possibly being a word.  Apparently it's common to use thresholds
as small as 64 bytes, but in the attached I used 1000 bytes.
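The recommendation comes down to a single length predicate that runs before any stemming. Here is a minimal sketch that makes the threshold a parameter, so the 64-byte and 1000-byte choices can be compared. The name plausible_word_length is hypothetical and does not come from the patch or from Snowball.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch or from Snowball): decide
 * whether a token of 'len' bytes could plausibly be a word, given a
 * byte threshold.  Anything outside 1..max_bytes would be rejected
 * without ever reaching the stemmer.
 */
static bool
plausible_word_length(int len, int max_bytes)
{
    assert(max_bytes > 0);      /* a non-positive limit would reject everything */
    return len > 0 && len <= max_bytes;
}
```

With a 64-byte threshold, a 100-byte token is rejected. With the 1000-byte threshold used in the patch, the same token would still be passed on to the stemmer.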

            regards, tom lane

diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 68c9213f69..aaf4ff72b6 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -272,11 +272,25 @@ dsnowball_lexize(PG_FUNCTION_ARGS)
     DictSnowball *d = (DictSnowball *) PG_GETARG_POINTER(0);
     char       *in = (char *) PG_GETARG_POINTER(1);
     int32        len = PG_GETARG_INT32(2);
-    char       *txt = lowerstr_with_len(in, len);
     TSLexeme   *res = palloc0(sizeof(TSLexeme) * 2);
+    char       *txt;

+    /*
+     * Reject strings exceeding 1000 bytes, as they're surely not words in any
+     * human language.  This restriction avoids wasting cycles on stuff like
+     * base64-encoded data, and it protects us against possible inefficiency
+     * or misbehavior in the stemmers (for example, the Turkish stemmer has an
+     * indefinite recursion so it can crash on long-enough strings).
+     */
+    if (len <= 0 || len > 1000)
+        PG_RETURN_POINTER(res);
+
+    txt = lowerstr_with_len(in, len);
+
+    /* txt is probably not zero-length now, but we'll check anyway */
     if (*txt == '\0' || searchstoplist(&(d->stoplist), txt))
     {
+        /* empty or stopword, so reject */
         pfree(txt);
     }
     else

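To experiment with the patched control flow of dsnowball_lexize() outside the backend, here is a minimal standalone sketch. It rests on several assumptions. Lowercasing is ASCII-only, whereas the real lowerstr_with_len() handles multibyte encodings. The stop list is a hypothetical three-word array. NULL stands in for "no lexemes", whereas the real function returns an empty TSLexeme array. The stemming step is omitted entirely. The names prepare_token, is_stopword and MAX_WORD_BYTES are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_BYTES 1000     /* same limit the patch hard-codes */

/* Hypothetical stand-in for the dictionary's stop-word list. */
static const char *const stopwords[] = {"a", "the", "of"};

static bool
is_stopword(const char *w)
{
    for (size_t i = 0; i < sizeof(stopwords) / sizeof(stopwords[0]); i++)
        if (strcmp(w, stopwords[i]) == 0)
            return true;
    return false;
}

/*
 * Sketch of the patched flow: return a malloc'd lowercased copy that
 * would be handed to the stemmer, or NULL if the token is rejected.
 */
static char *
prepare_token(const char *in, int len)
{
    assert(in != NULL);

    /* Cheap length test first: no allocation and no stemming for
     * oversized input such as base64-encoded data. */
    if (len <= 0 || len > MAX_WORD_BYTES)
        return NULL;

    /* ASCII-only lowercasing (a simplification of lowerstr_with_len). */
    char *txt = malloc((size_t) len + 1);
    if (txt == NULL)
        return NULL;
    for (int i = 0; i < len; i++)
        txt[i] = (char) tolower((unsigned char) in[i]);
    txt[len] = '\0';

    /* Empty after conversion (e.g. leading NUL byte) or a stop word: reject. */
    if (*txt == '\0' || is_stopword(txt))
    {
        free(txt);
        return NULL;
    }
    return txt;
}
```

Ordering the length test before the copy is the point of the change. An oversized token costs one comparison instead of an allocation plus a trip through a stemmer that may misbehave on it.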