Re: Stack overflow issue - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Stack overflow issue
Msg-id CAMbWs49H7=jV2oHdx_uzGyGUL_Lg4tS799KaLcHCUWa1VwggXw@mail.gmail.com
In response to Re: Stack overflow issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Wed, Aug 31, 2022 at 6:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> The upstream recommendation, which seems pretty sane to me, is to
>> simply reject any string exceeding some threshold length as not
>> possibly being a word.  Apparently it's common to use thresholds
>> as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word".  While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising.  We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold.  This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.
 
This patch looks good to me. It keeps overly long words (> 1000 bytes)
from going through the stemmer, so the stack overflow issue in the
Turkish stemmer should no longer occur.
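
For readers skimming the thread, here is a rough sketch of the behavior
described above: case-fold every word, but hand it to the stemmer only
when it is within the length threshold. This is not the attached patch.
The names normalize_word, toy_stem, and MAX_STEM_LEN are hypothetical,
the "stemmer" is a stand-in that merely drops a trailing "s", and the
case-folding is ASCII-only for simplicity.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold, mirroring the 1000 bytes mentioned in the thread. */
#define MAX_STEM_LEN 1000

/* Stand-in for a real stemmer (hypothetical): drops one trailing 's'. */
static void
toy_stem(char *word)
{
    size_t len = strlen(word);

    if (len > 1 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/*
 * Return a malloc'd, case-folded copy of word[0..len-1].  Words longer
 * than MAX_STEM_LEN bytes skip the stemmer entirely, so its recursion
 * depth can never be driven by pathological input.  The caller frees the
 * result; NULL means the allocation failed.
 */
char *
normalize_word(const char *word, size_t len)
{
    char   *out;
    size_t  i;

    assert(word != NULL);

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* ASCII-only case folding; a real dictionary would be locale-aware. */
    for (i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) word[i]);
    out[len] = '\0';

    /* Overlength input is accepted as-is (apart from the case-fold). */
    if (len <= MAX_STEM_LEN)
        toy_stem(out);

    return out;
}
```

With this sketch, normalize_word("Cats", 4) yields "cat", while a
1001-byte input comes back lowercased and otherwise untouched. The
stopword variant discussed earlier would instead return an empty result
in the overlength branch.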

Thanks
Richard
