Re: Stack overflow issue - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: Stack overflow issue
Date	August 31, 2022 05:38:23
Msg-id	CAMbWs49H7=jV2oHdx_uzGyGUL_Lg4tS799KaLcHCUWa1VwggXw@mail.gmail.com
In response to	Re: Stack overflow issue (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Wed, Aug 31, 2022 at 6:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word. Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word". While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising. We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold. This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.

This patch looks good to me. It keeps overly long words (> 1000 bytes)
from going through the stemmer, so the stack overflow in the Turkish
stemmer should no longer occur.
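
For illustration only, here is a rough sketch of the two behaviors
discussed above: rejecting an overlength word as if it were a stopword,
versus accepting it with nothing but case-folding. This is not the patch
itself and not PostgreSQL code. The names `lexize_reject`,
`lexize_passthrough` and `toy_stem` are hypothetical, `toy_stem` is only a
stand-in for a real Snowball stemmer, and measuring the length after
case-folding is an assumption on my part.

```python
# Threshold taken from the thread; everything else here is illustrative.
MAX_WORD_BYTES = 1000


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a stemmer: drop one trailing 's'."""
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word


def is_overlength(word: str) -> bool:
    """True if the UTF-8 encoding of the word exceeds the threshold."""
    return len(word.encode("utf-8")) > MAX_WORD_BYTES


def lexize_reject(word: str) -> list[str]:
    """First variant: an overlength word yields no lexeme, like a stopword."""
    folded = word.lower()
    if is_overlength(folded):
        return []
    return [toy_stem(folded)]


def lexize_passthrough(word: str) -> list[str]:
    """Second variant: an overlength word is case-folded but never stemmed."""
    folded = word.lower()
    if is_overlength(folded):
        return [folded]
    return [toy_stem(folded)]


if __name__ == "__main__":
    long_word = "A" * (MAX_WORD_BYTES + 1)
    print(lexize_reject("Words"), lexize_passthrough("Words"))
    print(len(lexize_reject(long_word)), len(lexize_passthrough(long_word)))
```

In both variants the stemmer never sees the overlength input, which is
what removes the deep recursion. They differ only in whether the long
string survives as a lexeme for downstream processing.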

Thanks
Richard
