Update to latest Snowball sources.

It's been some time since we did this, partly because the upstream
snowball project hasn't formally tagged a new release since 2021.
The main motivation for doing it now is to absorb a bug fix
(their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which
prevents a null pointer dereference crash if SN_create_env() gets
a malloc failure at just the wrong point. We'll patch the back
branches with only that change, but we might as well do the full
sync dance on HEAD.
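The failure mode described above can be illustrated with a small sketch. Everything in it is hypothetical (struct env, env_create, env_close, and the xcalloc failure-injection hook are made up for illustration); it is not the upstream snowball code and makes no claim about what the referenced commit actually changes. It only shows the general pattern: when a constructor bails out partway through, the cleanup routine must decide what to free by looking at the pointers that actually got set, not at the sizes that were requested.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a stemmer environment; NOT the real struct SN_env. */
struct env {
    char  *buf;     /* main working buffer */
    char **strs;    /* optional array of string slots */
};

/* Test hook: fail the Nth allocation (1-based); 0 disables injection. */
static int fail_at = 0;
static int alloc_count = 0;

static void *xcalloc(size_t n, size_t size)
{
    if (fail_at != 0 && ++alloc_count == fail_at)
        return NULL;            /* simulate malloc failure */
    return calloc(n, size);
}

/* Release an environment that may be only partially constructed. */
static void env_close(struct env *z, int n_strs)
{
    if (z == NULL)
        return;
    /*
     * Test the pointer, not n_strs: strs is still NULL if construction
     * failed before the array was allocated, and indexing it would crash.
     */
    if (z->strs != NULL) {
        for (int i = 0; i < n_strs; i++)
            free(z->strs[i]);   /* free(NULL) is a no-op */
        free(z->strs);
    }
    free(z->buf);
    free(z);
}

static struct env *env_create(int n_strs)
{
    struct env *z = xcalloc(1, sizeof *z);
    if (z == NULL)
        return NULL;
    z->buf = NULL;              /* set explicitly so cleanup can rely on it */
    z->strs = NULL;

    z->buf = xcalloc(1, 64);
    if (z->buf == NULL)
        goto error;

    if (n_strs > 0) {
        z->strs = xcalloc((size_t) n_strs, sizeof *z->strs);
        if (z->strs == NULL)
            goto error;
        for (int i = 0; i < n_strs; i++) {
            z->strs[i] = xcalloc(1, 16);
            if (z->strs[i] == NULL)
                goto error;     /* later slots are still zero-filled */
        }
    }
    return z;

error:
    env_close(z, n_strs);
    return NULL;
}
```

With fail_at set to 2, the allocation of buf fails while strs is still NULL; a cleanup routine that looped over n_strs entries unconditionally would dereference that NULL pointer, whereas the pointer check above lets env_create simply return NULL.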

Aside from a bunch of mostly-minor tweaks to existing stemmers, this
update adds a new stemmer for Estonian. It also removes the existing
stemmer for Romanian using ISO-8859-2 encoding. Upstream apparently
concluded that ISO-8859-2 doesn't provide an adequate representation
of some Romanian characters, and the UTF-8 implementation should be
used instead.

While at it, update the README's instructions for doing a sync,
which were not adjusted when meson tooling was added.

Thanks to Maksim Korotkov for discovering the null-pointer
bug and submitting the fix to upstream snowball.

Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b464e51ab32fbf09cf5d9c911a8e26f491ad1f44

Modified Files
--------------
doc/src/sgml/textsearch.sgml | 1 +
src/backend/snowball/Makefile | 2 +-
src/backend/snowball/README | 22 +-
src/backend/snowball/dict_snowball.c | 4 +-
src/backend/snowball/libstemmer/api.c | 2 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c | 38 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c | 29 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.c | 14 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c | 56 +-
.../snowball/libstemmer/stem_ISO_8859_1_english.c | 154 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c | 34 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c | 256 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c | 403 ++--
.../libstemmer/stem_ISO_8859_1_indonesian.c | 36 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c | 31 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c | 134 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c | 14 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c | 65 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c | 48 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c | 51 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c | 57 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c | 31 +-
.../snowball/libstemmer/stem_ISO_8859_2_romanian.c | 965 ---------
.../snowball/libstemmer/stem_KOI8_R_russian.c | 33 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c | 124 +-
.../snowball/libstemmer/stem_UTF_8_armenian.c | 25 +-
.../snowball/libstemmer/stem_UTF_8_basque.c | 38 +-
.../snowball/libstemmer/stem_UTF_8_catalan.c | 29 +-
.../snowball/libstemmer/stem_UTF_8_danish.c | 14 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c | 58 +-
.../snowball/libstemmer/stem_UTF_8_english.c | 158 +-
.../snowball/libstemmer/stem_UTF_8_estonian.c | 1416 ++++++++++++++
.../snowball/libstemmer/stem_UTF_8_finnish.c | 34 +-
.../snowball/libstemmer/stem_UTF_8_french.c | 274 +--
.../snowball/libstemmer/stem_UTF_8_german.c | 405 ++--
src/backend/snowball/libstemmer/stem_UTF_8_greek.c | 376 ++--
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c | 2 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c | 31 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c | 36 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c | 31 +-
.../snowball/libstemmer/stem_UTF_8_italian.c | 134 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c | 27 +-
.../snowball/libstemmer/stem_UTF_8_nepali.c | 8 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c | 14 +-
.../snowball/libstemmer/stem_UTF_8_porter.c | 65 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c | 48 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c | 1104 ++++++-----
.../snowball/libstemmer/stem_UTF_8_russian.c | 33 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c | 35 +-
.../snowball/libstemmer/stem_UTF_8_spanish.c | 51 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c | 57 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c | 2064 ++++++++------------
.../snowball/libstemmer/stem_UTF_8_turkish.c | 243 +--
.../snowball/libstemmer/stem_UTF_8_yiddish.c | 91 +-
src/backend/snowball/libstemmer/utilities.c | 6 +-
src/backend/snowball/meson.build | 2 +-
src/backend/snowball/snowball_create.pl | 1 +
src/bin/initdb/initdb.c | 2 +
src/include/snowball/libstemmer/header.h | 2 +-
.../snowball/libstemmer/stem_ISO_8859_2_romanian.h | 15 -
.../snowball/libstemmer/stem_UTF_8_estonian.h | 15 +
61 files changed, 4970 insertions(+), 4578 deletions(-)