pgsql: Update to latest Snowball sources. - Mailing list pgsql-committers
From | Tom Lane |
---|---|
Subject | pgsql: Update to latest Snowball sources. |
Date | |
Msg-id | E1tkZbJ-0003lu-1z@gemulon.postgresql.org |
List | pgsql-committers |
Update to latest Snowball sources.

It's been some time since we did this, partly because the upstream snowball project hasn't formally tagged a new release since 2021. The main motivation for doing it now is to absorb a bug fix (their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which prevents a null pointer dereference crash if SN_create_env() gets a malloc failure at just the wrong point. We'll patch the back branches with only that change, but we might as well do the full sync dance on HEAD.

Aside from a bunch of mostly-minor tweaks to existing stemmers, this update adds a new stemmer for Estonian. It also removes the existing stemmer for Romanian using ISO-8859-2 encoding. Upstream apparently concluded that ISO-8859-2 doesn't provide an adequate representation of some Romanian characters, and the UTF-8 implementation should be used instead.

While at it, update the README's instructions for doing a sync, which have not been adjusted during the addition of meson tooling.

Thanks to Maksim Korotkov for discovering the null-pointer bug and submitting the fix to upstream snowball.
Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b464e51ab32fbf09cf5d9c911a8e26f491ad1f44

Modified Files
--------------
doc/src/sgml/textsearch.sgml | 1 +
src/backend/snowball/Makefile | 2 +-
src/backend/snowball/README | 22 +-
src/backend/snowball/dict_snowball.c | 4 +-
src/backend/snowball/libstemmer/api.c | 2 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c | 38 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c | 29 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.c | 14 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c | 56 +-
.../snowball/libstemmer/stem_ISO_8859_1_english.c | 154 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c | 34 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c | 256 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c | 403 ++--
.../libstemmer/stem_ISO_8859_1_indonesian.c | 36 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c | 31 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c | 134 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c | 14 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c | 65 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c | 48 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c | 51 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c | 57 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c | 31 +-
.../snowball/libstemmer/stem_ISO_8859_2_romanian.c | 965 ---------
.../snowball/libstemmer/stem_KOI8_R_russian.c | 33 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c | 124 +-
.../snowball/libstemmer/stem_UTF_8_armenian.c | 25 +-
.../snowball/libstemmer/stem_UTF_8_basque.c | 38 +-
.../snowball/libstemmer/stem_UTF_8_catalan.c | 29 +-
.../snowball/libstemmer/stem_UTF_8_danish.c | 14 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c | 58 +-
.../snowball/libstemmer/stem_UTF_8_english.c | 158 +-
.../snowball/libstemmer/stem_UTF_8_estonian.c | 1416 ++++++++++++++
.../snowball/libstemmer/stem_UTF_8_finnish.c | 34 +-
.../snowball/libstemmer/stem_UTF_8_french.c | 274 +--
.../snowball/libstemmer/stem_UTF_8_german.c | 405 ++--
src/backend/snowball/libstemmer/stem_UTF_8_greek.c | 376 ++--
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c | 2 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c | 31 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c | 36 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c | 31 +-
.../snowball/libstemmer/stem_UTF_8_italian.c | 134 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c | 27 +-
.../snowball/libstemmer/stem_UTF_8_nepali.c | 8 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c | 14 +-
.../snowball/libstemmer/stem_UTF_8_porter.c | 65 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c | 48 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c | 1104 ++++++-----
.../snowball/libstemmer/stem_UTF_8_russian.c | 33 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c | 35 +-
.../snowball/libstemmer/stem_UTF_8_spanish.c | 51 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c | 57 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c | 2064 ++++++++------------
.../snowball/libstemmer/stem_UTF_8_turkish.c | 243 +--
.../snowball/libstemmer/stem_UTF_8_yiddish.c | 91 +-
src/backend/snowball/libstemmer/utilities.c | 6 +-
src/backend/snowball/meson.build | 2 +-
src/backend/snowball/snowball_create.pl | 1 +
src/bin/initdb/initdb.c | 2 +
src/include/snowball/libstemmer/header.h | 2 +-
.../snowball/libstemmer/stem_ISO_8859_2_romanian.h | 15 -
.../snowball/libstemmer/stem_UTF_8_estonian.h | 15 +
61 files changed, 4970 insertions(+), 4578 deletions(-)
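
For readers unfamiliar with the class of bug mentioned in the commit message (a crash when a constructor such as SN_create_env() hits an allocation failure partway through), the following is a minimal sketch of one way such a crash can arise and be avoided. It is not the Snowball code or the upstream fix: struct env, env_create(), env_close(), test_calloc(), fail_at and alloc_calls are all hypothetical names invented for illustration. The assumption is that the constructor's error path calls a shared cleanup routine, which therefore must cope with an object that is only partly built.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stemmer environment; NOT the real struct SN_env. */
struct env {
    char **strings;   /* array of owned buffers, or NULL if never allocated */
    int   *ints;      /* array of integers, or NULL if never allocated */
};

/* Test hook (assumption for this sketch): fail the Nth allocation, 0 = never. */
static int fail_at = 0;
static int alloc_calls = 0;

static void *test_calloc(size_t n, size_t size)
{
    if (++alloc_calls == fail_at)
        return NULL;            /* simulate an out-of-memory condition */
    return calloc(n, size);
}

/* Cleanup that tolerates a partially constructed environment. */
static void env_close(struct env *e, int n_strings)
{
    if (e == NULL)
        return;
    /*
     * Guard on the pointer itself. Checking only n_strings > 0 would
     * dereference NULL when the array allocation was the one that failed.
     */
    if (e->strings != NULL)
    {
        for (int i = 0; i < n_strings; i++)
            free(e->strings[i]);    /* free(NULL) is a no-op */
        free(e->strings);
    }
    free(e->ints);
    free(e);
}

/* Constructor whose error path reuses env_close(). Returns NULL on failure. */
static struct env *env_create(int n_strings, int n_ints)
{
    assert(n_strings >= 0 && n_ints >= 0);

    struct env *e = test_calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;

    if (n_strings > 0)
    {
        e->strings = test_calloc((size_t) n_strings, sizeof *e->strings);
        if (e->strings == NULL)
            goto error;
        for (int i = 0; i < n_strings; i++)
        {
            e->strings[i] = test_calloc(1, 16);
            if (e->strings[i] == NULL)
                goto error;
        }
    }

    if (n_ints > 0)
    {
        e->ints = test_calloc((size_t) n_ints, sizeof *e->ints);
        if (e->ints == NULL)
            goto error;
    }
    return e;

error:
    env_close(e, n_strings);    /* e may be only partly initialized here */
    return NULL;
}
```

Setting fail_at to each allocation index in turn and calling env_create() exercises every error path. With the pointer guard in env_close(), each call should return NULL cleanly instead of crashing.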