pgsql: Update to latest Snowball sources. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Update to latest Snowball sources.
Date
Msg-id E1tkZbJ-0003lu-1z@gemulon.postgresql.org
List pgsql-committers
Update to latest Snowball sources.

It's been some time since we did this, partly because the upstream
snowball project hasn't formally tagged a new release since 2021.
The main motivation for doing it now is to absorb a bug fix
(their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which
prevents a null pointer dereference crash if SN_create_env() gets
a malloc failure at just the wrong point.  We'll patch the back
branches with only that change, but we might as well do the full
sync dance on HEAD.
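
For readers unfamiliar with this class of bug, here is a minimal
sketch of how a constructor's error path can crash when cleanup
assumes an earlier allocation succeeded.  It is not the upstream
Snowball code: demo_env, demo_calloc, demo_create_env,
demo_close_env and demo_exercise_failures are hypothetical names
made up for illustration, and the allocation-failure hook exists
only so the error paths can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stemmer environment (not Snowball's struct). */
struct demo_env {
    char **S;      /* array of string slots */
    int    S_size; /* number of slots requested */
};

/* Test hook: make the Nth allocation fail (0 means never fail). */
static int fail_at = 0;
static int alloc_count = 0;

static void *demo_calloc(size_t n, size_t size)
{
    if (fail_at != 0 && ++alloc_count == fail_at)
        return NULL;            /* simulated out-of-memory */
    return calloc(n, size);
}

/* Cleanup that tolerates a NULL or partially built environment. */
void demo_close_env(struct demo_env *z)
{
    if (z == NULL)
        return;
    /* Without this check, a failed allocation of z->S would make the
     * loop below read z->S[i] through a null pointer. */
    if (z->S != NULL) {
        for (int i = 0; i < z->S_size; i++)
            free(z->S[i]);      /* free(NULL) is a no-op */
        free(z->S);
    }
    free(z);
}

struct demo_env *demo_create_env(int S_size)
{
    assert(S_size >= 0);
    struct demo_env *z = demo_calloc(1, sizeof(*z));
    if (z == NULL)
        return NULL;
    z->S_size = S_size;
    if (S_size > 0) {
        z->S = demo_calloc((size_t) S_size, sizeof(char *));
        if (z->S == NULL)
            goto error;         /* "just the wrong point": S is still NULL */
        for (int i = 0; i < S_size; i++) {
            z->S[i] = demo_calloc(1, 16);
            if (z->S[i] == NULL)
                goto error;     /* later slots are NULL thanks to calloc */
        }
    }
    return z;
error:
    demo_close_env(z);
    return NULL;
}

/* Fail each allocation in turn; count how many creations returned NULL. */
int demo_exercise_failures(int S_size)
{
    int handled = 0;
    for (int k = 1; k <= S_size + 2; k++) {
        fail_at = k;
        alloc_count = 0;
        struct demo_env *z = demo_create_env(S_size);
        if (z == NULL)
            handled++;
        demo_close_env(z);
    }
    fail_at = 0;
    return handled;
}
```

Calling demo_exercise_failures(3) walks through all five allocation
points (the struct, the slot array, and three slots); each should
yield a clean NULL return rather than a crash.  Removing the
z->S != NULL check reintroduces the kind of failure described above.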

Aside from a bunch of mostly-minor tweaks to existing stemmers, this
update adds a new stemmer for Estonian.  It also removes the
ISO-8859-2 variant of the Romanian stemmer.  Upstream apparently
concluded that ISO-8859-2 doesn't provide an adequate representation
of some Romanian characters, and the UTF-8 implementation should be
used instead.

While at it, update the README's instructions for doing a sync,
which were not adjusted when meson tooling was added.

Thanks to Maksim Korotkov for discovering the null-pointer
bug and submitting the fix to upstream snowball.

Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b464e51ab32fbf09cf5d9c911a8e26f491ad1f44

Modified Files
--------------
doc/src/sgml/textsearch.sgml                       |    1 +
src/backend/snowball/Makefile                      |    2 +-
src/backend/snowball/README                        |   22 +-
src/backend/snowball/dict_snowball.c               |    4 +-
src/backend/snowball/libstemmer/api.c              |    2 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c   |   38 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c  |   29 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.c   |   14 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c    |   56 +-
.../snowball/libstemmer/stem_ISO_8859_1_english.c  |  154 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c  |   34 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c   |  256 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c   |  403 ++--
.../libstemmer/stem_ISO_8859_1_indonesian.c        |   36 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c    |   31 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c  |  134 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c         |   14 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c   |   65 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c        |   48 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c  |   51 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c  |   57 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c         |   31 +-
.../snowball/libstemmer/stem_ISO_8859_2_romanian.c |  965 ---------
.../snowball/libstemmer/stem_KOI8_R_russian.c      |   33 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c        |  124 +-
.../snowball/libstemmer/stem_UTF_8_armenian.c      |   25 +-
.../snowball/libstemmer/stem_UTF_8_basque.c        |   38 +-
.../snowball/libstemmer/stem_UTF_8_catalan.c       |   29 +-
.../snowball/libstemmer/stem_UTF_8_danish.c        |   14 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c |   58 +-
.../snowball/libstemmer/stem_UTF_8_english.c       |  158 +-
.../snowball/libstemmer/stem_UTF_8_estonian.c      | 1416 ++++++++++++++
.../snowball/libstemmer/stem_UTF_8_finnish.c       |   34 +-
.../snowball/libstemmer/stem_UTF_8_french.c        |  274 +--
.../snowball/libstemmer/stem_UTF_8_german.c        |  405 ++--
src/backend/snowball/libstemmer/stem_UTF_8_greek.c |  376 ++--
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c |    2 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c     |   31 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c    |   36 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c |   31 +-
.../snowball/libstemmer/stem_UTF_8_italian.c       |  134 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c    |   27 +-
.../snowball/libstemmer/stem_UTF_8_nepali.c        |    8 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c     |   14 +-
.../snowball/libstemmer/stem_UTF_8_porter.c        |   65 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c    |   48 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c      | 1104 ++++++-----
.../snowball/libstemmer/stem_UTF_8_russian.c       |   33 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c       |   35 +-
.../snowball/libstemmer/stem_UTF_8_spanish.c       |   51 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c       |   57 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c | 2064 ++++++++------------
.../snowball/libstemmer/stem_UTF_8_turkish.c       |  243 +--
.../snowball/libstemmer/stem_UTF_8_yiddish.c       |   91 +-
src/backend/snowball/libstemmer/utilities.c        |    6 +-
src/backend/snowball/meson.build                   |    2 +-
src/backend/snowball/snowball_create.pl            |    1 +
src/bin/initdb/initdb.c                            |    2 +
src/include/snowball/libstemmer/header.h           |    2 +-
.../snowball/libstemmer/stem_ISO_8859_2_romanian.h |   15 -
.../snowball/libstemmer/stem_UTF_8_estonian.h      |   15 +
61 files changed, 4970 insertions(+), 4578 deletions(-)

