pgsql: Update to latest Snowball sources. - Mailing list pgsql-committers

From: Tom Lane
Subject: pgsql: Update to latest Snowball sources.
Msg-id: E1vcr6R-004hYS-0k@gemulon.postgresql.org
List: pgsql-committers

Update to latest Snowball sources.

It's been almost a year since we last did this, and upstream has
been busy.  They've added stemmers for Polish and Esperanto,
and also deprecated their old Dutch stemmer in favor of the
Kraaij-Pohlmann algorithm.  (The "dutch" stemmer is now the
latter, and "dutch_porter" is the old algorithm.)

Upstream also decided to rename their internal header "header.h"
to something less generic: "snowball_runtime.h".  Seems like a good
thing, but it complicates this patch a bit because we were relying on
interposing our own version of "header.h" to control system header
inclusion order.  (We're partially failing at that now, because now the
generated stemmer files include <stddef.h> before snowball_runtime.h.
I think that'll be okay, but if the buildfarm complains then we'll
have to do more-extensive editing of the generated files.)

I realized that we weren't documenting the available stemmers in
any user-visible place, except indirectly through sample \dFd output.
That's incomplete because we only provide built-in dictionaries for
the recommended stemmers for each language, not alternative stemmers
such as dutch_porter.  So I added a list to the documentation.
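
As a rough illustration of why that list matters: an alternative stemmer
with no built-in dictionary can still be reached by creating a dictionary
from the snowball template.  The sketch below assumes a server built with
this commit; the dictionary name dutch_porter_stem and the sample word are
made up for the example, and only the stemmer name dutch_porter comes from
the text above.

```sql
-- Hypothetical dictionary name; "dutch_porter" is the old Dutch
-- algorithm mentioned above, which gets no built-in dictionary.
CREATE TEXT SEARCH DICTIONARY dutch_porter_stem (
    TEMPLATE = snowball,
    Language = dutch_porter
);

-- Try the dictionary on a single token (the sample word is arbitrary).
SELECT ts_lexize('dutch_porter_stem', 'fietsen');
```

No StopWords parameter is given here, so the dictionary would stem every
token it is handed.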

I did not do anything with the stopword lists.  If those are still
available from snowballstem.org, they are mighty well hidden.

Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7dc95cc3b94f2f558606e5ec307466a4e3dbc832

Modified Files
--------------
doc/src/sgml/textsearch.sgml                       |    52 +
src/backend/snowball/Makefile                      |     5 +
src/backend/snowball/README                        |    22 +-
src/backend/snowball/dict_snowball.c               |    14 +-
src/backend/snowball/libstemmer/api.c              |    48 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c   |  1132 +--
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c  |  1364 +--
.../snowball/libstemmer/stem_ISO_8859_1_danish.c   |   324 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c    |  2338 ++++-
.../libstemmer/stem_ISO_8859_1_dutch_porter.c      |   665 ++
.../snowball/libstemmer/stem_ISO_8859_1_english.c  |  1310 +--
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c  |   686 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c   |  1493 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c   |   547 +-
.../libstemmer/stem_ISO_8859_1_indonesian.c        |   543 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c    |   388 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c  |   995 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c         |   420 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c   |   611 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c        |   901 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c  |  1017 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c  |   477 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c         |  1101 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.c   |   520 +
.../snowball/libstemmer/stem_KOI8_R_russian.c      |   602 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c        |  1554 +--
.../snowball/libstemmer/stem_UTF_8_armenian.c      |   528 +-
.../snowball/libstemmer/stem_UTF_8_basque.c        |  1135 +--
.../snowball/libstemmer/stem_UTF_8_catalan.c       |  1367 +--
.../snowball/libstemmer/stem_UTF_8_danish.c        |   326 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c |  2400 ++++-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.c  |   680 ++
.../snowball/libstemmer/stem_UTF_8_english.c       |  1324 +--
.../snowball/libstemmer/stem_UTF_8_esperanto.c     |   820 ++
.../snowball/libstemmer/stem_UTF_8_estonian.c      |  2010 ++--
.../snowball/libstemmer/stem_UTF_8_finnish.c       |   696 +-
.../snowball/libstemmer/stem_UTF_8_french.c        |  1523 +--
.../snowball/libstemmer/stem_UTF_8_german.c        |   554 +-
src/backend/snowball/libstemmer/stem_UTF_8_greek.c |  4218 ++++----
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c |   308 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c     |  1100 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c    |   543 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c |   388 +-
.../snowball/libstemmer/stem_UTF_8_italian.c       |  1007 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c    |  1179 ++-
.../snowball/libstemmer/stem_UTF_8_nepali.c        |   598 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c     |   422 +-
.../snowball/libstemmer/stem_UTF_8_polish.c        |   523 +
.../snowball/libstemmer/stem_UTF_8_porter.c        |   620 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c    |   910 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c      |   961 +-
.../snowball/libstemmer/stem_UTF_8_russian.c       |   625 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c       | 10148 ++++++++++---------
.../snowball/libstemmer/stem_UTF_8_spanish.c       |  1023 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c       |   479 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c |  1361 +--
.../snowball/libstemmer/stem_UTF_8_turkish.c       |  2371 +++--
.../snowball/libstemmer/stem_UTF_8_yiddish.c       |  1235 +--
src/backend/snowball/libstemmer/utilities.c        |   205 +-
src/backend/snowball/meson.build                   |     7 +-
src/backend/snowball/snowball_create.pl            |     2 +
src/bin/initdb/initdb.c                            |     2 +
src/include/snowball/libstemmer/api.h              |    18 +-
src/include/snowball/libstemmer/header.h           |    61 -
src/include/snowball/libstemmer/snowball_runtime.h |   109 +
.../snowball/libstemmer/stem_ISO_8859_1_basque.h   |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.h   |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.h    |     3 +-
.../libstemmer/stem_ISO_8859_1_dutch_porter.h      |    14 +
.../snowball/libstemmer/stem_ISO_8859_1_english.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.h   |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_german.h   |     3 +-
.../libstemmer/stem_ISO_8859_1_indonesian.h        |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.h    |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.h  |     3 +-
.../libstemmer/stem_ISO_8859_1_norwegian.h         |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.h   |     3 +-
.../libstemmer/stem_ISO_8859_1_portuguese.h        |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.h  |     3 +-
.../libstemmer/stem_ISO_8859_2_hungarian.h         |     3 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.h   |    14 +
.../snowball/libstemmer/stem_KOI8_R_russian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_arabic.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_armenian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_basque.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_catalan.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_danish.h        |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_dutch.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.h  |    14 +
.../snowball/libstemmer/stem_UTF_8_english.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_esperanto.h     |    14 +
.../snowball/libstemmer/stem_UTF_8_estonian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_finnish.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_french.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_german.h        |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_greek.h |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_hindi.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.h     |     3 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.h    |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_irish.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_italian.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.h    |     3 +-
.../snowball/libstemmer/stem_UTF_8_nepali.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.h     |     3 +-
.../snowball/libstemmer/stem_UTF_8_polish.h        |    14 +
.../snowball/libstemmer/stem_UTF_8_porter.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.h    |     3 +-
.../snowball/libstemmer/stem_UTF_8_romanian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_russian.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_serbian.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_spanish.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_swedish.h       |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_tamil.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_turkish.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_yiddish.h       |     3 +-
.../snowball/{header.h => snowball_runtime.h}      |    22 +-
119 files changed, 36038 insertions(+), 27113 deletions(-)

