pgsql: Update to latest Snowball sources. - Mailing list pgsql-committers
| From | Tom Lane |
|---|---|
| Subject | pgsql: Update to latest Snowball sources. |
| Date | |
| Msg-id | E1vcr6R-004hYS-0k@gemulon.postgresql.org |
| Responses | Re: pgsql: Update to latest Snowball sources. |
| List | pgsql-committers |
Update to latest Snowball sources.

It's been almost a year since we last did this, and upstream has been busy. They've added stemmers for Polish and Esperanto, and also deprecated their old Dutch stemmer in favor of the Kraaij-Pohlmann algorithm. (The "dutch" stemmer is now the latter, and "dutch_porter" is the old algorithm.)

Upstream also decided to rename their internal header "header.h" to something less generic: "snowball_runtime.h". Seems like a good thing, but it complicates this patch a bit because we were relying on interposing our own version of "header.h" to control system header inclusion order. (We're partially failing at that now, because now the generated stemmer files include <stddef.h> before snowball_runtime.h. I think that'll be okay, but if the buildfarm complains then we'll have to do more-extensive editing of the generated files.)

I realized that we weren't documenting the available stemmers in any user-visible place, except indirectly through sample \dFd output. That's incomplete because we only provide built-in dictionaries for the recommended stemmers for each language, not alternative stemmers such as dutch_porter. So I added a list to the documentation.

I did not do anything with the stopword lists. If those are still available from snowballstem.org, they are mighty well hidden.
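
To illustrate the point about alternative stemmers: since no built-in dictionary exists for dutch_porter, a user who wants the old Dutch algorithm would presumably define one from the snowball template. The following is a minimal sketch, assuming a server built from a tree that includes this commit. The dictionary name dutch_porter_stem and the sample word are hypothetical choices, not taken from the patch.

```sql
-- Sketch only: assumes the snowball template in this build recognizes
-- the "dutch_porter" stemmer name introduced by this update.
-- "dutch_porter_stem" is an illustrative name, not a built-in dictionary.
CREATE TEXT SEARCH DICTIONARY dutch_porter_stem (
    TEMPLATE = snowball,
    Language = dutch_porter
);

-- Compare the built-in dictionary (now Kraaij-Pohlmann) with the old
-- algorithm on the same input word.
SELECT ts_lexize('dutch_stem', 'lichamelijk')        AS kraaij_pohlmann,
       ts_lexize('dutch_porter_stem', 'lichamelijk') AS old_porter;
```

No StopWords parameter is given here, so the sketch dictionary filters no stop words. The built-in dutch_stem dictionary may behave differently in that respect.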

Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7dc95cc3b94f2f558606e5ec307466a4e3dbc832

Modified Files
--------------
doc/src/sgml/textsearch.sgml | 52 +
src/backend/snowball/Makefile | 5 +
src/backend/snowball/README | 22 +-
src/backend/snowball/dict_snowball.c | 14 +-
src/backend/snowball/libstemmer/api.c | 48 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c | 1132 +--
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c | 1364 +--
.../snowball/libstemmer/stem_ISO_8859_1_danish.c | 324 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c | 2338 ++++-
.../libstemmer/stem_ISO_8859_1_dutch_porter.c | 665 ++
.../snowball/libstemmer/stem_ISO_8859_1_english.c | 1310 +--
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c | 686 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c | 1493 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c | 547 +-
.../libstemmer/stem_ISO_8859_1_indonesian.c | 543 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c | 388 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c | 995 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c | 420 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c | 611 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c | 901 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c | 1017 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c | 477 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c | 1101 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.c | 520 +
.../snowball/libstemmer/stem_KOI8_R_russian.c | 602 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c | 1554 +--
.../snowball/libstemmer/stem_UTF_8_armenian.c | 528 +-
.../snowball/libstemmer/stem_UTF_8_basque.c | 1135 +--
.../snowball/libstemmer/stem_UTF_8_catalan.c | 1367 +--
.../snowball/libstemmer/stem_UTF_8_danish.c | 326 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c | 2400 ++++-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.c | 680 ++
.../snowball/libstemmer/stem_UTF_8_english.c | 1324 +--
.../snowball/libstemmer/stem_UTF_8_esperanto.c | 820 ++
.../snowball/libstemmer/stem_UTF_8_estonian.c | 2010 ++--
.../snowball/libstemmer/stem_UTF_8_finnish.c | 696 +-
.../snowball/libstemmer/stem_UTF_8_french.c | 1523 +--
.../snowball/libstemmer/stem_UTF_8_german.c | 554 +-
src/backend/snowball/libstemmer/stem_UTF_8_greek.c | 4218 ++++----
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c | 308 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c | 1100 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c | 543 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c | 388 +-
.../snowball/libstemmer/stem_UTF_8_italian.c | 1007 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c | 1179 ++-
.../snowball/libstemmer/stem_UTF_8_nepali.c | 598 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c | 422 +-
.../snowball/libstemmer/stem_UTF_8_polish.c | 523 +
.../snowball/libstemmer/stem_UTF_8_porter.c | 620 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c | 910 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c | 961 +-
.../snowball/libstemmer/stem_UTF_8_russian.c | 625 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c | 10148 ++++++++++---------
.../snowball/libstemmer/stem_UTF_8_spanish.c | 1023 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c | 479 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c | 1361 +--
.../snowball/libstemmer/stem_UTF_8_turkish.c | 2371 +++--
.../snowball/libstemmer/stem_UTF_8_yiddish.c | 1235 +--
src/backend/snowball/libstemmer/utilities.c | 205 +-
src/backend/snowball/meson.build | 7 +-
src/backend/snowball/snowball_create.pl | 2 +
src/bin/initdb/initdb.c | 2 +
src/include/snowball/libstemmer/api.h | 18 +-
src/include/snowball/libstemmer/header.h | 61 -
src/include/snowball/libstemmer/snowball_runtime.h | 109 +
.../snowball/libstemmer/stem_ISO_8859_1_basque.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.h | 3 +-
.../libstemmer/stem_ISO_8859_1_dutch_porter.h | 14 +
.../snowball/libstemmer/stem_ISO_8859_1_english.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_german.h | 3 +-
.../libstemmer/stem_ISO_8859_1_indonesian.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.h | 3 +-
.../libstemmer/stem_ISO_8859_1_norwegian.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.h | 3 +-
.../libstemmer/stem_ISO_8859_1_portuguese.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.h | 3 +-
.../libstemmer/stem_ISO_8859_2_hungarian.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.h | 14 +
.../snowball/libstemmer/stem_KOI8_R_russian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_arabic.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_armenian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_basque.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_catalan.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_danish.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_dutch.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.h | 14 +
.../snowball/libstemmer/stem_UTF_8_english.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_esperanto.h | 14 +
.../snowball/libstemmer/stem_UTF_8_estonian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_finnish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_french.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_german.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_greek.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_hindi.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_irish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_italian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_nepali.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_polish.h | 14 +
.../snowball/libstemmer/stem_UTF_8_porter.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_romanian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_russian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_serbian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_spanish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_swedish.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_tamil.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_turkish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_yiddish.h | 3 +-
.../snowball/{header.h => snowball_runtime.h} | 22 +-
119 files changed, 36038 insertions(+), 27113 deletions(-)