initdb faild to initialize full text search dictionaries - Mailing list pgsql-bugs

From Les
Subject initdb faild to initialize full text search dictionaries
Msg-id CAKXe9UAHCUYJTdh+s7QNywgRt5fAO8RNEhXwzo0bW1kDQxhQqg@mail.gmail.com
Responses Re: initdb faild to initialize full text search dictionaries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
This is a follow-up to bug #17356. PostgreSQL 15 is also affected: it cannot build dictionaries from the installed hunspell packages.

How to reproduce: 

# Start minimal virgin system
docker run -it --rm debian bash
apt update
apt install -y acl ca-certificates curl gzip libbsd0 libbz2-1.0 libc6 libedit2 libffi7 libnettle8 libicu67 libreadline8 libgcc1 libgmp10 libgnutls30 libhogweed6  libidn2-0 libldap-2.4-2 liblz4-1 liblzma5 \
libncurses6  libp11-kit0 libpcre3  libsasl2-2 libsqlite3-0 libssl1.1 libstdc++6 libtasn1-6 libtinfo6 libunistring2 libuuid1 libxml2 libxslt1.1 libzstd1 locales procps tar zlib1g gnupg dumb-init curl

# Install hunspell-hu and PostgreSQL
apt install -y hunspell hunspell-hu
curl -s https://salsa.debian.org/postgresql/postgresql-common/raw/master/pgdg/apt.postgresql.org.sh | bash
apt update
apt install -y postgresql-15

The last command "apt install -y postgresql-15" gives this error:

Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  hu_hu
iconv: illegal input sequence at position 131
ERROR: Conversion of /usr/share/hunspell/hu_HU.aff failed
Removing obsolete dictionary files:
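To see what iconv is rejecting, it can help to dump the bytes around the reported position. This is a minimal sketch, assuming that the "position" in the iconv message is a 0-based byte offset into the input file (as far as I know, that is how GNU iconv reports it). `show_bytes` is a hypothetical helper name, and the 124..139 window is just an arbitrary range around 131:

```shell
#!/bin/sh
# show_bytes FILE OFFSET COUNT (hypothetical helper, not part of any package)
# Print COUNT bytes of FILE as hex, starting OFFSET bytes into the file,
# with decimal offsets in the left column.
show_bytes() {
  od -A d -t x1 -j "$2" -N "$3" "$1"
}

# Dump the bytes around offset 131, the position reported by iconv above.
aff=/usr/share/hunspell/hu_HU.aff
if [ -r "$aff" ]; then
  show_bytes "$aff" 124 16
fi
```

A byte of 0x80 or above in that window that does not belong to a well-formed UTF-8 sequence would be the likely culprit.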

I'm not sure where the problem is. It may be in hunspell, hunspell-hu, iconv, or PostgreSQL. I have tried to find the root cause, but I failed. At least it does NOT seem to be a bug in hunspell or hunspell-hu, because the author of hunspell wrote this comment in 2018 at https://github.com/hunspell/hunspell/issues/559#issuecomment-446335091

> Not a bug: Hunspell's file format is not an UTF-8 encoded text file in the case of SET UTF-8 with the default 8-bit FLAG.

That hunspell issue remains open only because "it is a valid request"; according to the author, it is nonetheless not a bug.
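The hunspell author's point can be reproduced without the real dictionary. This is a minimal sketch: it writes a tiny, hypothetical .aff fragment (not taken from hu_HU.aff) that declares `SET UTF-8` but uses the single raw byte 0xE1 as an 8-bit affix flag, then asks iconv to convert it from UTF-8 to UTF-8. Judging by the error message, the dictionary conversion step appears to do something similar, but that is an assumption:

```shell
#!/bin/sh
# Hypothetical .aff fragment: the header claims UTF-8, but the affix flag
# after "SFX " is the raw byte 0xE1 (octal 341), i.e. an 8-bit FLAG value.
sample=/tmp/sample.aff
printf 'SET UTF-8\nSFX \341 Y 1\n' > "$sample"

# A UTF-8 -> UTF-8 conversion only succeeds if the input is valid UTF-8.
# 0xE1 starts a 3-byte sequence, but here it is followed by a space, so
# GNU iconv should reject it and report the flag's byte offset (14 here).
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8"
fi
```

Every other byte in the fragment is plain ASCII, so the failure comes only from the 8-bit flag, which is the situation the hunspell author describes.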

So the problem might be in iconv itself, or in pg_updatedicts calling iconv with the wrong parameters. I do not know enough to tell.

The effect of this bug is that PostgreSQL cannot use these dictionaries for full text search ( https://www.postgresql.org/docs/15/textsearch-dictionaries.html ). I have not tried ispell or myspell yet, but they are old (ancient, actually), and hunspell should be preferred. I think this bug has been around for at least 5 years (since 2018).

Regards,

   Laszlo Zsolt Nagy

