Thread: initdb failed to initialize full text search dictionaries

initdb failed to initialize full text search dictionaries

From
Les
Date:
This is a follow-up to bug #17356. PostgreSQL version 15 is also affected: it cannot build dictionaries from hunspell packages.

How to reproduce: 

# Start a minimal, clean system
docker run -it --rm debian bash
apt update
apt install -y acl ca-certificates curl gzip libbsd0 libbz2-1.0 libc6 libedit2 libffi7 libnettle8 libicu67 libreadline8 libgcc1 libgmp10 libgnutls30 libhogweed6 libidn2-0 libldap-2.4-2 liblz4-1 liblzma5 \
libncurses6 libp11-kit0 libpcre3 libsasl2-2 libsqlite3-0 libssl1.1 libstdc++6 libtasn1-6 libtinfo6 libunistring2 libuuid1 libxml2 libxslt1.1 libzstd1 locales procps tar zlib1g gnupg dumb-init

# Install hunspell-hu and PostgreSQL
apt install -y hunspell hunspell-hu
curl -s https://salsa.debian.org/postgresql/postgresql-common/raw/master/pgdg/apt.postgresql.org.sh | bash
apt update
apt install -y postgresql-15

The last command "apt install -y postgresql-15" gives this error:

Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  hu_hu
iconv: illegal input sequence at position 131
ERROR: Conversion of /usr/share/hunspell/hu_HU.aff failed
Removing obsolete dictionary files:

I'm not sure where the problem is. It may be in hunspell, hunspell-hu, iconv, or PostgreSQL. I have tried to find the root cause, but I failed. At least it seems that it is NOT a bug in hunspell or hunspell-hu, because the author of hunspell wrote this comment in 2018 at https://github.com/hunspell/hunspell/issues/559#issuecomment-446335091 :

> Not a bug: Hunspell's file format is not an UTF-8 encoded text file in the case of SET UTF-8 with the default 8-bit FLAG.

That hunspell issue is only open because "it is a valid request", but it is not a bug nonetheless (according to the author).

So it might be iconv, or it might be pg_updatedicts that calls iconv with the wrong parameters. I do not know enough to tell...
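One way to narrow this down is to check whether the .aff file is actually valid in the encoding it declares. The following is only a rough Python sketch, not what pg_updatedicts does: it reads the `SET` line of an affix file and reports the byte offset of the first sequence that does not decode in that encoding. The function names are made up for illustration, and the fallback to ISO8859-1 when no `SET` line is present is an assumption. Encoding names that Python does not recognize will raise a LookupError.

```python
import re


def declared_encoding(aff_bytes: bytes, default: str = "ISO8859-1") -> str:
    """Return the encoding named on the first 'SET <name>' line.

    The default used when no SET line exists is an assumption of this sketch.
    """
    match = re.search(rb"^SET[ \t]+(\S+)", aff_bytes, flags=re.MULTILINE)
    return match.group(1).decode("ascii") if match else default


def first_bad_offset(data: bytes, encoding: str):
    """Return the byte offset of the first undecodable sequence, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Example use against the file from the error message (path taken from the log):
#   data = open("/usr/share/hunspell/hu_HU.aff", "rb").read()
#   enc = declared_encoding(data)
#   print(enc, first_bad_offset(data, enc))
```

If the reported offset lands on a flag byte in a file that declares `SET UTF-8`, that would be consistent with the hunspell author's comment quoted above: the file is not meant to be a valid UTF-8 text file in that configuration.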

The effect of this bug is that PostgreSQL cannot use the dictionaries for full text search ( https://www.postgresql.org/docs/15/textsearch-dictionaries.html ). I have not tried ispell or myspell yet, but they are old (ancient, actually) and hunspell should be preferred. I think this bug has been around for at least 5 years (since 2018).

Regards,

   Laszlo Zsolt Nagy


Re: initdb failed to initialize full text search dictionaries

From
Tom Lane
Date:
Les <nagylzs@gmail.com> writes:
> The last command "apt install -y postgresql-15" gives this error:

> Building PostgreSQL dictionaries from installed myspell/hunspell packages...
>   hu_hu
> iconv: illegal input sequence at position 131
> ERROR: Conversion of /usr/share/hunspell/hu_HU.aff failed

Sadly, I do not think any of the moving parts there are under the PG
project's control.  We certainly can't fix problems in either hunspell
or iconv, and even the fact that iconv is being applied during install
is not something the core project does.  I gather that this is
something the Debian packaging of postgres is attempting, so I'd
suggest taking it up with those packagers.  It's possible that it's
something easy like they have the wrong idea of what encoding that
particular file is in.  Or maybe the best answer is to skip any
files that fail conversion, without aborting the package install
entirely.  But we here on pgsql-bugs can't help you.
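To illustrate the "skip any files that fail conversion" idea, here is a minimal Python sketch. It is not the Debian packaging code, and `convert_dictionaries` is a hypothetical helper: it re-encodes each input file to UTF-8, and a file that cannot be read or decoded is recorded and skipped instead of aborting the whole run.

```python
from pathlib import Path


def convert_dictionaries(sources, out_dir, encoding="UTF-8"):
    """Re-encode each file to UTF-8, skipping files that fail to decode.

    Returns (converted, skipped); skipped holds (file name, reason) pairs.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted, skipped = [], []
    for src in map(Path, sources):
        try:
            text = src.read_bytes().decode(encoding)
        except (UnicodeDecodeError, OSError) as exc:
            # Record the failure and carry on with the remaining files.
            skipped.append((src.name, str(exc)))
            continue
        (out_dir / src.name).write_text(text, encoding="utf-8")
        converted.append(src.name)
    return converted, skipped
```

The caller can then print the skipped list as warnings and still exit successfully, so one problematic dictionary does not fail the package install.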

            regards, tom lane