The following bug has been logged on the website:
Bug reference: 6455
Logged by: Desmares Vincent
Email address: vincent.desmares@inovia-team.com
PostgreSQL version: 9.1.0
Operating system: Ubuntu
Description:
Hello everyone,
We recently discovered something that could be a "bug" when using the Full
Text Search of Postgres, more precisely the ispell dictionary.
It appears that words composed of a single repeated character (like “a”, “aa”,
“aaa”, ...) trigger all the prefix and suffix rules even if nothing has
been specified in the dictionary.
We hit the bug with the word “e”, which was associated with the word “deer”.
Here is a short way to reproduce the bug from scratch:
# 1) Create a test.dict containing only "e"
echo "e" > test.dict
# 2) Create an empty test.stop file
touch test.stop
# 3) Create a test.affix file with these rules:
echo -e 'PFX C Y 1\nPFX C 0 de .\n\nSFX R Y 1\nSFX R 0 r e\n' > test.affix
# 4) Run these statements:
DROP TEXT SEARCH DICTIONARY IF EXISTS testispell CASCADE;
CREATE TEXT SEARCH DICTIONARY testispell (
TEMPLATE = ispell,
DictFile = test,
AffFile = test,
StopWords = test
);
CREATE TEXT SEARCH CONFIGURATION test_ispell (
PARSER = "default"
);
ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR asciihword WITH
testispell;
ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR asciiword WITH
testispell;
ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR uint WITH
testispell;
ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR word WITH
testispell;
SELECT * from ts_debug('test_ispell', 'deer');
# 5) You should get a table with this result:
alias : "asciiword"
description : "Word, all ASCII"
token : "deer"
dictionaries : "{testispell}"
dictionary : "testispell"
lexemes : "{e}"
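
Steps 1 to 3 above can also be written as one script. This is only a sketch under assumptions: TSEARCH_DIR is a hypothetical variable introduced here, defaulting to a scratch directory; on a real server the files would have to be placed in the tsearch_data directory under the path printed by `pg_config --sharedir`. Note that the dictionary entry carries no affix flags (it is "e", not "e/CR").

```shell
#!/bin/sh
# Sketch: write the three files referenced by the testispell dictionary.
# TSEARCH_DIR is a hypothetical variable (not part of the original report);
# on a real installation it would be "$(pg_config --sharedir)/tsearch_data".
set -eu
TSEARCH_DIR="${TSEARCH_DIR:-$(mktemp -d)}"

# Dictionary: the single word "e", with no affix flags attached.
printf 'e\n' > "$TSEARCH_DIR/test.dict"

# Empty stop-word file.
: > "$TSEARCH_DIR/test.stop"

# Affix rules: prefix flag C adds "de" to any word,
# suffix flag R adds "r" to words ending in "e".
printf 'PFX C Y 1\nPFX C 0 de .\n\nSFX R Y 1\nSFX R 0 r e\n' \
    > "$TSEARCH_DIR/test.affix"

# Show what was written.
echo "files written to $TSEARCH_DIR:"
ls "$TSEARCH_DIR"
cat "$TSEARCH_DIR/test.affix"
```

printf is used instead of echo -e because not every /bin/sh interprets the -e option the same way.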
It appears that this is also reproducible with more characters of the same
letter:
- a .dict containing [ee], searching for [deeer], gives [ee]
but
- a .dict containing [ee], searching for [eer] or [deee], gives nothing
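
To make the three cases easier to compare, here is a small sketch that spells out which surface forms the two rules in test.affix would produce from a stem if the stem carried both flags. It is plain string concatenation written for illustration; the derive function is hypothetical and says nothing about how the ispell template is implemented in PostgreSQL.

```shell
#!/bin/sh
# Sketch: apply the rules from test.affix by hand.
#   PFX C 0 de .   -> strip nothing, prepend "de", any word
#   SFX R 0 r e    -> strip nothing, append "r", only if the word ends in "e"
# derive is a hypothetical helper, not a PostgreSQL function.
set -eu

derive() {
    stem=$1
    mode=$2
    case "$mode" in
        prefix) printf '%s\n' "de${stem}" ;;
        suffix|both)
            # The suffix rule only applies to stems ending in "e".
            case "$stem" in
                *e) ;;
                *) return 1 ;;
            esac
            if [ "$mode" = both ]; then
                printf '%s\n' "de${stem}r"
            else
                printf '%s\n' "${stem}r"
            fi
            ;;
        *) return 1 ;;
    esac
}

for stem in e ee; do
    echo "stem=$stem prefix-only=$(derive "$stem" prefix)" \
         "suffix-only=$(derive "$stem" suffix)" \
         "both=$(derive "$stem" both)"
done
```

For the stem "ee" this gives deee (prefix only), eer (suffix only) and deeer (both), which are exactly the three inputs tested above: only the form that combines the prefix and the suffix is matched.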
Did we miss a configuration option or a default behavior, or is this really
a bug?
Regards,
Vincent Desmares
Developer @ Inovia-team