> The only reason the TS stuff needs an encoding spec is to figure out how
> to read an external stop word file. I think my suggestion upthread is a
> lot better: have just one stop word file per language, store them all in
> UTF8, and convert to database encoding when loading them. The database
Hmm. You mean to use the language name in the configuration, use the current
encoding to decide which dictionary should be used (stemmers for the same
language differ between encodings), and recode the dictionary files from UTF8
to the current locale's encoding. Did I understand you right?
That's possible to do. But it's an incompatible change and would cause some
difficulties for DBAs. If the server locale is ISO (or KOI8 or any other) and
the file is in UTF8, then text editors and other tools might be confused.
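
To make the proposal concrete, here is a rough sketch of the recoding step in
Python. It is only an illustration, not the tsearch code: the function name
load_stop_words and the sample words are hypothetical, and it assumes one stop
word per line in a UTF8 file, with KOI8-R standing in for the database
encoding.

```python
import io


def load_stop_words(stream, db_encoding):
    """Read UTF-8 stop words (one per line) and recode them to db_encoding.

    Hypothetical helper for illustration only; not part of any real API.
    """
    words = set()
    for raw_line in stream:
        word = raw_line.decode("utf-8").strip()
        if not word:
            continue  # skip blank lines
        # encode() raises UnicodeEncodeError if the word has no
        # representation in the target (database) encoding.
        words.add(word.encode(db_encoding))
    return words


# Stand-in for a UTF-8 stop word file on disk.
utf8_file = io.BytesIO("и\nв\nне\n".encode("utf-8"))

# Assumed database encoding for this example: KOI8-R.
koi8_words = load_stop_words(utf8_file, "koi8_r")
```

The file on disk stays in UTF8 whatever the server locale is, which is exactly
the case where an editor running under a KOI8 or ISO locale would show it
incorrectly.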
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/