Michael Paesold <mpaesold@gmx.at> writes:
> After reading the discussion and the introduction, here is what I think
> tsearch in core should at least accomplish in 8.3.
> ...
> - Stop words in tables, not in external files.

I realized that there's a pretty serious problem with doing that, which
is encoding. We don't have any way to deal with preloaded catalog data
that exceeds 7-bit-ASCII, because when you do CREATE DATABASE ... ENCODING
it's going to be copied over exactly as-is. And there's plenty of
non-ASCII stuff in the non-English stopword files. This is something we
need to solve eventually, but I think it ties into the whole multiple
locale can-of-worms; there's no way we're getting it done for 8.3.
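To make the problem concrete, here is a small C sketch. It is not code from the patch, and the function names are made up for illustration. It shows that a stopword stored in a single-byte encoding is neither 7-bit ASCII nor a valid UTF-8 sequence, so a catalog row holding those bytes could not be copied as-is into a database with a different encoding. The sample bytes 0xCE 0xC5 are assumed to be a short Russian word in KOI8-R, and 0xD0 0xBD 0xD0 0xB5 the same word in UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every byte has the high bit clear, i.e. the data is 7-bit ASCII. */
bool is_7bit_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return false;
    return true;
}

/*
 * Check that s[0..len) is well-formed UTF-8.  Rejects stray or missing
 * continuation bytes, overlong forms, surrogates, and code points above
 * U+10FFFF.  Illustrative helper, not a PostgreSQL function.
 */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        size_t        n;    /* number of continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else
            return false;   /* continuation byte or invalid lead byte */

        if (len - i <= n)
            return false;   /* sequence is truncated */

        for (size_t k = 1; k <= n; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += n + 1;
    }
    return true;
}
```

An English stopword like "the" passes both checks, which is why the problem only shows up with the non-English files.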
So I'm afraid we have to settle for stop words in external files for the
moment. I do have two suggestions though:

* Let's have just one stopword file for each language, with the
convention that the file is stored in UTF8 no matter what language
you're talking about. We can have the stopword reading code convert
to the database encoding on-the-fly when it reads the file. Without
this there's just a whole bunch of foot-guns there. We'd at least need
to have encoding verification checks when reading the files, which seems
hardly cheaper than just translating the data.

* Let's fix it so the reference to the stoplist in the user-visible
options is just a name, with no path or anything like that. (Similar
to the handling of timezone_abbreviations.) Then it will be feasible
to re-interpret the option as a reference to a named list in a catalog
someday, when we solve the encoding problem. Right now the patch has
things like

+ DATA(insert OID = 5140 ( "ru_stem_koi8" PGNSP PGUID 5135 5137 "dicts_data/russian.stop.koi8"));

which is really binding the option pretty tightly to being a filename;
not to mention the large security risks involved in letting anyone but
a superuser have control of such an option.
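A rough C sketch of the second suggestion follows, under stated assumptions: the function names, the directory argument, and the ".stop" suffix are all hypothetical and not taken from the patch. The idea is only that the option value is checked to be a bare name before the server derives a file location from it, so a value containing a slash or dots can never reach the filesystem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Accept only a bare stoplist name: non-empty, made of lower-case ASCII
 * letters, digits, and underscores.  Anything that could act as a path
 * component ('/', '\\', '.') is rejected.  Hypothetical helper.
 */
bool stoplist_name_is_valid(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return false;

    for (const char *p = name; *p != '\0'; p++)
    {
        bool ok = (*p >= 'a' && *p <= 'z') ||
                  (*p >= '0' && *p <= '9') ||
                  *p == '_';

        if (!ok)
            return false;
    }
    return true;
}

/*
 * Build "<dir>/<name>.stop" into buf.  The directory is chosen by the
 * server, never by the user.  Returns false if the name is not a bare
 * name or the result does not fit in buf.
 */
bool stoplist_path(const char *dir, const char *name,
                   char *buf, size_t buflen)
{
    int n;

    if (!stoplist_name_is_valid(name))
        return false;

    n = snprintf(buf, buflen, "%s/%s.stop", dir, name);
    return n >= 0 && (size_t) n < buflen;
}
```

With this shape, a value such as "russian" maps to one file under a fixed directory, while "dicts_data/russian.stop.koi8" or "../../etc/passwd" is refused outright. The same name could later be looked up in a catalog instead, without changing what users type.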
> What I don't really like is the number of commands introduced without
> any strong reference to full text search. E.g. CREATE CONFIGURATION
> gives no hint at all that this is about full text search.

Yeah. We had some off-list discussion about this and concluded that
TEXT SEARCH seemed to be the right phrase to use in the command names.
That hasn't been reflected in the patch yet.

regards, tom lane