Re: full text search: the concept of a "word" - Mailing list pgsql-general

From	Teodor Sigaev
Subject	Re: full text search: the concept of a "word"
Date	April 20, 2006 20:56:05
Msg-id	44481F8E.1050800@sigaev.ru
In response to	full text search: the concept of a "word" ("Tomi NA" <hefest@gmail.com>)
List	pgsql-general

> My textfields are trigger-generated using information from a number of
> tables: these fields can be, say, a couple of thousand characters
> wide.
> Up to here, there's no problem.
> What I'd like to do is define - possibly using regexps - what
> constitutes a word. For instance, my word separator is a semicolon,
> not a space; a dash is not a separator, and neither are language
> specific characters (which might be interpreted that way by a language
> agnostic tool)...
> BTW, I use UTF-8 as my database encoding if it's of any importance.

I do not see a big problem: just write your own parser.
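
To make the splitting rule from the question concrete, here is a minimal sketch in Python. It is only an illustration of the tokenization logic, not a tsearch2 parser (a real one has to be written in C against the tsearch2 parser interface). The function name split_words and the decision to trim whitespace around each word are assumptions made for the example.

```python
def split_words(text, separator=";"):
    """Split text into words using only the separator character.

    Spaces inside a word, dashes, and non-ASCII letters are all kept
    as part of the word. Trimming the whitespace around each word and
    dropping empty words are assumptions, not requirements stated in
    the original question.
    """
    words = []
    for chunk in text.split(separator):
        word = chunk.strip()
        if word:  # skip empty pieces produced by ";;" or a trailing ";"
            words.append(word)
    return words


if __name__ == "__main__":
    print(split_words("Žarko Perić; full-text search;; čćžšđ"))
```

Python strings are Unicode, so language-specific characters need no special handling here; in a C parser working on UTF-8 bytes that part would need explicit care.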

There may be a problem with UTF-8: only the CVS HEAD version of tsearch2 supports
UTF-8. But you can find a patch for 8.1 at http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/




--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
