Re: Another tsearch bug... - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: Another tsearch bug...
Date
Msg-id Pine.GSO.4.44.0208231520120.15230-100000@ra.sai.msu.su
In response to Another tsearch bug...  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
List pgsql-hackers
On Fri, 23 Aug 2002, Christopher Kings-Lynne wrote:

> Hi guys,
>
> Hate to keep coming up with these bugs without patches - but I really don't
> have time to look into the source code atm :(
>
> OK, attached is an example of the problem.  Notice how trademarks and
> copyright symbols are being indexed along with the word.  This means that if
> someone searches for 'balance' in the above data set, they won't find
> anything.
>
> I'm not sure how this would be handled.  In the English language, it'd
> probably be safe to say that high ascii characters would be stripped from
> the index?  But you'd want to leave accents and stuff in I guess.  Tricky.

Rather tricky. The problem is that we don't know how to get flex to work
with locales. The parser recognizes Latin words ([a-zA-Z]), non-Latin words
([\0200-\0377]) and mixed words ([a-zA-Z\0200-\0377]). Your case (balanceR,
i.e. the word with a trademark sign attached) is a mixed word.
The right way is to have a locale-aware parser that properly recognizes words.
We are inclined to abandon flex.
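To illustrate the three word classes above, here is a minimal Python sketch. It is not the actual tsearch flex parser. It assumes a single-byte encoding such as Latin-1, where the registered-trademark sign is byte 0xAE, and the names `classify`, `words` and `locale_words` are hypothetical. Because 0xAE falls in the \0200-\0377 range, "balance" plus that sign becomes one mixed token that will never equal the plain Latin token "balance". The last function contrasts this with a rough stand-in for a locale-aware approach: decode with a known encoding first, then keep only real letters. Under that rule accented letters stay in the word and the symbol acts as a separator.

```python
import re

# Byte-level classes mirroring the flex patterns quoted above.
# Octal \200-\377 is 0x80-0xFF: any byte with the high bit set.
LATIN = re.compile(rb"[a-zA-Z]+")
NONLATIN = re.compile(rb"[\x80-\xff]+")
MIXED = re.compile(rb"[a-zA-Z\x80-\xff]+")


def classify(token: bytes) -> str:
    """Return the word class of a single, already-split token."""
    if LATIN.fullmatch(token):
        return "latin"
    if NONLATIN.fullmatch(token):
        return "nonlatin"
    if MIXED.fullmatch(token):
        return "mixed"
    return "other"


def words(text: bytes) -> list[tuple[bytes, str]]:
    """Split raw bytes into maximal letter/high-byte runs, tagging each run."""
    return [(m.group(), classify(m.group())) for m in MIXED.finditer(text)]


def locale_words(text: bytes, encoding: str = "latin-1") -> list[str]:
    """Decode first, then treat only Unicode letters as word characters.

    [^\\W\\d_] matches letters only, so symbols such as the trademark
    sign split words instead of sticking to them.
    """
    return re.findall(r"[^\W\d_]+", text.decode(encoding))


if __name__ == "__main__":
    raw = "balance\u00ae caf\u00e9".encode("latin-1")
    print(words(raw))         # flex-style: both tokens come out "mixed"
    print(locale_words(raw))  # letter-aware: the symbol is dropped
```

Note that with the byte-based rules an accented word such as "café" is also a mixed word, which is the difficulty Chris raised: stripping every high byte would damage accented words, while keeping them glues symbols onto words.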

>
> Anyway, just bringing it to your attention...
>
> Chris
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
