Thread: tsearch2 punctuation question

tsearch2 punctuation question

From
John DeSoi
Date:
For example:


select to_tsvector('cat,dog apple/orange');

            to_tsvector
----------------------------------
'cat':1 'dog':2 'apple/orange':3
(1 row)


Is there a setting that allows me to specify that strings containing
the '/' should be parsed into separate words? As is, I can't find
'apple' or 'orange'.

Thanks,

John




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


Re: tsearch2 punctuation question

From
Oleg Bartunov
Date:
On Thu, 26 Apr 2007, John DeSoi wrote:

> For example:
>
>
> select to_tsvector('cat,dog apple/orange');
>
>          to_tsvector
> ----------------------------------
> 'cat':1 'dog':2 'apple/orange':3
> (1 row)
>
>
> Is there a setting that allows me to specify that strings containing the '/'
> should be parsed into separate words? As is, I can't find 'apple' or
> 'orange'.

There is no such setting.
You can write your own parser, or a dictionary for the 'file' token type. We
have a howto, see
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/appendixes.html

If you want a simple parser, it is probably better to write one. Probably the
simplest way is to write a dictionary that returns
{apple/orange, apple, orange}.
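
To see which token type the parser assigns to each piece of the input (and so
which dictionary would need to handle 'apple/orange'), something like the
following may help. It is only a sketch: it assumes PostgreSQL 8.3 or later,
where text search is built in and ts_debug() takes a configuration name, and it
assumes an 'english' configuration. The contrib tsearch2 module has its own
ts_debug() with a different signature and column set.

```sql
-- Assumption: PostgreSQL 8.3+ with built-in text search and an
-- 'english' configuration. Shows one row per token, with its token
-- type alias, the dictionaries consulted, and the resulting lexemes.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'cat,dog apple/orange');
```

The row for 'apple/orange' should carry the 'file' alias mentioned above; a
custom dictionary mapped to that token type is what would emit the extra
lexemes.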

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: tsearch2 punctuation question

From
"Greg Sabino Mullane"
Date:

> Is there a setting that allows me to specify that strings containing
> the '/' should be parsed into separate words? As is, I can't find
> 'apple' or 'orange'.

No setting; I think you would have to mess with tsearch2 dictionaries. A
far easier approach is to have your application simply split the words
apart, or even write a wrapper function to do it for you within Postgres, e.g.:

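-- Requires the plperl procedural language to be installed in the database.
CREATE OR REPLACE FUNCTION wordsplit(text) RETURNS text LANGUAGE plperl
AS $_$
 # Replace every non-word character (anything other than a letter,
 # digit, or underscore) with a space, so 'apple/orange' becomes
 # 'apple orange' before it reaches to_tsvector.
 my $string = shift;
 $string =~ s/\W/ /g;
 return $string;
$_$;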

SELECT to_tsvector(wordsplit('cat,dog apple/orange'));
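
If PL/Perl is not installed, roughly the same effect can be had in plain SQL.
This is a sketch rather than a drop-in replacement: the name wordsplit_sql is
made up for illustration, and it assumes regexp_replace() with the 'g' flag is
available (PostgreSQL 8.1 or later). The bracket expression is used in place
of \W to avoid backslash-escaping issues in string literals.

```sql
-- Hypothetical pure-SQL variant of wordsplit(); the function name is
-- an assumption. Replaces every character that is not alphanumeric
-- or an underscore with a space.
CREATE OR REPLACE FUNCTION wordsplit_sql(text) RETURNS text
LANGUAGE sql IMMUTABLE
AS $_$
  SELECT regexp_replace($1, '[^[:alnum:]_]', ' ', 'g');
$_$;

-- 'apple' and 'orange' are now indexed as separate words.
SELECT to_tsvector(wordsplit_sql('cat,dog apple/orange'));
```

Either way, search terms should be passed through the same function before
being handed to to_tsquery, so that queries and documents are split the same
way.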

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200704261140
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8