Thread: tsearch2 for alphabetic character strings & codes

tsearch2 for alphabetic character strings & codes

From

Ron Mayer

Date:

23 September 2005, 18:43:33

I'm looking for a way to search for substrings within
documents in a way very similar to tsearch2, but my strings
are codes rather than alphabetic words, so I'm having a tough
time trying to use the current tsearch2 configurations with them.

For example, using tsearch to search for codes like
  '31.03(e)(2)(A)'
in a set of documents is tricky because tsearch seems
to treat most of the punctuation as word separators.

  fli=# select
  fli-#      to_tsvector('default','31.03(e)(2)(A)'),
  fli-#      to_tsvector('simple','31.03(e)(2)(A)');

        to_tsvector      |         to_tsvector
  -----------------------+-----------------------------
   '2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
  (1 row)


I see that tsearch2 allows different "configurations"
that apparently differ in how they parse strings.

I guess what I'm looking for is a "configuration"
that's even simpler-than-simple, and only breaks
up strings on whitespace and doesn't use any natural
language dictionaries. I was hoping I could download
or define such a configuration; but didn't see any
obvious documentation on how to set up my own
configuration.

Does this sound like a good approach (and if so, could
someone please point me in the right direction), or
are there other things I should be looking at?

   Ron

Re: tsearch2 for alphabetic character strings & codes

From

Oleg Bartunov

Date:

24 September 2005, 03:12:07

Ron,

you probably need to write a custom parser. tsearch2 supports
different parsers.

     Oleg

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 for alphabetic character strings & codes

From

"Andrew J. Kopciuch"

Date:

24 September 2005, 06:32:14

On Saturday 24 September 2005 00:09, Oleg Bartunov wrote:
> Ron,
>
> probably you need to write custom parser. tsearch2 supports
> different parsers.
>

To expand somewhat on what Oleg mentioned, you can find a howto on writing a
custom parser here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html

This example might be exactly what you are looking for. I have not looked into
it closely myself, but it appears to split only on whitespace.
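
To make the "split only on whitespace" idea concrete, here is a minimal
sketch in plain C of the tokenizing step such a parser would need. It is
not the code from the howto and it does not use the tsearch2 parser
interface; the function name next_ws_token and its arguments are made up
for illustration. It only shows that a code like '31.03(e)(2)(A)' stays
in one piece when whitespace is the sole separator.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT part of tsearch2: find the next
 * whitespace-delimited token in buf, starting at offset *pos.
 *
 * Returns 1 and sets *start / *len when a token is found.
 * Returns 0 when only whitespace (or nothing) remains.
 * *pos is advanced past the token so the call can be repeated.
 */
int next_ws_token(const char *buf, size_t buflen, size_t *pos,
                  const char **start, size_t *len)
{
    size_t i = *pos;

    /* Skip leading whitespace. */
    while (i < buflen && isspace((unsigned char) buf[i]))
        i++;

    if (i >= buflen) {
        *pos = i;
        return 0;           /* no more tokens */
    }

    /* Everything up to the next whitespace character is one token,
     * so punctuation such as '.', '(' and ')' is kept. */
    *start = buf + i;
    while (i < buflen && !isspace((unsigned char) buf[i]))
        i++;

    *len = (size_t) (buf + i - *start);
    *pos = i;
    return 1;
}
```

Called in a loop on "see 31.03(e)(2)(A) here", this yields three tokens,
the second being the whole code. Note that it leaves case untouched; any
lowercasing (as the 'simple' output above does for 'a') would presumably
still be up to whichever dictionary the configuration maps the token to.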

There is lots of documentation, examples, help, and other goodies for tsearch2
here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

HTH,

Andy