Thread: tsearch2 for alphabetic character strings & codes
I'm looking for a way to search for substrings within documents in a way very similar to tsearch2, but my strings are codes rather than alphabetic words, so I'm having a tough time trying to use the current tsearch2 configurations with them.

For example, using tsearch to search for codes like '31.03(e)(2)(A)' in a set of documents is tricky because tsearch seems to treat most of the punctuation as word separators.

fli=# select
fli-#   to_tsvector('default','31.03(e)(2)(A)'),
fli-#   to_tsvector('simple','31.03(e)(2)(A)');

      to_tsvector      |         to_tsvector
-----------------------+-----------------------------
 '2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
(1 row)

I see that tsearch2 allows different "configurations" that apparently differ in how they parse strings.

I guess what I'm looking for is a "configuration" that's even simpler-than-simple, and only breaks up strings on whitespace and doesn't use any natural language dictionaries. I was hoping I could download or define such a configuration, but I didn't see any obvious documentation on how to set up my own configuration.

Does this sound like a good approach (and if so, could someone please point me in the right direction), or are there other things I should be looking at?

Ron
Ron,

you probably need to write a custom parser. tsearch2 supports different parsers.

Oleg

On Fri, 23 Sep 2005, Ron Mayer wrote:
> I guess what I'm looking for is a "configuration"
> that's even simpler-than-simple, and only breaks
> up strings on whitespace and doesn't use any natural
> language dictionaries.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
On Saturday 24 September 2005 00:09, Oleg Bartunov wrote:
> Ron,
>
> probably you need to write custom parser. tsearch2 supports
> different parsers.

To expand somewhat on what Oleg mentioned, you can find a howto on writing a custom parser here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html

The example there might be exactly what you are looking for. I have not looked into it closely myself, but it appears to just split on whitespace.

There is lots of documentation, examples, help, and other goodies for tsearch2 here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

HTH,
Andy
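
For illustration only, here is a minimal Python sketch of the behaviour Ron describes: split on whitespace only, apply no dictionaries, and record word positions the way a tsvector does. It is not tsearch2 code and does not use the tsearch2 parser interface (a real custom parser is written in C, as the howto above describes). The function names whitespace_tsvector and format_tsvector are hypothetical, and leaving tokens case-sensitive is an assumption (the 'simple' configuration in Ron's output lowercases 'A' to 'a').

```python
def whitespace_tsvector(text):
    """Map each whitespace-delimited token to its 1-based positions.

    Tokens are kept verbatim: no punctuation splitting, no stemming,
    no stop-word removal, and no lowercasing (an assumption; a real
    configuration might normalize case).
    """
    positions = {}
    for pos, token in enumerate(text.split(), start=1):
        positions.setdefault(token, []).append(pos)
    return positions


def format_tsvector(positions):
    """Render the mapping in a tsvector-like 'lexeme':pos,pos form."""
    return " ".join(
        "'{}':{}".format(lexeme, ",".join(str(p) for p in plist))
        for lexeme, plist in sorted(positions.items())
    )


doc = "See section 31.03(e)(2)(A) and also 31.03(e)(2)(B)"
vec = whitespace_tsvector(doc)
print(format_tsvector(vec))
```

With this kind of tokenization the whole code '31.03(e)(2)(A)' survives as a single lexeme, so an exact-match search for it cannot be confused with a document that merely contains '31.03', 'e', '2' and 'a' somewhere. The trade-off is that trailing punctuation (for example a code followed directly by a comma) also stays attached to the token.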