Re: tsearch2 anomoly? - Mailing list pgsql-general
From | Teodor Sigaev |
---|---|
Subject | Re: tsearch2 anomoly? |
Date | |
Msg-id | 46E15B15.2090703@sigaev.ru |
In response to | Re: tsearch2 anomoly? (RC Gobeille <bob.gobeille@hp.com>) |
List | pgsql-general |
Ordinary text has no strict syntax rules, so the parser tries to recognize the
most probable token. Something made of '.', '-' and alphanumeric characters is
often a filename, but a filename very rarely starts or ends with a dot.

RC Gobeille wrote:
> Thanks and I didn't know about ts_debug, so thanks for that also.
>
> For the record, I see how to use my own processing function (e.g.
> dropatsymbol) to get what I need:
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
>
> However, can you explain the logic behind the parsing difference if I just
> add a ".s" to a string:
>
>
> ossdb=# select ts_debug('gallery2-httpd-2.1-conf.');
> ts_debug
> -----------------------------------------------------------------------
> (default,hword,"Hyphenated word",gallery2-httpd-2,{simple},"'2' 'httpd' 'gallery2' 'gallery2-httpd-2'")
> (default,part_hword,"Part of hyphenated word",gallery2,{simple},'gallery2')
> (default,lpart_hword,"Latin part of hyphenated word",httpd,{en_stem},'httpd')
> (default,float,"Decimal notation",2.1,{simple},'2.1')
> (default,lpart_hword,"Latin part of hyphenated word",conf,{en_stem},'conf')
> (5 rows)
>
> ossdb=# select ts_debug('gallery2-httpd-2.1-conf.s');
> ts_debug
> ---------------------------------------------------------------------
> (default,host,Host,gallery2-httpd-2.1-conf.s,{simple},'gallery2-httpd-2.1-conf.s')
> (1 row)
>
> Thanks again,
> Bob
>
>
> On 9/6/07 11:19 AM, "Oleg Bartunov" <oleg@sai.msu.su> wrote:
>
>> This is how default parser works. See output from
>> select * from ts_debug('gallery2-httpd-conf');
>> and
>> select * from ts_debug('httpd-2.2.3-5.src.rpm');
>>
>> All token type:
>>
>> select * from token_type();
>>
>>
>> On Thu, 6 Sep 2007, RC Gobeille wrote:
>>
>>> I'm having trouble understanding to_tsvector. (PostgreSQL 8.1.9 contrib)
>>>
>>> In this first case converting 'gallery2-httpd-conf' makes sense to me and is
>>> exactly what I want. It looks like the entire string is indexed plus the
>>> substrings broken by '-' are indexed.
>>>
>>>
>>> ossdb=# select to_tsvector('gallery2-httpd-conf');
>>> to_tsvector
>>> ---------------------------------------------------------
>>> 'conf':4 'httpd':3 'gallery2':2 'gallery2-httpd-conf':1
>>>
>>>
>>> However, I'd expect the same to happen in the httpd example - but it does not
>>> appear to.
>>>
>>> ossdb=# select to_tsvector('httpd-2.2.3-5.src.rpm');
>>> to_tsvector
>>> ---------------------------
>>> 'httpd-2.2.3-5.src.rpm':1
>>>
>>> Why don't I get: 'httpd', 'src', 'rpm', 'httpd-2.2.3-5.src.rpm' ?
>>>
>>> Is this a bug or design?
>>>
>>>
>>> Thank you!
>>> Bob
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>
>

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
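
The heuristic in the reply at the top of this message (a run of '.', '-' and alphanumeric characters is probably a filename, unless it starts or ends with a dot) can be roughly sketched in Python. This is only a toy approximation for illustration: the helper name looks_like_single_token is hypothetical, it is not part of tsearch2, and the real parser is a state machine with many more token types than this check covers.

```python
import re

# Hypothetical helper, NOT part of tsearch2. It only approximates the
# heuristic described in the reply; the real parser is far more detailed.
_CANDIDATE = re.compile(r"[A-Za-z0-9.-]+")


def looks_like_single_token(text: str) -> bool:
    """Guess whether text would be kept as one filename/host-like token."""
    # Only strings made entirely of alphanumerics, '.' and '-' qualify.
    if not _CANDIDATE.fullmatch(text):
        return False
    # Without a dot it is at most a hyphenated word, which gets split into parts.
    if "." not in text:
        return False
    # A filename is very rarely started or finished by a dot.
    return not (text.startswith(".") or text.endswith("."))


if __name__ == "__main__":
    samples = (
        "gallery2-httpd-conf",
        "httpd-2.2.3-5.src.rpm",
        "gallery2-httpd-2.1-conf.",
        "gallery2-httpd-2.1-conf.s",
    )
    for sample in samples:
        print(sample, looks_like_single_token(sample))
```

Under these assumptions the check agrees with the outputs quoted in the thread: the strings ending in ".s" and ".rpm" are kept whole, while the one ending in a bare dot and the plain hyphenated word are broken into parts. It does not try to reproduce how the parser splits those parts.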