Thread: [tsearch2] Problem with case sensitivity (or with creating own dictionary)
[tsearch2] Problem with case sensitivity (or with creating own dictionary)
From
Krzysztof xaru Rajda
Date:
Hello,

I have run into a problem. My goal is to extract links from text using tsearch2. Everything seemed to work well until I got some YouTube links: they contain both lowercase and uppercase letters, and the tsearch parser lowercases everything (from http://youtube.com/Y6dsHDX I get http://youtube.com/y6dshdx, which does not work). I went through the PostgreSQL docs, and it seems that each of the default dictionaries (simple, ispell, snowball) lowercases lexemes during normalization, and there is no option to disable it.

I started looking for tutorials on how to create my own dictionary or modify an existing one (I mean a dictionary like snowball, with my own source code, not just a dictionary created by a 'CREATE DICTIONARY...' query), but everything I found is really out of date and uses mechanisms that are deprecated in the latest version of Postgres (I'm working on v 9.2), like 'contrib/gendict' here: http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/custom-dict.html

So now I have no idea what to do about my case sensitivity problem. Is there any other way to overcome it, apart from creating my own dictionary? If not, how do I create one on Postgres 9.2?

Regards,
xaru
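The lowercasing described in the message above can be inspected with ts_debug(), which reports, for each token, the text as the parser saw it and the lexemes the dictionary produced. This is a minimal sketch, assuming the built-in `simple` configuration and the default parser. The sample URL is the one from the message.

```sql
-- token    = the text exactly as the parser split it out
-- lexemes  = what the mapped dictionary returned for that token
-- Comparing the two columns shows at which stage the case is lost.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('simple', 'http://youtube.com/Y6dsHDX');
```

The `alias` column also shows which token types (for example `url` or `url_path`) the link is split into, which is useful later when deciding which mappings to change.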
Re: [tsearch2] Problem with case sensitivity (or with creating own dictionary)
From
Oleg Bartunov
Date:
Please take a look at contrib/dict_int and create your own dict_noop. It should be easy. I think you could document it and share it with people (wiki.postgresql.org?), since other people have been interested in a noop dictionary. Also, don't forget to modify your configuration; use ts_debug(), it will help you.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

On Sat, 3 Aug 2013, Krzysztof xaru Rajda wrote:
> So now, I have no idea what to do with my case sensitivity problem... Is
> there any other way to overcome it, apart from creating own dictionary? If no
> - how to create one on the Postgres 9.2?
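The "modify your configuration" step suggested above could look roughly like the following. This is only a sketch under stated assumptions: a C module has already been built along the lines of contrib/dict_int and installed as `dict_noop`, exporting `dnoop_init` and `dnoop_lexize`. Those names, as well as `noop_template`, `noop`, and `links`, are hypothetical and do not ship with PostgreSQL. Creating a text search template requires superuser rights.

```sql
-- Hypothetical names: dict_noop, dnoop_init, dnoop_lexize, noop_template,
-- noop, links. Only the SQL commands themselves are standard PostgreSQL.

-- Expose the C entry points of the (assumed) compiled module.
CREATE FUNCTION dnoop_init(internal)
    RETURNS internal
    AS '$libdir/dict_noop', 'dnoop_init'
    LANGUAGE C STRICT;

CREATE FUNCTION dnoop_lexize(internal, internal, internal, internal)
    RETURNS internal
    AS '$libdir/dict_noop', 'dnoop_lexize'
    LANGUAGE C STRICT;

-- A template ties the C functions together; a dictionary instantiates it.
CREATE TEXT SEARCH TEMPLATE noop_template (
    LEXIZE = dnoop_lexize,
    INIT   = dnoop_init
);

CREATE TEXT SEARCH DICTIONARY noop (
    TEMPLATE = noop_template
);

-- Work on a copy of an existing configuration and remap only link tokens.
CREATE TEXT SEARCH CONFIGURATION links (COPY = simple);

ALTER TEXT SEARCH CONFIGURATION links
    ALTER MAPPING FOR url, url_path
    WITH noop;

-- Check which dictionary now handles each token and what it returns.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('links', 'http://youtube.com/Y6dsHDX');
```

Remapping only `url` and `url_path` keeps ordinary words on the original dictionary, so regular text is still normalized as before.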
Re: [tsearch2] Problem with case sensitivity (or with creating own dictionary)
From
Krzysztof xaru Rajda
Date:
OK, so just to be sure I understand everything: first I should install the postgresql-contrib extension. Then a contrib/dict_int directory will appear with the dict_int source code inside, which I can modify. Then I'll be able to install this modified dictionary, and it will work properly, like the ispell or snowball dictionaries. Finally, if everything goes well, I'll share a little tutorial on the wiki :)

Am I right, or isn't it that easy?

Regards,
xaru

On 2013-08-05 18:37, Oleg Bartunov wrote:
> take a look on contrib/dict_int and create your own dict_noop.
> It should be easy. I think you could document it and share
> with people (wiki.postgresql.org ?), since there were other people
> interesting in noop dictionary. Also, don't forget to modify
> your configuration - use ts_debug(), it will helps you.
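For orientation, this is how the stock dict_int module is loaded and exercised once the contrib modules are installed on the server. It is a sketch of the unmodified module only, assuming PostgreSQL 9.1 or later, where contrib modules are packaged as extensions. A modified copy would need its own name and its own build.

```sql
-- Load the unmodified contrib module; this creates the intdict dictionary.
CREATE EXTENSION dict_int;

-- Call the dictionary directly, without going through a configuration.
SELECT ts_lexize('intdict', '12345678');
```

ts_lexize() is a convenient way to test any dictionary in isolation before mapping it to token types in a configuration.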