Re: Initial ugly reverse-translator - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Initial ugly reverse-translator |
Date | |
Msg-id | Pine.LNX.4.64.0901160816450.9554@sn.sai.msu.ru |
In response to | Re: Initial ugly reverse-translator (pepone.onrez <pepone.onrez@gmail.com>) |
List | pgsql-general |
Hi,

ltree and pg_trgm with UTF8 support are available from CVS HEAD, see
http://archives.postgresql.org/pgsql-committers/2008-06/msg00356.php
http://archives.postgresql.org/pgsql-committers/2008-11/msg00139.php

Oleg

On Fri, 16 Jan 2009, pepone.onrez wrote:

> On Sat, Apr 19, 2008 at 6:10 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>> On Sat, 19 Apr 2008, Tom Lane wrote:
>>
>>> Craig Ringer <craig@postnewspapers.com.au> writes:
>>>>
>>>> Tom Lane wrote:
>>>>>
>>>>> I don't really see the problem. I assume from your reference to pg_trgm
>>>>> that you're using trigram similarity as the prefilter for potential
>>>>> matches
>>>
>>>> It turns out that's no good anyway, as it appears to ignore characters
>>>> outside the ASCII range. Rather less than useful for searching a
>>>> database of translated strings ;-)
>>>
>>> A quick look at the pg_trgm code suggests that it is only prepared to
>>> deal with single-byte encodings; if you're working in UTF8, which I
>>> suppose you'd have to be, it's dead in the water :-(. Perhaps fixing
>>> that should be on the TODO list.
>>
>> As well as ltree. They are on our TODO list:
>> http://www.sai.msu.su/~megera/wiki/TODO
>>
>
> Hi Oleg
>
> Your TODO list says that UTF8 support was added to ltree. Is this code
> currently available for download?
>
> Regards,
> José
>>>
>>> But in any case maybe the full-text-search stuff would be more useful
>>> as a prefilter? Although honestly, for the speed we need here, I'm
>>> not sure a prefilter is needed at all. Full text might be useful
>>> if a LIKE-based match fails, though.
>>>
>>>>> (And besides, speed doesn't seem like the be-all and end-all here.)
>>>
>>>> True. It's not so much the speed as the fragility when faced with small
>>>> changes to formatting. In addition to whitespace, some clients mangle
>>>> punctuation with features like automatic "curly"-quoting.
>>>
>>> Yeah. I was wondering whether encoding differences wouldn't be a huge
>>> problem in practice, as well.
>>>
>>> regards, tom lane
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
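
For readers following the thread, here is a rough Python sketch of why trigram similarity that only understands ASCII is of little use as a prefilter for translated strings. It is not the pg_trgm code. The function names `trigrams`, `similarity`, and `ascii_only` are made up for this illustration, the padding (two spaces before each word, one after) is only loosely modeled on pg_trgm's documented behavior, and `ascii_only` is an assumed stand-in for the "ignores characters outside the ASCII range" effect described above.

```python
def trigrams(text):
    """Return the set of character trigrams for each whitespace-separated word.

    Works on Unicode code points, not bytes. Each word is padded with two
    leading spaces and one trailing space (an assumption loosely modeled on
    pg_trgm; real pg_trgm also splits on non-alphanumeric characters).
    """
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0  # nothing to compare
    return len(ta & tb) / len(ta | tb)


def ascii_only(text):
    """Hypothetical model of an ASCII-only matcher: blank out everything else."""
    return "".join(ch if ch.isascii() else " " for ch in text)


if __name__ == "__main__":
    a, b = "привет мир", "привет мор"
    # Character-aware comparison sees the two strings as close but not equal.
    print(similarity(a, b))
    # After dropping non-ASCII characters nothing is left to compare.
    print(similarity(ascii_only(a), ascii_only(b)))
```

With the non-ASCII characters blanked out, every purely Cyrillic string yields an empty trigram set, so all such strings look equally unrelated, which matches the complaint quoted above.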