Re: Initial ugly reverse-translator - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: Initial ugly reverse-translator
Date
Msg-id Pine.LNX.4.64.0901160816450.9554@sn.sai.msu.ru
In response to Re: Initial ugly reverse-translator  (pepone.onrez <pepone.onrez@gmail.com>)
List pgsql-general
Hi,

ltree and pg_trgm with UTF8 support are available from CVS HEAD; see
http://archives.postgresql.org/pgsql-committers/2008-06/msg00356.php
http://archives.postgresql.org/pgsql-committers/2008-11/msg00139.php
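
To illustrate the problem discussed in the quoted thread below, here is a
minimal Python sketch, not the pg_trgm code itself. It compares a
character-aware trigram similarity with a simulation of the reported
behaviour, in which characters outside the ASCII range are ignored. The
function names are hypothetical, the padding (two spaces before and one
after each word) and the shared/union ratio are only loosely modelled on
pg_trgm, and treating non-ASCII characters as separators is an assumption
made for the demonstration.

```python
def trigrams(text):
    """Return the set of character trigrams of text, padding each word."""
    grams = set()
    for word in text.lower().split():
        # Assumption: two leading spaces and one trailing space per word.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in [0.0, 1.0]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def ascii_only(text):
    """Simulate the reported behaviour: non-ASCII characters are ignored.

    Assumption for illustration only: each such character is replaced by
    a space, so it acts as a word separator.
    """
    return "".join(c if ord(c) < 128 else " " for c in text)


def ascii_only_similarity(a, b):
    """Similarity as seen by code that only handles ASCII characters."""
    return similarity(ascii_only(a), ascii_only(b))


if __name__ == "__main__":
    pairs = [("hello world", "hello"), ("привет мир", "привет")]
    for a, b in pairs:
        print(a, "|", b, "|", similarity(a, b), "|", ascii_only_similarity(a, b))
```

In this sketch the ASCII-only variant finds no trigrams at all in the
Cyrillic strings, so every such comparison scores zero, which is why a
prefilter built this way is of no use for translated strings.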

Oleg
On Fri, 16 Jan 2009, pepone.onrez wrote:

> On Sat, Apr 19, 2008 at 6:10 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>> On Sat, 19 Apr 2008, Tom Lane wrote:
>>
>>> Craig Ringer <craig@postnewspapers.com.au> writes:
>>>>
>>>> Tom Lane wrote:
>>>>>
>>>>> I don't really see the problem.  I assume from your reference to pg_trgm
>>>>> that you're using trigram similarity as the prefilter for potential
>>>>> matches
>>>
>>>> It turns out that's no good anyway, as it appears to ignore characters
>>>> outside the ASCII range. Rather less than useful for searching a
>>>> database of translated strings ;-)
>>>
>>> A quick look at the pg_trgm code suggests that it is only prepared to
>>> deal with single-byte encodings; if you're working in UTF8, which I
>>> suppose you'd have to be, it's dead in the water :-(.  Perhaps fixing
>>> that should be on the TODO list.
>>
>> As well as ltree. Both are on our TODO list:
>> http://www.sai.msu.su/~megera/wiki/TODO
>>
>
> Hi Oleg
>
> Your TODO list says that UTF8 support was added to ltree. Is this code
> currently available for download?
>
> Regards,
> José
>>>
>>> But in any case maybe the full-text-search stuff would be more useful
>>> as a prefilter?  Although honestly, for the speed we need here, I'm
>>> not sure a prefilter is needed at all.  Full text might be useful
>>> if a LIKE-based match fails, though.
>>>
>>>>> (And besides, speed doesn't seem like the be-all and end-all here.)
>>>
>>>> True. It's not so much the speed as the fragility when faced with small
>>>> changes to formatting. In addition to whitespace, some clients mangle
>>>> punctuation with features like automatic "curly"-quoting.
>>>
>>> Yeah.  I was wondering whether encoding differences wouldn't be a huge
>>> problem in practice, as well.
>>>
>>>                        regards, tom lane
>>>
>>>
>>
>>        Regards,
>>                Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83