Thread: Tsearch2 - spanish

Tsearch2 - spanish

From
Felipe de Jesús Molina Bravo
Date:

Hi

I have installed postgresql-8.2.4 and tsearch2 with a Spanish dictionary.
My problem is:

        prueba=# select to_tsvector('espanol','melón');
        ERROR:  Affix parse error at 506 line

And if I execute:

        prueba=# select lexize('sp','melón');
         lexize
        ---------
         {melon}
        (1 row)




I tried many dictionaries with the same results. I also changed the
codeset of the .aff and .dict files (from latin1 to utf8, and from utf8 to
iso88591) and got the same error.

Where can I look to resolve this problem?

My dictionary has the following at line 506:

flag *J:        # isimo
    E   > -E, ÍSIMO     # grande grandísimo
    E   > -E, ÍSIMOS    # grande grandísimos
    E   > -E, ÍSIMA     # grande grandísima
    E   > -E, ÍSIMAS    # grande grandísimas
    O   > -O, ÍSIMO     # tonto tontísimo
    O   > -O, ÍSIMA     # tonto tontísima
    O   > -O, ÍSIMOS    # tonto tontísimos
    O   > -O, ÍSIMAS    # tonto tontísimas
    L   > ÍSIMO # formal formalísimo
    L   > ÍSIMA # formal formalísima
    L   > ÍSIMOS        # formal formalísimos
    L   > ÍSIMAS        # formal formalísimas

If I remove the "Í" then I don't have the problem, but the lexeme is incorrect.


I saw the post
http://archives.postgresql.org/pgsql-general/2007-07/msg00888.php

Maybe Marcelo has resolved the problem; can you tell me your
tsearch2 configuration?



best regards

PS: I need to resolve this for my work.
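
The codeset conversion described in this message (latin1 to utf8) can be sketched with iconv. This is only an illustration under stated assumptions: the helper name recode_to_utf8 and the file names in the example are hypothetical, and the source files are assumed to be in ISO-8859-1. Whichever encoding the files end up in should match the encoding of the database.

```shell
# Sketch: recode ispell dictionary files from ISO-8859-1 to UTF-8.
# recode_to_utf8 is a hypothetical helper; for each input file it writes
# a new file with a ".utf8" suffix and leaves the original untouched.
recode_to_utf8() {
    for src in "$@"; do
        iconv -f ISO-8859-1 -t UTF-8 "$src" > "$src.utf8" || return 1
    done
}

# Example (file names are placeholders):
#   recode_to_utf8 spanish.aff spanish.dict
```

Writing to a separate file keeps the original around in case the source encoding was guessed wrong.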

Re: Tsearch2 - spanish

From
Teodor Sigaev
Date:
>         prueba=# select to_tsvector('espanol','melón');
>         ERROR:  Affix parse error at 506 line
and
>         prueba=# select lexize('sp','melón');
>          lexize
>         ---------
>          {melon}
>         (1 row)

That looks very strange. Can you provide the list of dictionaries and the configuration map?

> I tried many dictionaries with the same results. Also I change the
> codeset of files :aff and dict (from "latin1 to utf8" and "utf8 to
> iso88591") and got the same error
>
> where  can I investigate for resolve about this problem?
>
> My dictionary at 506 line had:
Where did you get this file? And what are the encoding and locale settings of your db?

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: Tsearch2 - spanish

From
Felipe de Jesús Molina Bravo
Date:
Hi

You are right, the output of "show lc_ctype;" was C.

Now I have:

prueba1=# show lc_ctype;
    lc_ctype
-----------------
 es_MX.ISO8859-1
(1 row)

after running

 % initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1

(as you said)

and "createdb -E iso8859-1 prueba1", and finally installing tsearch2.

The original problem is resolved:

prueba1=# select to_tsvector('espanol','melón');
 to_tsvector
-------------
 'melón':1
(1 row)


but if I change the sentence to this:

prueba1=# select to_tsvector('espanol','melón  perro mordelón');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>


??? The connection is lost ... but the server is up ... any idea?

The position of the synonym dictionary is intentional.


Thanks in advance


On Tue, 18-09-2007 at 21:40 +0400, Teodor Sigaev wrote:
> >         LC_CTYPE="POSIX"
>
>
> Please send the output of the "show lc_ctype;" command. If it is the C locale
> then I can identify the problem: a character with a diacritical mark (such as ó)
> is not an alpha character there, and the ispell dictionary will fail. To fix
> that you should run initdb with these options:
> % initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
> or
> % initdb -D /YOUR/PATH -E UTF8 --locale es_ES.UTF8
>
> In the latter case you should also recode all of the dictionary's data files to utf8.
>
> >>>         prueba=# select to_tsvector('espanol','melón');
> >>>         ERROR:  Affix parse error at 506 line
> >> and
> >>>         prueba=# select lexize('sp','melón');
> >>>          lexize
> >>>         ---------
> >>>          {melon}
> >>>         (1 row)
> sp is a Snowball stemmer; it doesn't require an affix file, so it works.
>
> By the way, why is the synonym dictionary placed after ispell? Is it intentional?
> Usually the synonym dictionary goes first, then ispell, and snowball after all of them.
>
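
Teodor's point about the C locale can also be seen outside PostgreSQL. The sketch below is only an illustration with standard shell tools, not what tsearch2 runs internally, and the helper name non_alpha_bytes is hypothetical. In the C locale, tr deletes the ASCII letters of a Latin-1 encoded "melón" but leaves the ó byte behind, because that byte is not classified as alphabetic.

```shell
# Illustration only: character classification in the C locale.
# 'mel\363n' is "melón" in ISO-8859-1 (ó is the single byte 0xF3).
non_alpha_bytes() {
    # Delete everything the C locale considers alphabetic, then count
    # the bytes that remain.
    printf 'mel\363n' | LC_ALL=C tr -d '[:alpha:]' | wc -c | tr -d ' '
}

non_alpha_bytes   # prints 1: only the ó byte survives
```

In a Spanish locale with a matching encoding the same byte would count as a letter, which is why re-running initdb with such a locale made the affix file parse.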

Re: Tsearch2 - spanish

From
Teodor Sigaev
Date:
> prueba1=# select to_tsvector('espanol','melón  perro mordelón');
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>

Hmm, can you provide a backtrace?

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: Tsearch2 - spanish

From
marcelo Cortez
Date:
Felipe

--- Felipe de Jesús Molina Bravo
<felipe.molina@inegi.gob.mx> wrote:

> Hi
>
> You are right, the output of "show lc_ctype;" was C.
>
> Now I have:
>
> prueba1=# show lc_ctype;
>     lc_ctype
> -----------------
>  es_MX.ISO8859-1
> (1 row)
>
> after running
>
>  % initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
>
> (as you said)
>
> and "createdb -E iso8859-1 prueba1", and finally installing tsearch2.
>
> The original problem is resolved:
>
> prueba1=# select to_tsvector('espanol','melón');
>  to_tsvector
> -------------
>  'melón':1
> (1 row)
>
>
> but if I change the sentence to this:
>
> prueba1=# select to_tsvector('espanol','melón  perro mordelón');
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

 The same thing happened to me the first time I used
 Tsearch2 with Spanish. I think you need to patch
 Snowball with the tsearch_snowball_82 file; a web search
 should turn up instructions on how to do it.
 best regards
 mdc

Re: Tsearch2 - spanish

From
"MOLINA BRAVO FELIPE DE JESUS"
Date:
Hi

Thanks, Teodor and Marcelo.

The problem is solved.

regards




How to clear bits?

From
"madhtr"
Date:
Hello group :)

How do I clear bits in a number in PostgreSQL?

In C++ it's:

0xffffff00 &~ 0x0000ffff

What is the equivalent in PostgreSQL, from the psql command-line app?

select ...

Thanks :)


Re: How to clear bits?

From
"madhtr"
Date:
Never mind, I figured it out ...

fails:

0xffffff00 &~ 0x0000ffff

succeeds:

0xffffff00 & ~ 0x0000ffff

I had to add a space.
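
The space is needed because PostgreSQL reads an unbroken run of operator characters such as "&~" as a single operator name, and no built-in operator has that name; with the space, "&" (bitwise AND) and "~" (bitwise NOT) are recognized separately. A minimal shell sketch, assuming a POSIX shell with 64-bit arithmetic: the helper names clear_bits_sql and clear_bits_local are hypothetical, and the generated statement uses decimal integers rather than assuming the server accepts 0x literals. It has not been run against a server here.

```shell
# Sketch: clear the bits of MASK in VALUE.
# clear_bits_sql prints a query for psql; inputs are shell integers
# (hex such as 0xffffff00 is accepted by shell arithmetic).
clear_bits_sql() {
    value=$(( $1 ))
    mask=$(( $2 ))
    # Note the space between & and ~ in the generated SQL.
    printf 'SELECT %s & ~ %s AS cleared;\n' "$value" "$mask"
}

# The same arithmetic done locally, to compare with what psql returns.
clear_bits_local() {
    echo $(( $1 & ~$2 ))
}

clear_bits_sql 0xffffff00 0x0000ffff
clear_bits_local 0xffffff00 0x0000ffff
```

Piping the first line into psql (for example "clear_bits_sql 0xffffff00 0x0000ffff | psql prueba", where the database name is a placeholder) should return the same number that clear_bits_local prints, 4294901760 (0xffff0000).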



