Thread: tsearch2 column update produces "word too long" error

tsearch2 column update produces "word too long" error

From
"Markus Wollny"
Date:
Hello!

I'm currently testing deployment of tsearch2 on our forum table. The
table is huge in itself - some 2GB of data without the indexes. I have
got PostgreSQL 7.4RC2 running on a test machine, installed tsearch2 to
my database, added the new column to the table and tried to update it in
the recommended fashion:

UPDATE ct_com_board_message
            SET ftindex=to_tsvector('default',coalesce(user_login,'')
||' '|| coalesce(title,'') ||' '|| coalesce(text,''));

It does run for a while but at some point I get "ERROR:  word is too
long"; I guess that this is caused by some idiot user(s) writing some
Joycean nonsense (but most probably without the literary value) or
drivelling about their holidays in
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch (Wales). Now
what could I do in order to intercept this error?

Kind regards

    Markus

Re: tsearch2 column update produces "word too long" error

From
Oleg Bartunov
Date:
On Fri, 21 Nov 2003, Markus Wollny wrote:

> Hello!
>
> I'm currently testing deployment of tsearch2 on our forum table. The
> table is huge in itself - some 2GB of data without the indexes. I have
> got PostgreSQL 7.4RC2 running on a test machine, installed tsearch2 to
> my database, added the new column to the table and tried to update it in
> the recommended fashion:
>
> UPDATE ct_com_board_message
>             SET ftindex=to_tsvector('default',coalesce(user_login,'')
> ||' '|| coalesce(title,'') ||' '|| coalesce(text,''));
>
> It does run for a while but at some point I get "ERROR:  word is too
> long"; I guess that this is caused by some idiot user(s) writing some
> Joycean nonsense (but most probably without the literary value) or
> drivelling about their holidays in
> Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch (Wales). Now
> what could I do in order to intercept this error?

Word length is limited by 2K. What's exactly the word  tsearch2 complained on ?
'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch' is fine :)

btw, don't forget to configure properly dictionaries, so you don't have
a lot of unique words.

>
> Kind regards
>
>     Markus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 column update produces "word too long" error

From
"Markus Wollny"
Date:
Hello!

> Von: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Gesendet: Freitag, 21. November 2003 13:06
> An: Markus Wollny
> Cc: pgsql-general@postgresql.org
>
> Word length is limited by 2K. What's exactly the word
> tsearch2 complained on ?
> 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch'
> is fine :)

This was a silly example, I know - it is a long word, but not too long
to worry a machine. The offending word will surely be much longer, but
as a matter of fact, I cannot think of any user actually typing a 2k+
string without any spaces in between. I'm not sure on which word
tsearch2 complained, it doesn't tell and even logging did not provide me
with any more detail:

2003-11-21 14:06:44 [26497] ERROR:  42601: word is too long
LOCATION:  parsetext_v2, ts_cfg.c:294
STATEMENT:  UPDATE ct_com_board_message
                    SET
ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
coalesce(title,'') ||' '|| coalesce(text,''));

Is there some way to find the exact position?

> btw, don't forget to configure properly dictionaries, so you
> don't have a lot of unique words.

I won't forget that; I justed wanted to run a quick-off first test
before diving deeper into Ispell and other issues which are as yet a bit
of a mystery to me.

Kind Regards

    Markus

Re: tsearch2 column update produces "word too long"

From
Oleg Bartunov
Date:
On Fri, 21 Nov 2003, Markus Wollny wrote:

> Hello!
>
> > Von: Oleg Bartunov [mailto:oleg@sai.msu.su]
> > Gesendet: Freitag, 21. November 2003 13:06
> > An: Markus Wollny
> > Cc: pgsql-general@postgresql.org
> >
> > Word length is limited by 2K. What's exactly the word
> > tsearch2 complained on ?
> > 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch'
> > is fine :)
>
> This was a silly example, I know - it is a long word, but not too long
> to worry a machine. The offending word will surely be much longer, but
> as a matter of fact, I cannot think of any user actually typing a 2k+
> string without any spaces in between. I'm not sure on which word
> tsearch2 complained, it doesn't tell and even logging did not provide me
> with any more detail:
>
> 2003-11-21 14:06:44 [26497] ERROR:  42601: word is too long
> LOCATION:  parsetext_v2, ts_cfg.c:294
> STATEMENT:  UPDATE ct_com_board_message
>                     SET
> ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
> coalesce(title,'') ||' '|| coalesce(text,''));
>
> Is there some way to find the exact position?

I'm afraid you need to hack ts_cfg.c:294 yourself to print the word
which's bugging you :)

>
> > btw, don't forget to configure properly dictionaries, so you
> > don't have a lot of unique words.
>
> I won't forget that; I justed wanted to run a quick-off first test
> before diving deeper into Ispell and other issues which are as yet a bit
> of a mystery to me.
>
> Kind Regards
>
>     Markus
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83