Thread: tsearch2 column update produces "word too long" error
Hello! I'm currently testing deployment of tsearch2 on our forum table. The table is huge in itself - some 2GB of data without the indexes. I have got PostgreSQL 7.4RC2 running on a test machine, installed tsearch2 to my database, added the new column to the table and tried to update it in the recommended fashion: UPDATE ct_com_board_message SET ftindex=to_tsvector('default',coalesce(user_login,'') ||' '|| coalesce(title,'') ||' '|| coalesce(text,'')); It does run for a while but at some point I get "ERROR: word is too long"; I guess that this is caused by some idiot user(s) writing some Joycean nonsense (but most probably without the literary value) or drivelling about their holidays in Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch (Wales). Now what could I do in order to intercept this error? Kind regards Markus
On Fri, 21 Nov 2003, Markus Wollny wrote: > Hello! > > I'm currently testing deployment of tsearch2 on our forum table. The > table is huge in itself - some 2GB of data without the indexes. I have > got PostgreSQL 7.4RC2 running on a test machine, installed tsearch2 to > my database, added the new column to the table and tried to update it in > the recommended fashion: > > UPDATE ct_com_board_message > SET ftindex=to_tsvector('default',coalesce(user_login,'') > ||' '|| coalesce(title,'') ||' '|| coalesce(text,'')); > > It does run for a while but at some point I get "ERROR: word is too > long"; I guess that this is caused by some idiot user(s) writing some > Joycean nonsense (but most probably without the literary value) or > drivelling about their holidays in > Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch (Wales). Now > what could I do in order to intercept this error? Word length is limited by 2K. What's exactly the word tsearch2 complained on ? 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch' is fine :) btw, don't forget to configure properly dictionaries, so you don't have a lot of unique words. > > Kind regards > > Markus > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hello! > Von: Oleg Bartunov [mailto:oleg@sai.msu.su] > Gesendet: Freitag, 21. November 2003 13:06 > An: Markus Wollny > Cc: pgsql-general@postgresql.org > > Word length is limited by 2K. What's exactly the word > tsearch2 complained on ? > 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch' > is fine :) This was a silly example, I know - it is a long word, but not too long to worry a machine. The offending word will surely be much longer, but as a matter of fact, I cannot think of any user actually typing a 2k+ string without any spaces in between. I'm not sure on which word tsearch2 complained, it doesn't tell and even logging did not provide me with any more detail: 2003-11-21 14:06:44 [26497] ERROR: 42601: word is too long LOCATION: parsetext_v2, ts_cfg.c:294 STATEMENT: UPDATE ct_com_board_message SET ftindex=to_tsvector('default',coalesce(user_login,'') ||' '|| coalesce(title,'') ||' '|| coalesce(text,'')); Is there some way to find the exact position? > btw, don't forget to configure properly dictionaries, so you > don't have a lot of unique words. I won't forget that; I justed wanted to run a quick-off first test before diving deeper into Ispell and other issues which are as yet a bit of a mystery to me. Kind Regards Markus
On Fri, 21 Nov 2003, Markus Wollny wrote: > Hello! > > > Von: Oleg Bartunov [mailto:oleg@sai.msu.su] > > Gesendet: Freitag, 21. November 2003 13:06 > > An: Markus Wollny > > Cc: pgsql-general@postgresql.org > > > > Word length is limited by 2K. What's exactly the word > > tsearch2 complained on ? > > 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch' > > is fine :) > > This was a silly example, I know - it is a long word, but not too long > to worry a machine. The offending word will surely be much longer, but > as a matter of fact, I cannot think of any user actually typing a 2k+ > string without any spaces in between. I'm not sure on which word > tsearch2 complained, it doesn't tell and even logging did not provide me > with any more detail: > > 2003-11-21 14:06:44 [26497] ERROR: 42601: word is too long > LOCATION: parsetext_v2, ts_cfg.c:294 > STATEMENT: UPDATE ct_com_board_message > SET > ftindex=to_tsvector('default',coalesce(user_login,'') ||' '|| > coalesce(title,'') ||' '|| coalesce(text,'')); > > Is there some way to find the exact position? I'm afraid you need to hack ts_cfg.c:294 yourself to print the word which's bugging you :) > > > btw, don't forget to configure properly dictionaries, so you > > don't have a lot of unique words. > > I won't forget that; I justed wanted to run a quick-off first test > before diving deeper into Ispell and other issues which are as yet a bit > of a mystery to me. > > Kind Regards > > Markus > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83