Thread: Re: tsearch2 column update produces "word too long" error
Hi!

Now, I really couldn't code C to save my life, but I managed to elicit some more debugging info. It's still dumb user interaction, as suspected, but this is an issue I have to take into account as a given. Here's the "patch" for ts_cfg.c:

    if (lenlemm >= MAXSTRLEN)
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
!                errmsg("word is too long(%d): %s",lenlemm,lemm)));

Now when I try

    UPDATE ct_com_board_message
    SET ftindex=to_tsvector('default',coalesce(user_login,'') ||' '|| coalesce(title,'') ||' '|| coalesce(text,''));

I eventually get:

    ERROR: word is too long(2724):
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajjajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajjajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajjajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
    jajajajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
    ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj

This is a brightly shining example of utterly wanton user stupidity, I think: a 2k+ string of |:ja:|. Input like that cannot be helped, though - if he'd been a bit more imaginative, he could have used a few dozen "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" in a row, or anything else; unfortunately there's no app that could automatically whack a user if he's doing something stupid.

But on the other hand, I cannot think of any reason why crap like that should be indexed in the first place.
Therefore I would like to see some sort of option that lets me keep using tsearch2 while automatically excluding anything exceeding MAXSTRLEN - so the UPDATE might throw a NOTICE (if anything at all) but still get on with the rest.

An alteration like that, however, exceeds my limited abilities with C by far, and I don't want to mess with something I do not fully understand and then use that mess in a production environment. Is there a way to get around this problem with oversized words?

Kind regards

Markus

> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Friday, 21 November 2003 15:13
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: AW: [GENERAL] tsearch2 column update produces
> "word too long" error
>
> On Fri, 21 Nov 2003, Markus Wollny wrote:
>
> > Hello!
> >
> > > From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> > > Sent: Friday, 21 November 2003 13:06
> > > To: Markus Wollny
> > > Cc: pgsql-general@postgresql.org
> > >
> > > Word length is limited by 2K. What's exactly the word
> > > tsearch2 complained on ?
> > > 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch'
> > > is fine :)
> >
> > This was a silly example, I know - it is a long word, but not too
> > long to worry a machine. The offending word will surely be much
> > longer, but as a matter of fact, I cannot think of any user actually
> > typing a 2k+ string without any spaces in between. I'm not sure which
> > word tsearch2 complained about; it doesn't tell, and even logging did
> > not provide me with any more detail:
> >
> > 2003-11-21 14:06:44 [26497] ERROR: 42601: word is too long
> > LOCATION: parsetext_v2, ts_cfg.c:294
> > STATEMENT: UPDATE ct_com_board_message
> > SET
> > ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
> > coalesce(title,'') ||' '|| coalesce(text,''));
> >
> > Is there some way to find the exact position?
>
> I'm afraid you need to hack ts_cfg.c:294 yourself to print the word
> which's bugging you :)
>
> > > btw, don't forget to configure properly dictionaries, so you
> > > don't have a lot of unique words.
> >
> > I won't forget that; I just wanted to run a quick first test
> > before diving deeper into Ispell and other issues which are as yet
> > a bit of a mystery to me.
> >
> > Kind Regards
> >
> > Markus
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
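
For locating the offending input without patching ts_cfg.c, one option is to scan the text on the application side before it ever reaches to_tsvector. The C sketch below is only an illustration: longest_word is a hypothetical helper, not part of tsearch2, and it assumes that words are delimited by whitespace alone, which is simpler than what the tsearch2 parser really does. It returns the length of the longest run of non-whitespace bytes and reports where that run starts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a tsearch2 function): find the longest
 * whitespace-delimited word in text.  The byte offset of that word is
 * stored in *offset when offset is not NULL.  Returns the word length,
 * or 0 if the text contains no words at all.
 */
size_t longest_word(const char *text, size_t *offset)
{
    size_t best_len = 0;
    size_t best_off = 0;
    size_t i = 0;

    assert(text != NULL);

    while (text[i] != '\0') {
        /* Skip the whitespace in front of the next word. */
        while (text[i] != '\0' && isspace((unsigned char) text[i]))
            i++;

        /* Consume the word itself. */
        size_t start = i;
        while (text[i] != '\0' && !isspace((unsigned char) text[i]))
            i++;

        /* Remember the first word that beats the current maximum. */
        if (i - start > best_len) {
            best_len = i - start;
            best_off = start;
        }
    }

    if (offset != NULL)
        *offset = best_off;
    return best_len;
}
```

Comparing the returned length against the 2K limit mentioned in the thread would flag a row such as the 2724-character one before the UPDATE fails, and the offset shows where in the text the word begins.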
Markus,

thanks for your analysis! I think we'll submit a patch to throw a NOTICE and skip these useless words during indexing.

Oleg

On Mon, 24 Nov 2003, Markus Wollny wrote:

> [...]
> Therefore I would like to see some
> sort of option allowing me to still use tsearch2 but actually
> automatically excluding anything exceeding MAXSTRLEN - so the UPDATE
> might throw a NOTICE (if anything at all) but still get on with the
> rest.
> [...]

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
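
The skip-with-NOTICE behaviour proposed in this reply could look roughly like the following. This is a minimal standalone sketch and not the patch that was later submitted: index_words, count_word and WORD_LIMIT are names invented here, the limit of 2048 is an assumption based on the "2K" figure quoted in the thread, words are split on whitespace only, and the notice is written to stderr instead of going through ereport.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed stand-in for tsearch2's MAXSTRLEN ("2K" according to the thread). */
#define WORD_LIMIT 2048

/* Callback invoked for every word that is short enough to be indexed. */
typedef void (*word_fn)(const char *word, size_t len, void *arg);

/*
 * Walk the whitespace-delimited words of text.  Words with len >= limit
 * (the same comparison as in the ts_cfg.c check) are reported and skipped;
 * all other words are handed to emit.  Returns the number of skipped words.
 */
size_t index_words(const char *text, size_t limit, word_fn emit, void *arg)
{
    size_t skipped = 0;
    size_t i = 0;

    assert(text != NULL);

    while (text[i] != '\0') {
        while (text[i] != '\0' && isspace((unsigned char) text[i]))
            i++;

        size_t start = i;
        while (text[i] != '\0' && !isspace((unsigned char) text[i]))
            i++;

        size_t len = i - start;
        if (len == 0)
            continue;               /* only trailing whitespace was left */

        if (len >= limit) {
            /* Report the problem but carry on with the remaining words. */
            fprintf(stderr, "NOTICE: word is too long (%zu), skipped\n", len);
            skipped++;
        } else if (emit != NULL) {
            emit(text + start, len, arg);
        }
    }
    return skipped;
}

/* Example callback: counts the words that would be indexed. */
void count_word(const char *word, size_t len, void *arg)
{
    (void) word;
    (void) len;
    ++*(size_t *) arg;
}
```

With a callback like count_word, a caller can see how many words were kept and how many were dropped, while the statement as a whole still completes instead of aborting on the first oversized word.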
Patch submitted to 7.5devel and REL7_4_STABLE.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru