Re: tsearch2 column update produces "word too long"error - Mailing list pgsql-general
From | Teodor Sigaev |
---|---|
Subject | Re: tsearch2 column update produces "word too long"error |
Date | |
Msg-id | 3FC3A3BD.9030709@sigaev.ru |
In response to | Re: tsearch2 column update produces "word too long"error ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-general |
Patch submitted to 7.5devel and REL7_4_STABLE

Markus Wollny wrote:
> Hi!
>
> Now I really couldn't code C to save my life, but I managed to elicit
> some more debugging info. It's still dumb-user-interaction as suspected,
> but this is an issue I have to take into account as a basis; here's the
> "patch" for ts_cfg.c:
>
>         if (lenlemm >= MAXSTRLEN)
>                 ereport(ERROR,
>                         (errcode(ERRCODE_SYNTAX_ERROR),
> !                        errmsg("word is too long(%d): %s",lenlemm,lemm)));
>
> Now when I try
>
> UPDATE ct_com_board_message
> SET ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
>     coalesce(title,'') ||' '|| coalesce(text,''));
>
> I eventually get:
>
> ERROR:  word is too long(2724):
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajjajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajjajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajjajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
>
> This is a brightly shining example of utterly wanton user-stupidity, I
> think: A 2k+ string of |:ja:|. Input like that cannot be helped, though
> - if he'd been a bit more imaginative, he could have used a few dozen
> "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" in a row or
> anything else; unfortunately there's no app that could automatically
> whack a user if he's doing something stupid.
>
> But on the other hand I cannot think of any reason why crap like that
> should be indexed in the first place. Therefore I would like to see some
> sort of option allowing me to still use tsearch2 but actually
> automatically excluding anything exceeding MAXSTRLEN - so the UPDATE
> might throw a NOTICE (if anything at all) but still get on with the
> rest.
>
> An alteration like that does however exceed my limited abilities with C
> by far and I don't want to mess with something I do not fully understand
> and then use that mess in a production environment. Is there a way to
> get around this problem with oversized words?
>
> Kind regards
>
>         Markus
>
>
>
>>-----Original Message-----
>>From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>>Sent: Friday, 21 November 2003 15:13
>>To: Markus Wollny
>>Cc: pgsql-general@postgresql.org
>>Subject: Re: AW: [GENERAL] tsearch2 column update produces "word too
>>long"error
>>
>>
>>On Fri, 21 Nov 2003, Markus Wollny wrote:
>>
>>
>>>Hello!
>>>
>>>
>>>>From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>>>>Sent: Friday, 21 November 2003 13:06
>>>>To: Markus Wollny
>>>>Cc: pgsql-general@postgresql.org
>>>>
>>>>Word length is limited by 2K. What's exactly the word
>>>>tsearch2 complained on ?
>>>>'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch'
>>>>is fine :)
>>>
>>>This was a silly example, I know - it is a long word, but not too long
>>>to worry a machine. The offending word will surely be much longer, but
>>>as a matter of fact, I cannot think of any user actually typing a 2k+
>>>string without any spaces in between. I'm not sure on which word
>>>tsearch2 complained, it doesn't tell and even logging did not provide me
>>>with any more detail:
>>>
>>>2003-11-21 14:06:44 [26497] ERROR:  42601: word is too long
>>>LOCATION:  parsetext_v2, ts_cfg.c:294
>>>STATEMENT:  UPDATE ct_com_board_message
>>>                SET
>>>ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
>>>coalesce(title,'') ||' '|| coalesce(text,''));
>>>
>>>Is there some way to find the exact position?
>>
>>I'm afraid you need to hack ts_cfg.c:294 yourself to print the word
>>which's bugging you :)
>>
>>
>>>>btw, don't forget to configure properly dictionaries, so you
>>>>don't have a lot of unique words.
>>>
>>>I won't forget that; I just wanted to run a quick-off first test
>>>before diving deeper into Ispell and other issues which are as yet a bit
>>>of a mystery to me.
>>>
>>>Kind Regards
>>>
>>>        Markus
>>>
>>
>>        Regards,
>>                Oleg
>>_____________________________________________________________
>>Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>Sternberg Astronomical Institute, Moscow University (Russia)
>>Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>phone: +007(095)939-16-83, +007(095)939-23-83
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
Teodor Sigaev                                  E-mail: teodor@sigaev.ru
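
The behaviour Markus asks for in the quoted message (skip anything at or beyond MAXSTRLEN, emit a NOTICE, and carry on with the rest) can be illustrated outside the server. What follows is only a minimal standalone C sketch, not the patch submitted to 7.5devel and REL7_4_STABLE. The helper name strip_long_words is hypothetical, the MAXSTRLEN value of 2048 is assumed from the "2K" limit mentioned in the thread, and splitting on whitespace only approximates what the tsearch2 parser does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: stands in for the "2K" word limit mentioned in the thread;
 * the real constant is defined in the tsearch2 sources. */
#define MAXSTRLEN 2048

/*
 * Hypothetical helper: copy the whitespace-separated words of src into dst
 * (capacity dstsize bytes), separated by single spaces, and drop every word
 * whose length is >= MAXSTRLEN. A notice is printed to stderr for each
 * dropped word. Copying stops early if dst runs out of room.
 * Returns the number of words that were dropped for being too long.
 */
size_t
strip_long_words(const char *src, char *dst, size_t dstsize)
{
    size_t      skipped = 0;
    size_t      out = 0;

    assert(src != NULL && dst != NULL && dstsize > 0);

    while (*src != '\0')
    {
        const char *start;
        size_t      len;
        size_t      need;

        /* skip leading whitespace, then scan one word */
        while (*src != '\0' && isspace((unsigned char) *src))
            src++;
        start = src;
        while (*src != '\0' && !isspace((unsigned char) *src))
            src++;
        len = (size_t) (src - start);
        if (len == 0)
            break;              /* only trailing whitespace was left */

        if (len >= MAXSTRLEN)
        {
            /* report and continue instead of aborting the whole run */
            fprintf(stderr, "NOTICE: word is too long (%lu), skipped\n",
                    (unsigned long) len);
            skipped++;
            continue;
        }

        /* room needed: optional separator + word, plus the final NUL */
        need = len + (out > 0 ? 1 : 0);
        if (out + need + 1 > dstsize)
            break;
        if (out > 0)
            dst[out++] = ' ';
        memcpy(dst + out, start, len);
        out += len;
    }

    dst[out] = '\0';
    return skipped;
}
```

Run over a message body before it reaches to_tsvector, a filter of this kind would let the UPDATE proceed past input such as the 2724-character "ja" string, at the cost of that string never being indexed.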