Re: tsearch2 column update produces "word too - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 column update produces "word too |
Date | |
Msg-id | Pine.GSO.4.58.0311251527070.6636@ra.sai.msu.su |
In response to | Re: tsearch2 column update produces "word too long"error ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-general |
Markus,

thanks for your analyses! I think we'll submit a patch to throw NOTICE and
skip these useless words from indexing.

Oleg

On Mon, 24 Nov 2003, Markus Wollny wrote:

> Hi!
>
> Now I really couldn't code C to save my life, but I managed to elicit
> some more debugging info. It's still dumb-user-interaction as suspected,
> but this is an issue I have to take into account as a basis; here's the
> "patch" for ts_cfg.c:
>
>   if (lenlemm >= MAXSTRLEN)
>       ereport(ERROR,
>               (errcode(ERRCODE_SYNTAX_ERROR),
> !              errmsg("word is too long(%d): %s",lenlemm,lemm)));
>
> Now when I try
>
> UPDATE ct_com_board_message
>    SET ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
>        coalesce(title,'') ||' '|| coalesce(text,''));
>
> I eventually get:
>
> ERROR: word is too long(2724):
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajjajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajjajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajjajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajjajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaja
> jajajajajajajajajajajajajajajajajajajajajajajajajjajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
> ajajajajajajajajajajajajajajajajajajajajajajajajajajajajajaj
>
> This is a brightly shining example of utterly wanton user-stupidity, I
> think: A 2k+ string of |:ja:|.
> Input like that cannot be helped, though
> - if he'd been a bit more imaginative, he could have used a few dozen
> "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" in a row or
> anything else; unfortunately there's no app that could automatically
> whack a user if he's doing something stupid.
>
> But on the other hand I cannot think of any reason why crap like that
> should be indexed in the first place. Therefore I would like to see some
> sort of option allowing me to still use tsearch2 but actually
> automatically excluding anything exceeding MAXSTRLEN - so the UPDATE
> might throw a NOTICE (if anything at all) but still get on with the
> rest.
>
> An alteration like that does however exceed my limited abilities with C
> by far and I don't want to mess with something I do not fully understand
> and then use that mess in a production environment. Is there a way to
> get around this problem with oversized words?
>
> Kind regards
>
> Markus
>
> > -----Original Message-----
> > From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> > Sent: Friday, 21 November 2003 15:13
> > To: Markus Wollny
> > Cc: pgsql-general@postgresql.org
> > Subject: Re: AW: [GENERAL] tsearch2 column update produces "word too long"error
> >
> > On Fri, 21 Nov 2003, Markus Wollny wrote:
> >
> > > Hello!
> > >
> > > > From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> > > > Sent: Friday, 21 November 2003 13:06
> > > > To: Markus Wollny
> > > > Cc: pgsql-general@postgresql.org
> > > >
> > > > Word length is limited by 2K. What's exactly the word
> > > > tsearch2 complained on ?
> > > > 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch'
> > > > is fine :)
> > >
> > > This was a silly example, I know - it is a long word, but not too long
> > > to worry a machine. The offending word will surely be much longer, but
> > > as a matter of fact, I cannot think of any user actually typing a 2k+
> > > string without any spaces in between.
> > > I'm not sure on which word
> > > tsearch2 complained, it doesn't tell and even logging did not provide me
> > > with any more detail:
> > >
> > > 2003-11-21 14:06:44 [26497] ERROR: 42601: word is too long
> > > LOCATION: parsetext_v2, ts_cfg.c:294
> > > STATEMENT: UPDATE ct_com_board_message
> > > SET
> > > ftindex=to_tsvector('default',coalesce(user_login,'') ||' '||
> > > coalesce(title,'') ||' '|| coalesce(text,''));
> > >
> > > Is there some way to find the exact position?
> >
> > I'm afraid you need to hack ts_cfg.c:294 yourself to print the word
> > which's bugging you :)
> >
> > > > btw, don't forget to configure properly dictionaries, so you
> > > > don't have a lot of unique words.
> > >
> > > I won't forget that; I just wanted to run a quick-off first test
> > > before diving deeper into Ispell and other issues which are as yet a bit
> > > of a mystery to me.
> > >
> > > Kind Regards
> > >
> > > Markus
> >
> > Regards,
> > Oleg
> > _____________________________________________________________
> > Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > Sternberg Astronomical Institute, Moscow University (Russia)
> > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> > phone: +007(095)939-16-83, +007(095)939-23-83

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
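
For illustration only, here is a rough sketch of the "skip with a NOTICE instead of failing" behaviour discussed in this thread, written as a standalone pre-filter that could run on the text before it reaches to_tsvector. It is not the patch mentioned above and it does not touch ts_cfg.c. The function name strip_long_words and the constant MAX_WORD_LEN are hypothetical; 2048 is only an assumption based on the "2K" limit quoted in the thread. The sketch splits on whitespace, while the tsearch2 parser tokenizes differently, so treat it as an approximation.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumption: stands in for the "2K" limit mentioned in the thread; the
 * real constant is MAXSTRLEN in the tsearch2 sources. */
#define MAX_WORD_LEN 2048

/*
 * Hypothetical helper: copy whitespace-separated words from `in` to `out`,
 * skipping every word whose length is >= max_len (the same comparison as
 * the check quoted from ts_cfg.c).  Kept words are joined by single spaces.
 * For each skipped word a NOTICE line is written to `log` (if not NULL).
 * Returns the number of skipped words, or -1 if `out` is too small.
 */
static int
strip_long_words(const char *in, char *out, size_t out_size,
                 size_t max_len, FILE *log)
{
    size_t used = 0;
    int skipped = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    while (*in != '\0')
    {
        const char *start;
        size_t len;

        /* Skip leading whitespace. */
        while (*in != '\0' && isspace((unsigned char) *in))
            in++;
        if (*in == '\0')
            break;

        /* Find the end of the current word. */
        start = in;
        while (*in != '\0' && !isspace((unsigned char) *in))
            in++;
        len = (size_t) (in - start);

        if (len >= max_len)
        {
            /* Report the length and a short prefix, then move on. */
            if (log != NULL)
            {
                int shown = len < 20 ? (int) len : 20;

                fprintf(log, "NOTICE: word is too long(%lu), skipped: %.*s...\n",
                        (unsigned long) len, shown, start);
            }
            skipped++;
            continue;
        }

        /* Need room for an optional separator, the word, and the NUL. */
        if (used + (used > 0 ? 1 : 0) + len + 1 > out_size)
            return -1;
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, start, len);
        used += len;
        out[used] = '\0';
    }
    return skipped;
}
```

An application could call strip_long_words(text, buf, sizeof(buf), MAX_WORD_LEN, stderr) and pass buf to the UPDATE instead of the raw text. Unlike the debugging "patch" quoted above, the notice prints only a short prefix of the word, so a 2k+ run of "ja" does not flood the log.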