Re: tsearch parser inefficiency if text includes urls or emails - new version - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: tsearch parser inefficiency if text includes urls or emails - new version |
Date | |
Msg-id | 200911251418.47137.andres@anarazel.de Whole thread Raw |
In response to | Re: tsearch parser inefficiency if text includes urls or emails - new version ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses |
Re: tsearch parser inefficiency if text includes
urls or emails - new version
|
List | pgsql-hackers |
On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote: > Andres Freund wrote: > > On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote: > >> It is in context format, applies cleanly, and passes "make check". > > > > Unfortunately the latter is not saying much - I had a bug there and > > it was not found by the regression tests. Perhaps I should take a > > stab and add at least some more... > > Sounds like a good idea. The one thing to avoid is anything with a > long enough run time to annoy those that run it many times a day. Hm. There actually are tests excercising the part where I had a bug... Strange. It was a bug involving uninitialized data so probably the regression tests where just "lucky". > >> It is in context format, applies cleanly, and passes "make check". > >> Next I read through the code, and have the same question that > >> Andres posed 12 days ago. His patch massively reduces the cost of > >> the parser recursively calling itself for some cases, and it seems > >> like the least invasive way to modify the parser to solve this > >> performance problem; but it does beg the question of why a state > >> machine like this should recursively call itself when it hits > >> certain states. > > I was wondering about that as well. I am not completely sure but to > > me it looks like its just done to reduce the amount of rules and > > states. > I'm assuming that's the reason, but didn't dig deep enough to be sure. > I suspect to be really sure, I'd have to set it up without the > recursion and see what breaks. I can't imagine it would be anything > which couldn't be fixed by adding enough states; but perhaps they ran > into something where these types would require so many new states that > the recursion seemed like the lesser of evils. This is similar to my understanding... > > I have to say that that code is not exactly clear and well > > documented... > Yeah. I was happy with the level of documentation that you added with > your new code, but what was there before is mighty thin. If you > gleaned enough information while working on it to feel comfortable > adding documentation anywhere else, that would be a good thing. It definitely would be a good thing. But that would definitely be seperate patch. But I fear my current leel of knowledge is sufficient and also I am not sure if I can make myself interested enough in that part. > So far the only vote is to proceed with the mitigation, which was my > preference, and apparently yours -- so I guess we're at 3 to 0 in > favor of that. I'll mark the patch as "Waiting on Author" so you can > add any comments and regression tests you feel are appropriate. > > By the way, I found one typo in the comments -- it should by useful, > not usefull. Ok, will update. > I liked what I saw so far, but need to spend more time desk-checking > for correctness, testing to confirm that it doesn't change results, > and confirming the performance improvement. Thanks again for your reviewing! Andres
pgsql-hackers by date: