Re: tsearch parser inefficiency if text includes urls or emails - new version - Mailing list pgsql-hackers

From Andres Freund
Subject Re: tsearch parser inefficiency if text includes urls or emails - new version
Date
Msg-id 200911251418.47137.andres@anarazel.de
Whole thread Raw
In response to Re: tsearch parser inefficiency if text includes urls or emails - new version  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: tsearch parser inefficiency if text includes urls or emails - new version
List pgsql-hackers
On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote:
> Andres Freund  wrote:
> 
> On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote:
> >> It is in context format, applies cleanly, and passes "make check".
> >
> > Unfortunately the latter is not saying much - I had a bug there and
> > it was not found by the regression tests. Perhaps I should take a
> > stab and add at least some more...
> 
> Sounds like a good idea.  The one thing to avoid is anything with a
> long enough run time to annoy those that run it many times a day.
Hm. There actually are tests excercising the part where I had a bug... 
Strange.
It was a bug involving uninitialized data so probably the regression tests 
where just "lucky".

> >> It is in context format, applies cleanly, and passes "make check".
> >> Next I read through the code, and have the same question that
> >> Andres posed 12 days ago. His patch massively reduces the cost of
> >> the parser recursively calling itself for some cases, and it seems
> >> like the least invasive way to modify the parser to solve this
> >> performance problem; but it does beg the question of why a state
> >> machine like this should recursively call itself when it hits
> >> certain states.
> > I was wondering about that as well. I am not completely sure but to
> > me it looks like its just done to reduce the amount of rules and
> > states.
> I'm assuming that's the reason, but didn't dig deep enough to be sure.
> I suspect to be really sure, I'd have to set it up without the
> recursion and see what breaks.  I can't imagine it would be anything
> which couldn't be fixed by adding enough states; but perhaps they ran
> into something where these types would require so many new states that
> the recursion seemed like the lesser of evils.
This is similar to my understanding...

> > I have to say that that code is not exactly clear and well
> > documented...
> Yeah.  I was happy with the level of documentation that you added with
> your new code, but what was there before is mighty thin.  If you
> gleaned enough information while working on it to feel comfortable
> adding documentation anywhere else, that would be a good thing.
It definitely would be a good thing. But that would definitely be seperate 
patch. But I fear my current leel of knowledge is sufficient and also I am not 
sure if I can make myself interested enough in that part.

> So far the only vote is to proceed with the mitigation, which was my
> preference, and apparently yours -- so I guess we're at 3 to 0 in
> favor of that.  I'll mark the patch as "Waiting on Author" so you can
> add any comments and regression tests you feel are appropriate.
> 
> By the way, I found one typo in the comments -- it should by useful,
> not usefull.
Ok, will update.

> I liked what I saw so far, but need to spend more time desk-checking
> for correctness, testing to confirm that it doesn't change results,
> and confirming the performance improvement.
Thanks again for your reviewing!


Andres    


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: garbage in psql -l
Next
From: Andres Freund
Date:
Subject: Re: Application name patch - v3