Re: tsearch parser inefficiency if text includes urls or emails - new version - Mailing list pgsql-hackers

From Andres Freund
Subject Re: tsearch parser inefficiency if text includes urls or emails - new version
Msg-id 200912081626.11709.andres@anarazel.de
In response to Re: tsearch parser inefficiency if text includes urls or emails - new version  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: tsearch parser inefficiency if text includes urls or emails - new version  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Tuesday 08 December 2009 16:23:11 Kevin Grittner wrote:
> I wrote:
> > Frankly, I'd be amazed if there was a performance regression,
> 
> OK, I'm amazed.  While it apparently helps some cases dramatically
> (Andres had a case where run time was reduced by 93.2%), I found a
> pretty routine case where run time was increased by 3.1%.  I tweaked
> the code and got that down to a 2.5% run time increase.  I'm having
> troubles getting it any lower than that.  And yes, this is real, not
> noise -- the slowest unpatched time for this test is faster than the
> fastest time with any version of the patch.  :-(
> 
> Andres, could you provide more information on the test which showed
> the dramatic improvement?  In particular, info on OS, CPU, character
> set, encoding scheme, and what kind of data was used for the test.
> 
> I'll do some more testing and try to figure out how the patch is
> slowing things down and post with details.
Could you show your test case? I don't see why it could get slower.

I tested with various data; the case that benefited most was a changelog in
which each entry was signed with an email address.

OS: Debian Sid; CPU: Core2 Duo; encoding: UTF-8. I tried both the C and de_DE.UTF8 locales.

Andres

