Re: tsearch parser inefficiency if text includes urls or emails - new version - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: tsearch parser inefficiency if text includes urls or emails - new version
Date
Msg-id 4B20D4F1020000250002D2F1@gw.wicourts.gov
In response to Re: tsearch parser inefficiency if text includes urls or emails - new version  (Andres Freund <andres@anarazel.de>)
Responses Re: tsearch parser inefficiency if text includes urls or emails - new version  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> wrote:
> I think you see no real benefit, because your strings are rather
> short - the documents I scanned when noticing the issue were
> rather long.
The document I used in the test which showed the regression was
672,585 characters, containing 10,000 URLs.
> A rather extreme/contrived example:
> postgres=# SELECT 1 FROM to_tsvector(array_to_string(ARRAY(SELECT
>     'andres@anarazel.de http://www.postgresql.org/'::text
>     FROM generate_series(1, 20000) g(i)), ' -  '));
The most extreme of your examples uses a 979,996 character string,
which is less than 50% larger than my test.  I am, however, able to
see the performance difference for this particular example, so I now
have something to work with.  I'm seeing some odd behavior in when a
difference appears and what sort of difference it is.  Once I can
categorize it better, I'll follow up.
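
The sizes quoted above can be checked directly.  A minimal sketch,
assuming a psql session on any database; it only measures the string
built by the quoted query and does not run the text search parser, and
the column aliases are made up for illustration:

```sql
-- Build the same string as the quoted example and measure it.
-- Each element is 45 characters and the separator ' -  ' is 4, so the
-- expected length is 20000 * 45 + 19999 * 4 = 979996.
SELECT length(array_to_string(ARRAY(
    SELECT 'andres@anarazel.de http://www.postgresql.org/'::text
    FROM generate_series(1, 20000) g(i)), ' -  ')) AS doc_chars;

-- Size relative to the 672,585-character test document (about 1.46).
SELECT round(979996 / 672585.0, 2) AS size_ratio;
```

Turning on \timing in psql before running the to_tsvector query itself
is one way to compare the patched and unpatched parser on this input.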
Thanks for the sample which shows the difference.
-Kevin

