Re: tsearch parser inefficiency if text includes urls or emails - new version - Mailing list pgsql-hackers

From	Kevin Grittner
Subject	Re: tsearch parser inefficiency if text includes urls or emails - new version
Date	December 10, 2009 13:01:24
Msg-id	4B20D4F1020000250002D2F1@gw.wicourts.gov
In response to	Re: tsearch parser inefficiency if text includes urls or emails - new version (Andres Freund <andres@anarazel.de>)
Responses	Re: tsearch parser inefficiency if text includes urls or emails - new version
List	pgsql-hackers

Andres Freund <andres@anarazel.de> wrote:
> I think you see no real benefit, because your strings are rather
> short - the documents I scanned when noticing the issue were
> rather long.
The document I used in the test which showed the regression was
672,585 characters, containing 10,000 URLs.
> A rather extreme/contrived example:
> postgres=# SELECT 1 FROM to_tsvector(array_to_string(ARRAY(SELECT
> 'andres@anarazel.de http://www.postgresql.org/'::text FROM
> generate_series(1, 20000) g(i)), ' -  '));
The most extreme of your examples uses a 979,996 character string,
which is less than 50% larger than my test.  I am, however, able to
see the performance difference for this particular example, so I now
have something to work with.  I'm seeing some odd behavior in terms
of when a difference appears and how large it is.  Once I can
categorize it better, I'll follow up.
Thanks for the sample which shows the difference.
-Kevin
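
The sizes quoted above can be checked outside the database. What follows is a minimal Python sketch, assuming the string is built exactly as in the quoted query (20,000 copies of the address-plus-URL fragment joined by ' -  '). The variable names are illustrative and do not come from the thread.

```python
# Rebuild the string from the quoted query: 20,000 copies of the
# address-plus-URL fragment, joined by the separator ' -  '.
fragment = "andres@anarazel.de http://www.postgresql.org/"
separator = " -  "
copies = 20000

document = separator.join([fragment] * copies)

# Length of the contrived example, and its size relative to the
# 672,585-character test document mentioned in the reply.
example_length = len(document)
test_length = 672585
ratio = example_length / test_length

print(example_length)
print(round(ratio, 2))
```

The first printed value is the length of the contrived example, and the second is how many times larger it is than the 672,585-character test document. Both can be compared against the figures stated in the message.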
