Re: english parser in text search: support for multiple words in the same position - Mailing list pgsql-hackers

From Markus Wanner
Subject Re: english parser in text search: support for multiple words in the same position
Date
Msg-id 4C567578.5030501@bluegap.ch
In response to english parser in text search: support for multiple words in the same position  (Sushant Sinha <sushant354@gmail.com>)
Responses Re: english parser in text search: support for multiple words in the same position
List pgsql-hackers
Hi,

On 08/01/2010 08:04 PM, Sushant Sinha wrote:
> 1. We do not have separate tokens "wikipedia" and "org"
> 2. If we have the two tokens we should have them at adjacent position so
> that a phrase search for "wikipedia org" should work.

This would needlessly increase the number of tokens. Instead, you'd
better make it work like compound word support, emitting just
"wikipedia" and "org" as tokens.

Searching for "wikipedia.org" or "wikipedia org" should then result in 
the same search query with the two tokens: "wikipedia" and "org".
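
To illustrate the idea, here is a toy sketch in Python. It is not the
PostgreSQL parser and does not use any PostgreSQL API; the function names
(tokenize, to_query, phrase_match) are made up for this example, and it
assumes that any run of non-alphanumeric characters separates tokens. It
only shows that both spellings reduce to the same two tokens at adjacent
positions, so a phrase search works without a separate URL token.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens with 1-based positions.

    Assumption for this sketch: every run of non-alphanumeric characters
    (".", "/", "?", "=", whitespace) is a token separator.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


def to_query(text):
    """Turn a search string into an ordered list of query tokens."""
    return [word for word, _ in tokenize(text)]


def phrase_match(doc_tokens, query_words):
    """Return True if query_words occur at consecutive positions."""
    if not query_words:
        return False
    positions = {}
    for word, pos in doc_tokens:
        positions.setdefault(word, set()).add(pos)
    # Try every occurrence of the first word as the start of the phrase.
    for start in positions.get(query_words[0], ()):
        if all(start + i in positions.get(word, ())
               for i, word in enumerate(query_words)):
            return True
    return False


doc = tokenize("See wikipedia.org/search?q=sushant for details")
print(to_query("wikipedia.org"))                        # query tokens
print(phrase_match(doc, to_query("wikipedia org")))     # phrase lookup
```

With this scheme the index holds one entry per word, and the query side
applies the same splitting, so no extra tokens are needed per position.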

> position 0: WORD(wikipedia), URL(wikipedia.org/search?q=sushant)

IMO the distinction between WORDs and URLs is not something the text
search engine should have to care much about. Let it just do the
searching, and make it do that well.

What does a token "wikipedia.org/search?q=sushant" buy you in terms of
text searching? Or even result highlighting? I wouldn't expect anybody
to want to search for a full URL. Would you?

Regards

Markus Wanner

