Re: Simplifying Text Search - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Simplifying Text Search
Msg-id 1194936519.2644.261.camel@ebony.site
In response to Re: Simplifying Text Search  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Simplifying Text Search  ("Pavel Stehule" <pavel.stehule@gmail.com>)
List pgsql-hackers
On Mon, 2007-11-12 at 23:03 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote:
> > > Simon Riggs <simon@2ndquadrant.com> writes:
> > > > So we end up with a normal sounding function that is overloaded to
> > > > provide all of the various goodies.
> > > 
> > > As best I can tell, @@ does exactly this already.  This is just a
> > > different spelling of the same capability, and I don't actually
> > > find it better.  Why is "text_search(x,y)" better than "x @@ y"?
> > > We don't recommend that people write "texteq(x,y)" instead of
> > > "x = y".
> > 
> > Most people don't understand those differences. x = y means "make sure
> > they are the same" to most people. They don't see what you (and I) see:
> > function and operator interchangeability. So text_search() is better
> > than @@ and = is better than texteq(). Life ain't neat...
> > 
> > Right now, Full Text Search SQL looks like complete gibberish and it
> > dissuades many people from using what is an awesome set of features. I
> > just want to add a little sugar to help people get started.
> 
> I realized this, though not clearly, when editing the documentation.  I
> noticed that:
> 
>     http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING
> 
>     tsvector @@ tsquery
>     tsquery  @@ tsvector
>     text @@ tsquery
>     text @@ text
> 
>     The first two of these we saw already. The form text @@ tsquery  is
>     equivalent to to_tsvector(x) @@ y. The form text @@ text  is equivalent
>     to to_tsvector(x) @@ plainto_tsquery(y).
> 
> was quite odd, especially the "text @@ text" case, and in fact it makes
> casting almost required unless you can remember which one is a query and
> which is a vector (hint: the vector is first).  What really adds to the
> confusion is that the operator is two _identical_ characters, which
> suggests it is symmetric, and it does behave symmetrically if you cast
> one side, but as vector @@ query if you don't.

I'm thinking we can have an inlinable function

contains(text, text) returns int 

The return value is limited to 0, 1 or NULL, as with SQL/MM.
That makes it close to the SQL/MM definition, but not identical.

contains(sourceText, searchText) is a macro for

    CASE to_tsvector(default_text_search_config, sourceText) @@
         to_tsquery(default_text_search_config, searchText)
        WHEN true THEN 1
        WHEN false THEN 0
        ELSE NULL
    END

that allows us to write indexable queries like this

WHERE contains(sourceText, searchText) > 0

where the index must still have been built on a constant config.
I haven't checked that this still works; it may not, in which case we
need something slightly more complex to make sure it's still indexable.
This is the difficult part.
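
For reference, the baseline that any contains() wrapper would have to
reduce to looks roughly like the sketch below. The table docs, the column
body, the index name and the 'english' config are placeholders for
illustration, not anything taken from the docs or from a patch. The direct
@@ query is the form that matches the expression index; whether the
planner can see through a CASE wrapper to reach it is exactly the open
question.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE docs (id serial PRIMARY KEY, body text);

-- The expression index has to name its config explicitly: the
-- two-argument to_tsvector() is immutable, while the one-argument form
-- depends on default_text_search_config and cannot be indexed.
CREATE INDEX docs_body_fts
    ON docs USING gin (to_tsvector('english', body));

-- A query written against the same expression can use the index.
SELECT id
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'cat & mat');
```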

So changes are:
- add SQL function
- simplify first 2 pages of docs using this function
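
As a rough sketch only, the first item could be a plain SQL function along
the following lines. It assumes the one-argument to_tsvector() and
to_tsquery(), which pick up default_text_search_config implicitly, as a
stand-in for the explicit config argument in the macro above; the STABLE
marking follows from that dependence. It has not been tested for inlining
or for index use.

```sql
-- Hypothetical sketch of the proposed function; not part of PostgreSQL.
-- $1 is the source text, $2 is the search text in to_tsquery() syntax.
CREATE FUNCTION contains(text, text)
RETURNS integer
LANGUAGE sql STABLE
AS $$
    SELECT CASE to_tsvector($1) @@ to_tsquery($2)
               WHEN true  THEN 1
               WHEN false THEN 0
               ELSE NULL   -- either argument was NULL
           END
$$;

-- Example call, using the hypothetical docs table as the source:
--   SELECT id FROM docs WHERE contains(body, 'cat & mat') > 0;
```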

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com


