Thread: Identifying which column matches a full text search

Identifying which column matches a full text search

From
Ryan Wallace
Date:
<div class="Section1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Hi all,</span><p
class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Courier New""> </span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew"">The following example is given in the Postgres 8.3 manual regarding
howto create a single ts_vector column for two existing columns:</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew""> </span><p class="MsoNormal" style="margin-left:36.0pt"><span
style="font-size:10.0pt;
font-family:"Courier New"">ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector;</span><p class="MsoNormal"
style="margin-left:36.0pt"><spanstyle="font-size:10.0pt; 
font-family:"Courier New"">UPDATE pgweb SET textsearchable_index_col =</span><p class="MsoNormal"
style="margin-left:36.0pt"><spanstyle="font-size:10.0pt; 
font-family:"Courier New"">     to_tsvector('english', coalesce(title,'') || coalesce(body,''));</span><p
class="MsoNormal"style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; 
margin-left:36.0pt"><span style="font-size:12.0pt;font-family:"Times New Roman","serif"">Then we create a GIN index to
speedup the search: </span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; 
font-family:"Courier New"">CREATE INDEX textsearch_idx ON pgweb USING gin(textsearchable_index_col);</span><p
class="MsoNormal"style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; 
margin-left:36.0pt"><span style="font-size:12.0pt;font-family:"Times New Roman","serif"">Now we are ready to perform a
fastfull text search: </span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; 
font-family:"Courier New"">SELECT title</span><p class="MsoNormal" style="margin-left:36.0pt"><span
style="font-size:10.0pt;
font-family:"Courier New"">FROM pgweb</span><p class="MsoNormal" style="margin-left:36.0pt"><span
style="font-size:10.0pt;
font-family:"Courier New"">WHERE textsearchable_index_col @@ to_tsquery('create & table')</span><p
class="MsoNormal"style="margin-left:36.0pt"><span style="font-size:10.0pt; 
font-family:"Courier New"">ORDER BY last_mod_date DESC LIMIT 10;</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew""> </span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew"">Using this approach. Is there any way of retrieving which of the
originaltwo columns the match was found in?</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew""> </span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew"">Any help would be much appreciated,</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew"">Ryan</span></div> 

Re: Identifying which column matches a full text search

From
Richard Huxton
Date:
Ryan Wallace wrote:
> 
> UPDATE pgweb SET textsearchable_index_col =
>      to_tsvector('english', coalesce(title,'') || coalesce(body,''));

> WHERE textsearchable_index_col @@ to_tsquery('create & table')

> Using this approach. Is there any way of retrieving which of the original
> two columns the match was found in?

Afraid not - you're not indexing two columns, you're indexing one: 
textsearchable_index_col.

You can add up to four weights to a tsvector though, typically for 
title/body matching. See chapter 12.3 for details.

Failing that, where I've had many (a dozen) different sources but want 
to search them all I've built a textsearch_blocks table with columns to 
identify the source and have triggers that keep it up to date.

--   Richard Huxton  Archonet Ltd


Re: Identifying which column matches a full text search

From
Ryan Wallace
Date:
Richard Huxton wrote:
>
> Failing that, where I've had many (a dozen) different sources but want 
> to search them all I've built a textsearch_blocks table with columns to 
> identify the source and have triggers that keep it up to date.

Once you've built the text search blocks table, how do you search it? Do you
perform
twelve separate queries or can you just do one?

Ryan

Ryan Wallace wrote:
> 
> UPDATE pgweb SET textsearchable_index_col =
>      to_tsvector('english', coalesce(title,'') || coalesce(body,''));

> WHERE textsearchable_index_col @@ to_tsquery('create & table')

> Using this approach. Is there any way of retrieving which of the original
> two columns the match was found in?

Afraid not - you're not indexing two columns, you're indexing one: 
textsearchable_index_col.

You can add up to four weights to a tsvector though, typically for 
title/body matching. See chapter 12.3 for details.

Failing that, where I've had many (a dozen) different sources but want 
to search them all I've built a textsearch_blocks table with columns to 
identify the source and have triggers that keep it up to date.

--   Richard Huxton  Archonet Ltd
No virus found in this incoming message.
Checked by AVG - http://www.avg.com 
Version: 8.0.138 / Virus Database: 270.5.6/1579 - Release Date: 7/29/2008
6:43 AM



Re: Identifying which column matches a full text search

From
Richard Huxton
Date:
Ryan Wallace wrote:
> Richard Huxton wrote:
>> Failing that, where I've had many (a dozen) different sources but want 
>> to search them all I've built a textsearch_blocks table with columns to 
>> identify the source and have triggers that keep it up to date.
> 
> Once you've built the text search blocks table, how do you search it? Do you
> perform
> twelve separate queries or can you just do one?

OK, you have a table something like:

fulltext_blocks (  section  varchar(32),  itemid   int4,  words    tsvector,them  PRIMARY KEY (section, itemid)
)

Now assume two of the things I search are "news" and "faqs". I'm 
assuming they've both got a simple serial pkey - if not, "itemid" above 
needs to be text and you'll have to cast.

For each target table (news, faqs) add a trigger that updates 
fulltext_blocks appropriately. This can include weighting title and body 
of a news article.

Then, search the fulltext_blocks table, optionally filtering by section. 
If you're going to have lots of results put the ids into a (perhapd 
temporary) results-table. Then join your results back to the original 
tables with the appropriate UNION (if you need to - it might be you 
fetch results one at a time elsewhere in your app).

SELECT n.id, n.title, n.body
FROM news n JOIN results r ON n.id=r.id
WHERE r.section='news'
UNION ALL
SELECT f.id, f,question, f.answer
FROM faqs f JOIN results r ON f.id=r.id
WHERE r.section='faqs'
;

You'll probably want to set ownership/permissions on the triggers / 
fulltext_blocks table so you can't accidentally update it directly.

In mine I even had a "documents" section which relied on an external 
cron-driven script to strip the first 32k of text out of uploaded 
documents (pdf,word) in addition to user-supplied metadata (title, summary).

Note - this is basically simulating what we could do if you could index 
a view. The fulltext_blocks table is nothing more than a materialised view.

HTH

--   Richard Huxton  Archonet Ltd