Thread: Identifying which column matches a full text search
<div class="Section1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">Hi all,</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Courier New""> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew"">The following example is given in the Postgres 8.3 manual regarding howto create a single ts_vector column for two existing columns:</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew""> </span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; font-family:"Courier New"">ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector;</span><p class="MsoNormal" style="margin-left:36.0pt"><spanstyle="font-size:10.0pt; font-family:"Courier New"">UPDATE pgweb SET textsearchable_index_col =</span><p class="MsoNormal" style="margin-left:36.0pt"><spanstyle="font-size:10.0pt; font-family:"Courier New""> to_tsvector('english', coalesce(title,'') || coalesce(body,''));</span><p class="MsoNormal"style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; margin-left:36.0pt"><span style="font-size:12.0pt;font-family:"Times New Roman","serif"">Then we create a GIN index to speedup the search: </span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; font-family:"Courier New"">CREATE INDEX textsearch_idx ON pgweb USING gin(textsearchable_index_col);</span><p class="MsoNormal"style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; margin-left:36.0pt"><span style="font-size:12.0pt;font-family:"Times New Roman","serif"">Now we are ready to perform a fastfull text search: </span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; font-family:"Courier New"">SELECT title</span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; font-family:"Courier New"">FROM pgweb</span><p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt; font-family:"Courier New"">WHERE textsearchable_index_col @@ to_tsquery('create & table')</span><p class="MsoNormal"style="margin-left:36.0pt"><span style="font-size:10.0pt; font-family:"Courier New"">ORDER BY last_mod_date DESC LIMIT 10;</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew""> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew"">Using this approach. Is there any way of retrieving which of the originaltwo columns the match was found in?</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew""> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew"">Any help would be much appreciated,</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew"">Ryan</span></div>
Ryan Wallace wrote: > > UPDATE pgweb SET textsearchable_index_col = > to_tsvector('english', coalesce(title,'') || coalesce(body,'')); > WHERE textsearchable_index_col @@ to_tsquery('create & table') > Using this approach. Is there any way of retrieving which of the original > two columns the match was found in? Afraid not - you're not indexing two columns, you're indexing one: textsearchable_index_col. You can add up to four weights to a tsvector though, typically for title/body matching. See chapter 12.3 for details. Failing that, where I've had many (a dozen) different sources but want to search them all I've built a textsearch_blocks table with columns to identify the source and have triggers that keep it up to date. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > > Failing that, where I've had many (a dozen) different sources but want > to search them all I've built a textsearch_blocks table with columns to > identify the source and have triggers that keep it up to date. Once you've built the text search blocks table, how do you search it? Do you perform twelve separate queries or can you just do one? Ryan Ryan Wallace wrote: > > UPDATE pgweb SET textsearchable_index_col = > to_tsvector('english', coalesce(title,'') || coalesce(body,'')); > WHERE textsearchable_index_col @@ to_tsquery('create & table') > Using this approach. Is there any way of retrieving which of the original > two columns the match was found in? Afraid not - you're not indexing two columns, you're indexing one: textsearchable_index_col. You can add up to four weights to a tsvector though, typically for title/body matching. See chapter 12.3 for details. Failing that, where I've had many (a dozen) different sources but want to search them all I've built a textsearch_blocks table with columns to identify the source and have triggers that keep it up to date. -- Richard Huxton Archonet Ltd No virus found in this incoming message. Checked by AVG - http://www.avg.com Version: 8.0.138 / Virus Database: 270.5.6/1579 - Release Date: 7/29/2008 6:43 AM
Ryan Wallace wrote: > Richard Huxton wrote: >> Failing that, where I've had many (a dozen) different sources but want >> to search them all I've built a textsearch_blocks table with columns to >> identify the source and have triggers that keep it up to date. > > Once you've built the text search blocks table, how do you search it? Do you > perform > twelve separate queries or can you just do one? OK, you have a table something like: fulltext_blocks ( section varchar(32), itemid int4, words tsvector,them PRIMARY KEY (section, itemid) ) Now assume two of the things I search are "news" and "faqs". I'm assuming they've both got a simple serial pkey - if not, "itemid" above needs to be text and you'll have to cast. For each target table (news, faqs) add a trigger that updates fulltext_blocks appropriately. This can include weighting title and body of a news article. Then, search the fulltext_blocks table, optionally filtering by section. If you're going to have lots of results put the ids into a (perhapd temporary) results-table. Then join your results back to the original tables with the appropriate UNION (if you need to - it might be you fetch results one at a time elsewhere in your app). SELECT n.id, n.title, n.body FROM news n JOIN results r ON n.id=r.id WHERE r.section='news' UNION ALL SELECT f.id, f,question, f.answer FROM faqs f JOIN results r ON f.id=r.id WHERE r.section='faqs' ; You'll probably want to set ownership/permissions on the triggers / fulltext_blocks table so you can't accidentally update it directly. In mine I even had a "documents" section which relied on an external cron-driven script to strip the first 32k of text out of uploaded documents (pdf,word) in addition to user-supplied metadata (title, summary). Note - this is basically simulating what we could do if you could index a view. The fulltext_blocks table is nothing more than a materialised view. HTH -- Richard Huxton Archonet Ltd