Re: Database design for separate tsearch table - Mailing list pgsql-general

From Ivan Sergio Borgonovo
Subject Re: Database design for separate tsearch table
Date
Msg-id 20081023010626.4ef704ac@dawn.webthatworks.it
Whole thread Raw
In response to Database design for separate tsearch table  (Mikkel Høgh <m@ooh.dk>)
Responses Re: Database design for separate tsearch table  (Mikkel Høgh <m@ooh.dk>)
List pgsql-general
On Thu, 23 Oct 2008 00:10:19 +0200
Mikkel Høgh <m@ooh.dk> wrote:

> Hi,

> I'm trying to make a module allowing Drupal to take advantage of
> PostgreSQL's excellent Full Text Search, aka. tsearch.

> Since this module will probably not become part of Drupal core
> right off the bat, I need to do this without modifying Drupal's
> own tables, so I've created a new one for the purpose,
> "tsearch_node".
>
> I'm pondering how best to do this. Basic minimum is a column which
> holds the foreign key (nid) to Drupal's node.
>
> I'm wondering whether to merge all the searchable stuff when
> creating the index or to have a separate tsvector column for each
> of the three commonly indexable things about Drupal content
> (nodes), namely the title, the body and taxonomy. Any insights?

weight them and you'll be able to search by field and "globally".

I didn't make any scientific test but I previously had something
like:

create table subtable (
  subtableid int,
  body text,
  ftidx tsvector
)
create table maintable (
  maintableid int,
  body text,
  subtableid int
  ftidx tsvector
);

but it seem that just searching on a tsvector in maintable
build up with

setweight(to_tsvector('pg_catalog.english',
coalesce(maintable.body,'')), 'A') || ' ' ||

setweight(to_tsvector('pg_catalog.english',
coalesce(subtable.body,'')), 'B')

is faster.

Beware of the difference between gist and gin indexes for
"restricted" weighted searches since with the latter you've to use
@@@

> Also, I figure it'd be a good idea to keep the language of the
> indexed content right there in the same table, since Drupals own
> node table specifies two-letter codes (en, da, pt, etc.) and as
> far as I can understand the PostgreSQL documentation, tsearch
> expects the full language name (english, danish, portuguese), so
> to use it in queries without having to do too much magic would
> require me to store the full name right there in the table, or am
> I mistaken?

You'll have to build up a tsvector for each language, so yeah it may
be useful to store the tsvector together with the language with
witch it was obtained.
If you don't tsearch won't work.
eg. "cane" with Italian tsearch becomes {can}, but when you search it
with English tsearch it remains {cane} so you won't find it.


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


pgsql-general by date:

Previous
From: "Grzegorz Jaśkiewicz"
Date:
Subject: Re: join question
Next
From: "Jeff"
Date:
Subject: stackbuilder updates