Thread: Fulltext index

Fulltext index

From
Andreas Kraftl
Date:
Hello again,

my previous post wasn't answered. I think i told my question wrong :-).

I have a table like
a    | b
--------------------
'de' | Hallo welt
'en' | Hello world

How can I create a full text index over b?
CREATE INDEX idx ON table USING gin(
to_tsvector(
    case
        when a = 'de' then 'german'
        when a = 'en' then 'english'
        else 'english'
    end
), b);

This doesn't work. Error Message in german:
FEHLER: Zugriffsmethode »gin« unterstützt keine mehrspaltigen Indexe
SQL state: 0A000
means, gin doesn't accept multicolumn indexes.

Any ideas?

Greetings Andreas
--
Kraftl EDV - Dienstleistungen
Linux, Linuxschulungen, Webprogrammierung
Autofabrikstraße 16/6
1230 Wien


Re: Fulltext index

From
Ivan Sergio Borgonovo
Date:
On Sat, 08 Nov 2008 09:44:17 +0100
Andreas Kraftl <andreas.kraftl@kraftl.at> wrote:

> Hello again,

> my previous post wasn't answered. I think i told my question
> wrong :-).
>
> I have a table like
> a    | b
> --------------------
> 'de' | Hallo welt
> 'en' | Hello world
>
> How can I create a full text index over b?
> CREATE INDEX idx ON table USING gin(
> to_tsvector(
>     case
>         when a = 'de' then 'german'
>         when a = 'en' then 'english'
>         else 'english'
>     end
> ), b);
>
> This doesn't work. Error Message in german:
> FEHLER: Zugriffsmethode »gin« unterstützt keine mehrspaltigen
> Indexe SQL state: 0A000
> means, gin doesn't accept multicolumn indexes.
>
> Any ideas?

Multicolumn indexes should get into 8.4.

You may add a column tsvector and compute it with a trigger that
chose the correct language when generating the tsvector.
Then you'll have to pick up the correct language when you generate
the tsquery in your search.

http://www.sigaev.ru/gin/fastinsert_and_multicolumn_GIN.pdf

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Fulltext index

From
Oleg Bartunov
Date:
On Sat, 8 Nov 2008, Andreas Kraftl wrote:

> Hello again,
>
> my previous post wasn't answered. I think i told my question wrong :-).
>
> I have a table like
> a    | b
> --------------------
> 'de' | Hallo welt
> 'en' | Hello world
>
> How can I create a full text index over b?

use concatenation operator: tsvector('de',b_de) || tsvector('en',b_en)


> CREATE INDEX idx ON table USING gin(
> to_tsvector(
>     case
>         when a = 'de' then 'german'
>         when a = 'en' then 'english'
>         else 'english'
>     end
> ), b);
>
> This doesn't work. Error Message in german:
> FEHLER: Zugriffsmethode ЪЪginЪЪ unterstЪЪtzt keine mehrspaltigen Indexe
> SQL state: 0A000
> means, gin doesn't accept multicolumn indexes.
>
> Any ideas?
>
> Greetings Andreas
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Fulltext index

From
Tom Lane
Date:
Andreas Kraftl <andreas.kraftl@kraftl.at> writes:
> CREATE INDEX idx ON table USING gin(
> to_tsvector(
>     case
>         when a = 'de' then 'german'
>         when a = 'en' then 'english'
>         else 'english'
>     end
> ), b);

> This doesn't work. Error Message in german:
> FEHLER: Zugriffsmethode »gin« unterstützt keine mehrspaltigen Indexe
> SQL state: 0A000
> means, gin doesn't accept multicolumn indexes.

You've got the parentheses in the wrong place.

The way I'd suggest doing this is

regression=# create table tab (a regconfig, b text);
CREATE TABLE
regression=# create index idx on tab using gin(to_tsvector(a,b));
CREATE INDEX
regression=# explain select * from tab where to_tsvector(a,b) @@ to_tsquery('english','foo');
                           QUERY PLAN
----------------------------------------------------------------
 Index Scan using idx on tab  (cost=0.00..8.27 rows=1 width=36)
   Index Cond: (to_tsvector(a, b) @@ '''foo'''::tsquery)
(2 rows)

If you want to use abbreviations like 'en' and 'de', create text search
configurations named that way instead of inserting a run-time
conversion.

            regards, tom lane

Re: Fulltext index

From
Andreas Kraftl
Date:
Am Sat, 08 Nov 2008 09:44:17 +0100 schrieb Andreas Kraftl:
> How can I create a full text index over b?

Hello

thanks for the answers. But nothing matches my problem.

I read the manual again and decide me for an other way.
I change my table that it looks like:

lang    | text
----------------------
german  | hallo welt
english | hello world

Then i follow the manual which says:
http://www.postgresql.org/docs/8.3/interactive/textsearch-
tables.html#TEXTSEARCH-TABLES-INDEX
 It is possible to set up more complex expression indexes wherein the
configuration name is specified by another column, e.g.:
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(config_name, body));

 It is possible to set up more complex expression indexes wherein the
configuration name is specified by another column, e.g.:

I did a
CREATE INDEX idx ON test.test1 USING gin(to_tsvector(lang, "text"));
and
CREATE INDEX idx ON test.test1 USING gin(to_tsvector(lang::regconfig,
"text"));

For both I get this error in german:
FEHLER:  Funktionen im Indexausdruck müssen als IMMUTABLE markiert sein
SQL state: 42P17
means in english
functions in index expression must be marked IMMUTABLE

Now I have no idea. My experience with databases and postgresql are too
less to decide if this is a bug, or myself is the bug ;-).

Thanks again for help
Andreas




Re: Fulltext index

From
Sam Mason
Date:
On Mon, Nov 10, 2008 at 09:14:21AM +0100, Andreas Kraftl wrote:
> Am Sat, 08 Nov 2008 09:44:17 +0100 schrieb Andreas Kraftl:
> > How can I create a full text index over b?
>
> thanks for the answers. But nothing matches my problem.

I'm not sure what's wrong with Oleg's suggestion--he's the guy who wrote
most of the code so his suggestions should be reasonable!  I'm just
learning about this stuff myself, so it may be somewhat sub-optimal.
That said, I got things working when doing the following:

  CREATE TABLE test (
    lang TEXT,
    text TEXT
  );

  INSERT INTO test VALUES
    ('german',  'hallo welt'),
    ('english', 'hello world');

  CREATE INDEX idx ON test USING gin(tsvector_concat(
    to_tsvector('german',  CASE lang WHEN 'german'  THEN text ELSE '' END),
    to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END)));

"text" as a column name gets a bit confusing to read, but I'm trying
to follow your names.  Also my version of PG didn't seem to know that
the '||' operator knows how to concat tsvectors, so I had to spell out
tsvector_concat in full.  Querying is a bit awkward, but works:

  SELECT *
  FROM test
  WHERE tsvector_concat(
    to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END),
    to_tsvector('german',  CASE lang WHEN 'german'  THEN text ELSE '' END))
      @@ to_tsquery('english', 'hello');

Putting most of the above into a query would work, as would having PG
automatically maintaining a column of type TSVECTOR.

> I read the manual again and decide me for an other way.
> I change my table that it looks like:
>
> lang    | text
> ----------------------
> german  | hallo welt
> english | hello world

What types do these columns have? if "lang" is of type REGCONFIG all
works for me:

  CREATE TABLE test (
    lang REGCONFIG,
    text TEXT
  );

  INSERT INTO test VALUES
    ('german',  'hallo welt'),
    ('english', 'hello world');

  CREATE INDEX idx ON test USING gin(to_tsvector(lang, text));

  SELECT *
  FROM test
  WHERE to_tsvector(lang, text) @@ to_tsquery('english', 'hello');

This all seems much easier than having "lang" as a TEXT column.

> Now I have no idea. My experience with databases and postgresql are too
> less to decide if this is a bug, or myself is the bug ;-).

I think the awkward thing is that text/strings are visually indistin-
guishable from arbitrary literals in SQL.  The 'english' that's going
into the to_tsquery() call above is actually of type REGCONFIG, but it
looks like a string literal.  I think that could be why you were getting
confused before.

Hope that all makes sense and helps a bit!


  Sam

Re: Fulltext index

From
Sam Mason
Date:
Before waking up properly, I wrote:
> Querying is a bit awkward, but works:
>
>   SELECT *
>   FROM test
>   WHERE tsvector_concat(
>     to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END),
>     to_tsvector('german',  CASE lang WHEN 'german'  THEN text ELSE '' END))
>       @@ to_tsquery('english', 'hello');
>
> Putting most of the above into a query would work

That should be "Putting most of the above into a VIEW would work"!  The
example would be:

CREATE VIEW test_v AS
  SELECT lang, text, tsvector_concat(
    to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END),
    to_tsvector('german',  CASE lang WHEN 'german'  THEN text ELSE '' END))
      AS tsvec
  FROM test;

Allowing you to do:

  SELECT lang, text
  FROM test_v
  WHERE tsvec @@ to_tsquery('english', 'hello');


  Sam