two questions about fulltext searchign / tsvector indexes - Mailing list pgsql-general

From Jonathan Vanasco
Subject two questions about fulltext searchign / tsvector indexes
Date
Msg-id 72D88361-2D6A-452F-A86B-6974E36C94EF@2xlp.com
Whole thread Raw
In response to Re: Best practices for moving UTF8 databases  (Phoenix Kiula <phoenix.kiula@gmail.com>)
Responses Re: two questions about fulltext searchign / tsvector indexes
List pgsql-general
I'm having some issues with fulltext searching.

I've gone though the list archives and stack overflow, but can't seem to get the exact answers.  hoping someone can
help.

Thanks in advance and apologies for these questions being rather basic.  I just felt the docs and some online posts are
leadingme into possibly making the wrong decision and I want to make sure I"m doing this right. 


1.  I need to make both 'title' and 'description' searchable.   What is the current proper way to index multiple
columnsof a table ( ie, not one ) ? 

    I've essentially seen the following in the docs, mailing list, and various websites:

    A unified index
        CREATE INDEX CONCURRENTLY unified_tsvector_idx ON mytable USING gin(to_tsvector('english', title || ' ' ||
description)); 

    Individual indexes
        CREATE INDEX CONCURRENTLY title_tsvector_idx ON mytable USING gin(to_tsvector('english', title ));
        CREATE INDEX CONCURRENTLY description_tsvector_idx ON mytable USING gin(to_tsvector('english', description ));

    Using dedicated columns ( one or more )
        ALTER TABLE ....
        create trigger ....

    I can't figure out which one to use.  This is on a steadily growing table of around 20MM rows that gets 20-80k new
recordsa day, but existing records are rarely updated. 


2. I've been getting a handful of 'can not index words longer than 2047 characters' in my tests.

    if this 2047 character max is on tokens, is there a way to lower it?  or to profile the index for distribution of
tokens?  I don't think we have to support any tokens larger than 20chars or so. 

3a. What should EXPLAIN ANALYZE show if it is using the index ?  i couldn't find an example.

3b. Depending on how I index the column, what do I need to pass into the query so that it uses the index ?

    1.     if the index is created like
            gin(to_tsvector('english', title ));

        do i have to search in this format ?
            to_tsvector('english',title) @@ to_tsquery('english', 'dog') ;

    2.     if i use an index like
             gin(to_tsvector('english', title || ' ' || description ));

        what is the correct way to query the database and let the planner know I want to use the index ?





pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: pg_standby replication problem
Next
From: Khangelani Gama
Date:
Subject: Re: pg_standby replication problem