Thread: tsearch2 query question

tsearch2 query question

From
Anders Østergaard Jensen
Date:
Hello postgresql users, 

I have a question regarding building a tsearch2 query that does wildcard searching by default. Currently, I am using plainto_tsquery(<expr>), but it does not allow me to build more advanced search expressions. I know to_tsquery is the way to go with this, but I need some qualified assistance for building the correct query. 

Consider the following table: 

metabase=# \d customers;
                                           Table "public.customers"
     Column      |           Type           |                            Modifiers                            
-----------------+--------------------------+-----------------------------------------------------------------
 id              | integer                  | not null default nextval(('"customer_id_seq"'::text)::regclass)
 name            | character varying(255)   | not null
 description     | text                     | 
[...] 
search_idx      | tsvector                 | 

search_idx is a tsvector with an ON INSERT OR UPDATE trigger, which automatically updates the search vector as a combination of name and description. No black magic there. 

Now, let's start out with a classic, non-tsearch2 ILIKE query with two wildcards around the search token:
metabase=# select id, name from customers where name ilike '%holstebr%'; 

CASE #1:
  id  |       name        
------+-------------------
 3646 | Holstebro Kommune
(1 row)

I am trying to do the same exactly with tsearch2 but with no luck: 

CASE #2:
metabase=# select id, name from customers where search_idx @@ to_tsquery('*Holstebr*');
 id | name 
----+------
(0 rows)

How do I formulate the input for to_tsquery so that it returns the same as (1)? And subsequently: how do I handle spaces in between, fx. so that 'holstebro komm*' yields 'holstebro kommune'?

If people have any example of expression builders they put on top of their SQL queries, I would love to see it. I know that this has been done before. 

Thank you for your time and help, 

Anders

Re: tsearch2 query question

From
Tom Lane
Date:
Anders Østergaard Jensen <aj@itersys.dk> writes:
> Now, let's start out with a classic, non-tsearch2 ILIKE query with two
> wildcards around the search token:
> metabase=# select id, name from customers where name ilike '%holstebr%';
>   id  |       name
> ------+-------------------
>  3646 | Holstebro Kommune
> (1 row)

> I am trying to do the same exactly with tsearch2 but with no luck:

> metabase=# select id, name from customers where search_idx @@
> to_tsquery('*Holstebr*');

There isn't any exact equivalent for that in tsquery, because it's about
searching for words not arbitrary substrings.  You could do
to_tsquery('Holstebro').  Since 8.4 you could also do a prefix match,
say to_tsquery('Holstebr:*'), but there's no way to omit letters from
the start of the word.  Also, you have to be wary of prefix matches
if you use a stemming dictionary, because dropping characters from the
end of the word might make it stem differently.

> And subsequently: how do I handle spaces in between, fx. so that
> 'holstebro komm*' yields 'holstebro kommune'?

You need to think of that as being two independent word searches,
like to_tsquery('holstebro & komm:*').  AFAIR there isn't any phrase
matching per se in tsquery.  Of course, you could AND this with an ILIKE
clause to filter out matches where the words weren't adjacent.

If you're really dependent on the exact behavior of LIKE-style
searching then tsquery is not going to provide you with an exact
replacement.  You might consider looking at contrib/pg_trgm/ to
find an indexable operation that can speed up LIKE searches.
        regards, tom lane