FTS for a controlled vocab - Mailing list pgsql-general

From Sumit Raja
Subject FTS for a controlled vocab
Date
Msg-id CAB4mO2d=s0KiFy_7fkExexarkMES8Y=0nBFa_SC+YQCYWQmDDQ@mail.gmail.com
Whole thread Raw
List pgsql-general
Hello,

I am trying to identify how best to handle the situation where a controlled vocabulary needs to be searched on using full text search.

I have a list of brand names that have, what FTS deems, blank characters in them that I need to search against. E.g. (+)people, D&G, 100% Design.

These particular combinations are proving to be difficult so I would like to do a replacement at index and query time
(+)people, +people -> pluspeople
100% Design -> 100percent Design
D&G, D & G, DG -> DandG

Running these through the default parser means I get a much reduced lexemes that won't be exact enough.

   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | d     | {english_stem} | english_stem | {d}
 blank     | Space symbols   | &     | {}             |              |
 asciiword | Word, all ASCII | g     | {english_stem} | english_stem | {g}
(3 rows)

 alias   |   description   | token  |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+--------+----------------+--------------+---------
 blank     | Space symbols   | (      | {}             |              |
 blank     | Space symbols   | +)     | {}             |              |
 asciiword | Word, all ASCII | people | {english_stem} | english_stem | {peopl}


Can I achieve this with FTS and dictionaries or would I need a custom parser? Any other ideas on how a search like this could work?

I have considered using the actual text column in the query to try and match the exact term using ilike and pg_trgm. So for a user query of 'D&G dresses' the select could be:

select * from test where text_val @@ plainto_tsquery('english','d&g dresses') and lex ilike'%d&g%';

but there would be some horrible query mangling to find all words that have the blank tokens and use them in multiple ilike comparisons.

Thanks

Sumit

pgsql-general by date:

Previous
From: Joe Van Dyk
Date:
Subject: Shorthand syntax for triggers
Next
From: Chris Ernst
Date:
Subject: Re: pg_upgrade not detecting version properly