Re: tsearch2 and hyphenated terms - Mailing list pgsql-general

From Tom Lane
Subject Re: tsearch2 and hyphenated terms
Msg-id 13306.1207932332@sss.pgh.pa.us
In response to tsearch2 and hyphenated terms  (Reece Hart <reece@harts.net>)
List pgsql-general
Reece Hart <reece@harts.net> writes:
> For the purposes of indexing these names, I suspect I'd get the majority
> of cases by removing a hyphen when it's followed by 1 or 2 chars from
> [a-zA-Z0-9]. Does that require a custom parser?

Yeah, looks like it:

regression=# select * from ts_debug('MCL1 MCL-1');
   alias   |       description        | token |  dictionaries  |  dictionary  | lexemes
-----------+--------------------------+-------+----------------+--------------+---------
 numword   | Word, letters and digits | MCL1  | {simple}       | simple       | {mcl1}
 blank     | Space symbols            |       | {}             |              |
 asciiword | Word, all ASCII          | MCL   | {english_stem} | english_stem | {mcl}
 int       | Signed integer           | -1    | {simple}       | simple       | {-1}
(4 rows)

I had thought you might get a "numhword" output, but that only seems to
happen if there's at least one letter after the dash:

regression=# select * from ts_debug('MCL1 MCL-X1');
      alias      |               description                | token  |  dictionaries  |  dictionary  | lexemes
-----------------+------------------------------------------+--------+----------------+--------------+----------
 numword         | Word, letters and digits                 | MCL1   | {simple}       | simple       | {mcl1}
 blank           | Space symbols                            |        | {}             |              |
 numhword        | Hyphenated word, letters and digits      | MCL-X1 | {simple}       | simple       | {mcl-x1}
 hword_asciipart | Hyphenated word part, all ASCII          | MCL    | {english_stem} | english_stem | {mcl}
 blank           | Space symbols                            | -      | {}             |              |
 hword_numpart   | Hyphenated word part, letters and digits | X1     | {simple}       | simple       | {x1}
(6 rows)
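
For illustration only: if writing a custom parser is more than the job needs, one alternative is to apply the rule Reece describes to the text before it reaches the parser. The query below is a rough sketch of that preprocessing approach, not a parser change. The sample values and the regular expression are assumptions made up for this example, and it assumes standard_conforming_strings is on (on older servers the pattern would need E'...' syntax with doubled backslashes).

```sql
-- Sketch: drop a hyphen that sits between an alphanumeric character and
-- a trailing run of 1 or 2 characters from [a-zA-Z0-9].
-- \M is PostgreSQL's "end of word" constraint, so longer suffixes
-- such as "foo-bar" are left untouched.
-- The VALUES list is hypothetical sample data.
SELECT name,
       regexp_replace(name,
                      '([a-zA-Z0-9])-([a-zA-Z0-9]{1,2})\M',
                      '\1\2', 'g') AS normalized,
       to_tsvector('simple',
                   regexp_replace(name,
                                  '([a-zA-Z0-9])-([a-zA-Z0-9]{1,2})\M',
                                  '\1\2', 'g')) AS tsv
FROM (VALUES ('MCL1'), ('MCL-1'), ('MCL-X1'), ('foo-bar')) AS t(name);
```

The same rewrite would have to be applied to the search string before it is passed to to_tsquery, otherwise a query for 'MCL-1' would still be split into "mcl" and "-1" and would not match the normalized index entry. A name with several short hyphenated parts in a row may need more than one pass, since each match consumes the character before the hyphen.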

            regards, tom lane
