Re: daitch_mokotoff module - Mailing list pgsql-hackers

From Dag Lem
Subject Re: daitch_mokotoff module
Date
Msg-id ygezgbdvlqd.fsf@sid.nimrod.no
Whole thread Raw
In response to Re: daitch_mokotoff module  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> I wonder why do you have it return the multiple alternative codes as a
> space-separated string.  Maybe an array would be more appropriate.  Even
> on your documented example use, the first thing you do is split it on
> spaces.

In the example, the *input* is split on whitespace, the returned soundex
codes are not. The splitting of the input is done in order to code each
word separately. One of the stated rules of the Daitch-Mokotoff Soundex
Coding is that "When a name consists of more than one word, it is coded
as if one word", and this may not always be desired. See
https://www.avotaynu.com/soundex.htm or
https://www.jewishgen.org/InfoFiles/soundex.html for the rules.

The intended use for the Daitch-Mokotoff soundex, as for any other
soundex algorithm, is to index names (or words) on some representation
of sound, so that alike sounding names with different spellings will
match.

In PostgreSQL, the Daitch-Mokotoff Soundex and Full Text Search makes
for a powerful combination to match alike sounding names. Full Text
Search (as any other free text search engine) works with documents, and
thus the Daitch-Mokotoff Soundex implementation produces documents
(words separated by space). As stated in the documentation: "Any
alternative soundex codes are separated by space, which makes the
returned text suited for use in Full Text Search".

Best regards,

Dag Lem



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Error-safe user functions
Next
From: Robert Haas
Date:
Subject: Re: fixing CREATEROLE