Thread: BUG #3798: Add fuzzy string search in TSearch2

BUG #3798: Add fuzzy string search in TSearch2

From
"Rikardo Tinauer"
Date:
The following bug has been logged online:

Bug reference:      3798
Logged by:          Rikardo Tinauer
Email address:      rikardo.tinauer@eba.si
PostgreSQL version: 8.3
Operating system:   All
Description:        Add fuzzy string search in TSearch2
Details:

This is not really a bug, just an idea I saw in IBM DB2, which can perform
fuzzy searches on full-text indexes.

It would be great to have this in PostgreSQL.

We are developers of a document management system (DMS). Our primary
database is Postgres, and many of our major customers (large companies) run
our DMS on Postgres (by the way, excellent database you guys develop). Since
scanning is one of the ways documents get into the DMS, we thought it would
be great if one could search the words of a scanned document (we perform OCR
on documents). But OCR-recognized words aren't fully trustworthy (they
contain errors), so fuzzy search would compensate for them: searching for the
word 'invoice' would also return hits for words like 'inv0ice', etc.
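One common way to make "fuzzy" concrete is Levenshtein edit distance: the number of single-character insertions, deletions, and substitutions separating two words. 'inv0ice' is one substitution away from 'invoice'. Below is a minimal plain-Python sketch of that metric. The `fuzzy_match` helper, its `max_distance` threshold of 1, and the sample word list are hypothetical, chosen only for illustration. It is not the code of any PostgreSQL module.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (classic DP)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, word: str, max_distance: int = 1) -> bool:
    # Hypothetical helper: count a word as a hit if it is within
    # max_distance edits of the query (case-insensitive).
    return levenshtein(query.lower(), word.lower()) <= max_distance


ocr_words = ["invoice", "inv0ice", "lnvoice", "invoices", "voice", "receipt"]
print([w for w in ocr_words if fuzzy_match("invoice", w)])
# ['invoice', 'inv0ice', 'lnvoice', 'invoices']
```

A fixed edit-distance threshold is easy to reason about. However, comparing a query against every indexed word this way is a linear scan, so it says nothing about how such a search would be indexed.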

Re: BUG #3798: Add fuzzy string search in TSearch2

From
Stefan Kaltenbrunner
Date:

You might want to look into some of the contrib modules provided with
the main tarball:

http://www.postgresql.org/docs/8.3/static/pgtrgm.html
http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html


Stefan
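To illustrate why pg_trgm suits OCR errors, here is a simplified Python approximation of the trigram similarity its documentation describes. The module lowercases the text and pads each word with two leading spaces and one trailing space. It collects the 3-character windows of each padded word and scores two strings by their shared trigrams relative to all distinct trigrams. This is a sketch under those assumptions, not pg_trgm's actual C implementation. In particular, the ASCII-only word regex is a simplification of how the module decides which characters count as word characters.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm-style trigram set: lowercase, split into
    alphanumeric words, pad each as '  word ', take 3-char windows."""
    result = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("invoice", "inv0ice"), 3))  # 0.455
print(round(similarity("invoice", "receipt"), 3))  # 0.0
```

'invoice' and 'inv0ice' share 5 of their 11 distinct trigrams, so the OCR error still leaves a strong overlap. pg_trgm's `%` operator compares this kind of score against a threshold (0.3 by default, adjustable with `set_limit()`), and the module can back such lookups with a GiST index.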