Similarity Search with Wildcards - Mailing list pgsql-general

From Ghislain Hachey
Subject Similarity Search with Wildcards
Date
Msg-id 512EFAB9.80908@gmail.com
Whole thread Raw
Responses Re: Similarity Search with Wildcards  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
Hi list,

I have a varchar column with content such as "Client Name - Brief
Description of Problem" (it's a help desk ticket system). I want to
generate reports by clients and the only thing I can base my query on is
this column. The client names often contain typos or are entered
slightly differently. I installed the pg_trgm extension and it almost
does what I want. The problem is that it searches the similarity of the
whole field and not just the client name resulting in not so similar
searches (I include my query below).

SELECT
  tickets.id as ticket_id,
  tickets.subject as ticket_subject,
  similarity(tickets.subject, 'Client Name') AS sml,
FROM
 tickets
WHERE
  tickets.subject % 'Client Name';

I thought about using wildcards as discussed here
<http://www.postgresql.org/message-id/flat/4D3CC2DC.6060002@wulczer.org#4D3CC2DC.6060002@wulczer.org>
but this does not seem to have any effect (I include the query I tried
below).

SELECT
  tickets.id as ticket_id,
  tickets.subject as ticket_subject,
  similarity(tickets.subject, '%Client Name%') AS sml,
FROM
 tickets
WHERE
  tickets.subject % '%Client Name%';

Both queries result in the same similarity. I would hope that the
similarity algorithm would only work on the "Client Name" part of the
string and ignore what is before and after; in other words, the latter
query above would return a similarity factor of 1 on the content "Client
Name - Brief Description of Problem".

Any pointer in a right direction would be appreciated.

--
GH<www.ghachey.info>

pgsql-general by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Poor performance when using a window function in a view
Next
From: John R Pierce
Date:
Subject: Re: Similarity Search with Wildcards