Thread: determine how many matches of a string in a field

determine how many matches of a string in a field

From
"Dave [Hawk-Systems]"
Date:
resolved the previous question regarding scoring to some extent...

currently have this select;

SELECT *, (
(CASE WHEN (keywords ~* '.*MySearchString.*') THEN 5 ELSE 0 END) +
(CASE WHEN (title ~* '.*MySearchString.*') THEN 3 ELSE 0 END) +
(CASE WHEN (description ~* '.*MySearchString.*') THEN 1 ELSE 0 END)
) AS score FROM catalog_table WHERE
keywords ~* '.*MySearchString.*' or
title ~* '.*MySearchString.*' or
description ~* '.*MySearchString.*'
AND status='D' ORDER BY score DESC;

this does a great job of finding results that match 1 or more occurances of
MySearchString in the fields selected...  I can't figure out how to determine if
multiple results exist in each field, and if so, how to multiply the results.

essentially looking for a legitimate form of this pseudo code;

(CASE WHEN (keywords ~* '.*MySearchString.*') THEN (substring_count(keywords
from '.*MySearchString.*')*5) ELSE 0 END)

thoughts?

Dave



Re: determine how many matches of a string in a field

From
Joe Conway
Date:
Dave [Hawk-Systems] wrote:
> (CASE WHEN (keywords ~* '.*MySearchString.*') THEN (substring_count(keywords
> from '.*MySearchString.*')*5) ELSE 0 END)

This question (essentially) came up twice in July (once here and once on
the SQL list -- author copied) and I don't recall seeing an answer
posted. I've been saving it for when I had the time and energy to work
out a solution. Not sure you're still in need of this, and I'm sure
there are more efficient ways to do this sort of thing (certainly
implementing it in C would help), but FWIW this seems to do what you wanted:

create or replace function substr_count(text, text) returns integer as '
declare
   t1 alias for $1;  -- source string
   t2 alias for $2;  -- search pattern
   match integer := 0;
   pos integer := 0;
   p integer := 0;
   px integer := 0;
   len1 integer := 0;
   len2 integer := 0;
begin
   len1 := length(t1);
   len2 := length(t2);

   if len2 < 1 then
     return 0; /* result for empty pattern */
   end if;

   /* no use in searching t1 past point where t2 will fit */
   px := len1 - len2 + 1;

   for p in 1..px loop
     if substr(t1, p, len2) = t2 then
       match := match + 1;
     end if;
   end loop;
   return match;
end;
' language plpgsql immutable strict;

regression=# select substr_count('abcdefedcba', 'a');
  substr_count
--------------
             2
(1 row)

regression=# select substr_count('aaa', 'aa');
  substr_count
--------------
             2
(1 row)

regression=# select substr_count('aabaacaaadefabaa', 'aa');
  substr_count
--------------
             5
(1 row)

HTH,

Joe