Thread: Number of occurrence of characters?

Number of occurrence of characters?

From
Ivan Voras
Date:
Hi,

I have a need to fairly often select data where the number of
occurrences of a character in the field is "x". Semantically, it's
literally "SELECT something FROM table WHERE numch('/', field)=$x".

The problem is how to do it efficiently. I see there isn't a built-in
function that counts character occurrences so I'd have to write it
myself. An additional constraint is that it must be implemented with
built-in capabilities, i.e. SQL and plpsql languages. I can do it the
brute force way, looping over the string and processing one by one
character with substring(), but is there a faster way?

Whatever the function is, I intend to create an index on it.


Attachment

Re: Number of occurrence of characters?

From
hubert depesz lubaczewski
Date:
On Thu, Feb 05, 2009 at 02:31:24PM +0100, Ivan Voras wrote:
> The problem is how to do it efficiently. I see there isn't a built-in
> function that counts character occurrences so I'd have to write it
> myself. An additional constraint is that it must be implemented with
> built-in capabilities, i.e. SQL and plpsql languages. I can do it the
> brute force way, looping over the string and processing one by one
> character with substring(), but is there a faster way?

# select length(regexp_replace('/some/string/with/slashes', '[^/]+', '', 'g'));
 length
--------
      4
(1 row)

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: Number of occurrence of characters?

From
Heikki Linnakangas
Date:
Ivan Voras wrote:
> I have a need to fairly often select data where the number of
> occurrences of a character in the field is "x". Semantically, it's
> literally "SELECT something FROM table WHERE numch('/', field)=$x".
>
> The problem is how to do it efficiently. I see there isn't a built-in
> function that counts character occurrences so I'd have to write it
> myself. An additional constraint is that it must be implemented with
> built-in capabilities, i.e. SQL and plpsql languages. I can do it the
> brute force way, looping over the string and processing one by one
> character with substring(), but is there a faster way?

Hmm, you could do this:

CREATE OR REPLACE FUNCTION numch(text, text) RETURNS integer AS $$
SELECT length($2) - length(replace($2, $1, '')) $$ LANGUAGE SQL;

ie. remove the characters we're counting, and see how much shorter the
string became. I don't know if this is any faster than looping in a
plpgsql function, but it might be.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Number of occurrence of characters?

From
Ivan Voras
Date:
hubert depesz lubaczewski wrote:
> On Thu, Feb 05, 2009 at 02:31:24PM +0100, Ivan Voras wrote:
>> The problem is how to do it efficiently. I see there isn't a built-in
>> function that counts character occurrences so I'd have to write it
>> myself. An additional constraint is that it must be implemented with
>> built-in capabilities, i.e. SQL and plpsql languages. I can do it the
>> brute force way, looping over the string and processing one by one
>> character with substring(), but is there a faster way?
>
> # select length(regexp_replace('/some/string/with/slashes', '[^/]+', '', 'g'));
>  length
> --------
>       4
> (1 row)

Thank you (and Heikki), I had a feeling I was missing an approach.


Attachment

Re: Number of occurrence of characters?

From
Grzegorz Jaśkiewicz
Date:
It all depends at the end of a day, how crucial is that functionality
to your app ?
If very, I would add to every insert/update a trigger, that would
update info on other table(s) with stats per character.

Other option, is to write a function in C that would parse word and
chop it in C,O (character, number of occurrences ) result.


--
GJ