Thread: Number of occurrence of characters?
Hi, I have a need to fairly often select data where the number of occurrences of a character in the field is "x". Semantically, it's literally "SELECT something FROM table WHERE numch('/', field)=$x". The problem is how to do it efficiently. I see there isn't a built-in function that counts character occurrences so I'd have to write it myself. An additional constraint is that it must be implemented with built-in capabilities, i.e. SQL and plpsql languages. I can do it the brute force way, looping over the string and processing one by one character with substring(), but is there a faster way? Whatever the function is, I intend to create an index on it.
Attachment
On Thu, Feb 05, 2009 at 02:31:24PM +0100, Ivan Voras wrote: > The problem is how to do it efficiently. I see there isn't a built-in > function that counts character occurrences so I'd have to write it > myself. An additional constraint is that it must be implemented with > built-in capabilities, i.e. SQL and plpsql languages. I can do it the > brute force way, looping over the string and processing one by one > character with substring(), but is there a faster way? # select length(regexp_replace('/some/string/with/slashes', '[^/]+', '', 'g')); length -------- 4 (1 row) depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Ivan Voras wrote: > I have a need to fairly often select data where the number of > occurrences of a character in the field is "x". Semantically, it's > literally "SELECT something FROM table WHERE numch('/', field)=$x". > > The problem is how to do it efficiently. I see there isn't a built-in > function that counts character occurrences so I'd have to write it > myself. An additional constraint is that it must be implemented with > built-in capabilities, i.e. SQL and plpsql languages. I can do it the > brute force way, looping over the string and processing one by one > character with substring(), but is there a faster way? Hmm, you could do this: CREATE OR REPLACE FUNCTION numch(text, text) RETURNS integer AS $$ SELECT length($2) - length(replace($2, $1, '')) $$ LANGUAGE SQL; ie. remove the characters we're counting, and see how much shorter the string became. I don't know if this is any faster than looping in a plpgsql function, but it might be. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
hubert depesz lubaczewski wrote: > On Thu, Feb 05, 2009 at 02:31:24PM +0100, Ivan Voras wrote: >> The problem is how to do it efficiently. I see there isn't a built-in >> function that counts character occurrences so I'd have to write it >> myself. An additional constraint is that it must be implemented with >> built-in capabilities, i.e. SQL and plpsql languages. I can do it the >> brute force way, looping over the string and processing one by one >> character with substring(), but is there a faster way? > > # select length(regexp_replace('/some/string/with/slashes', '[^/]+', '', 'g')); > length > -------- > 4 > (1 row) Thank you (and Heikki), I had a feeling I was missing an approach.
Attachment
It all depends at the end of a day, how crucial is that functionality to your app ? If very, I would add to every insert/update a trigger, that would update info on other table(s) with stats per character. Other option, is to write a function in C that would parse word and chop it in C,O (character, number of occurrences ) result. -- GJ