Re: counting words in a text or char varying field - Mailing list pgsql-general

From Greg Sabino Mullane
Subject Re: counting words in a text or char varying field
Date
Msg-id E16i19A-0001EG-00@granger.mail.mindspring.net
Whole thread Raw
In response to counting words in a text or char varying field  ("Thomas T. Thai" <tom@minnesota.com>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> i'm looking for a way to count the words in text field or char varying
> separated by white spaces within postgresql itself.

That depends on what you mean by "postgresql itself". You cannot do this
with ordinary SQL: you are going to need a procedural language. Perl
would make an excellent choice, but that would be too easy :), so I'll
try and come up with a quick plpgsql example. This is only a test script,
and 'whitespace' in this case means, literally, one or more space
characters. It's a start. :)


CREATE FUNCTION countwords(TEXT) RETURNS INTEGER AS '
DECLARE
  mystring ALIAS FOR $1;
  words    INTEGER := 0;
  inspace  BOOL := true;       -- ## Catch the first word
BEGIN
  FOR pos IN 1 .. CHAR_LENGTH(mystring) LOOP

    -- There are 2 single quotes, a space, and 2 single quotes below:
    IF SUBSTRING(mystring,pos,1) =  '' '' THEN
      inspace := true;
    ELSE
      IF inspace THEN
        inspace := false;
        words := words+1;
      END IF;
    END IF;
  END LOOP;
  IF inspace is false THEN     -- ## Catch the last word
    words := words+1;
  END IF;

  RETURN words;
END;
' LANGUAGE 'plpgsql';

Let's give it a whirl:

select to_char(cdate, 'YYYYMMDD'), countwords(body)
from turnstep_mail where whofrom like '%tom@minnesota.com%'
order by 1 desc limit 5;

 to_char  | countwords
- ----------+------------
 20020304 |        493
 20020304 |         35
 20020303 |        194
 20020303 |         74
 20020302 |         44


Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200203041730

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE8g/m/vJuQZxSWSsgRAqpfAKDSh7WYjF6l3cWfIw+8O/UPeKgjmwCfRlsc
OI69IIGHlMKMIbR0+Mc/q/g=
=0ral
-----END PGP SIGNATURE-----



pgsql-general by date:

Previous
From: Jean-Luc Lachance
Date:
Subject: Re: Validating Whole Numbers
Next
From: Jean-Luc Lachance
Date:
Subject: Re: Listing Numbers