Thread: Using regexp_replace to remove small words

Using regexp_replace to remove small words

From
"Henrique de Lima Trindade"
Date:

Hi,

 

I'm trying to find a regular expression that removes all small (length < N) words from a string. But, until now I've not been successful.

 

For example:

 

If I pass 'George W Bush' as parameter, I want regexp_replace to return 'George Bush'.

 

Other examples are:

 

select regexp_replace( 'Tommy Lee Jones', 'REGEX', ' ', 'g' )    => 'Tommy Jones'

select regexp_replace( 'Dr Tommy Lee Jones', 'REGEX', ' ', 'g' ) => 'Tommy Jones'

select regexp_replace( 'Tommy Lee Jones Jr', 'REGEX', ' ', 'g' ) => 'Tommy Jones'

 

Thanks in advance,

 

 

HENRIQUE TRINDADE

ANALISTA DE SISTEMA II

( 55 (31) 3025-3550

+henrique@vivver.com.br

assinatura

 

Attachment

Re: Using regexp_replace to remove small words

From
Peter Eisentraut
Date:
On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote:
> I'm trying to find a regular expression that removes all small (length < N)
> words from a string. But, until now I've not been successful.

Here is a start:

select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );

If you want to normalize the spaces after the removal and handle
beginning and end of the word, you will need to expand this to cover
those cases, but the example should contain the key ingredients.



Re: Using regexp_replace to remove small words

From
rodrigo mendonca
Date:
Hi Peter,

your REGEX is exactly what we were trying to find.

Thank you very much!

Rodrigo.

--
Rodrigo Mendonça
Diretor de Tecnologia
Tel: 31 3025 3550 / Cel: 31 8489 3119
rodrigo@vivver.com.br / diretoria@vivver.com.br

Re: Using regexp_replace to remove small words

From
Vick Khera
Date:
On Fri, Dec 10, 2010 at 5:59 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );
>

speaking of regular expressions... the other day I was trying to find
where the 'flags' option to regexp_replace() is described, but I
cannot find it in the 9.0 manual in any obvious place.  it is not
described in the string functions section, nor in the regexp section.

RES: Using regexp_replace to remove small words

From
"Henrique de Lima Trindade"
Date:
Hi Peter,

Your example works perfectly. But, I need Your help with on another situation.
We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error:

-----------------
create or replace function sp_remove_small_words( ptext text ) returns text immutable as
$$
begin

return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );

end;
$$ language plpgsql
;
-----------------
ERRO:  erro de sintaxe em ou próximo a "\"
LINE 6: return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );
                                        ^

********** Error **********

ERRO: erro de sintaxe em ou próximo a "\"
SQL state: 42601
Character: 138

Thanks again!


-----Mensagem original-----
De: Peter Eisentraut [mailto:peter_e@gmx.net]
Enviada em: sexta-feira, 10 de dezembro de 2010 20:59
Para: Henrique de Lima Trindade
Cc: pgsql-general@postgresql.org
Assunto: Re: [GENERAL] Using regexp_replace to remove small words

On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote:
> I'm trying to find a regular expression that removes all small (length < N)
> words from a string. But, until now I've not been successful.

Here is a start:

select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );

If you want to normalize the spaces after the removal and handle
beginning and end of the word, you will need to expand this to cover
those cases, but the example should contain the key ingredients.



Re: Using regexp_replace to remove small words

From
Vick Khera
Date:
On Mon, Dec 13, 2010 at 7:24 AM, Henrique de Lima Trindade
<henrique@vivver.com.br> wrote:
> Your example works perfectly. But, I need Your help with on another situation.
> We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error:

You're using $$ quoting for the function, and $$ quoting for the
string in the regexp.  Think of it like you're trying to embed a ' in
a string delimited by ' also.

Re: Using regexp_replace to remove small words

From
Tom Lane
Date:
Vick Khera <vivek@khera.org> writes:
> speaking of regular expressions... the other day I was trying to find
> where the 'flags' option to regexp_replace() is described, but I
> cannot find it in the 9.0 manual in any obvious place.  it is not
> described in the string functions section, nor in the regexp section.

It's described in the same paragraph that describes regexp_replace, in
http://www.postgresql.org/docs/9.0/static/functions-matching.html

    The regexp_replace function provides substitution of new text for
    substrings that match POSIX regular expression patterns. It has the
    syntax regexp_replace(source, pattern, replacement [, flags ]). The
    source string is returned unchanged if there is no match to the
    pattern. If there is a match, the source string is returned with the
    replacement string substituted for the matching substring. The
    replacement string can contain \n, where n is 1 through 9, to
    indicate that the source substring matching the n'th parenthesized
    subexpression of the pattern should be inserted, and it can contain
    \& to indicate that the substring matching the entire pattern should
    be inserted. Write \\ if you need to put a literal backslash in the
    replacement text. (As always, remember to double backslashes written
    in literal constant strings, assuming escape string syntax is used.)
    The flags parameter is an optional text string containing zero or
    more single-letter flags that change the function's behavior. Flag i
    specifies case-insensitive matching, while flag g specifies
    replacement of each matching substring rather than only the first
    one. Other supported flags are described in Table 9-19.

            regards, tom lane

Re: RES: Using regexp_replace to remove small words

From
Robert Gravsjö
Date:

On 2010-12-13 13.24, Henrique de Lima Trindade wrote:
> Hi Peter,
>
> Your example works perfectly. But, I need Your help with on another situation.
> We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error:
>
> -----------------
> create or replace function sp_remove_small_words( ptext text ) returns text immutable as
> $$
> begin
>
> return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );
>
> end;
> $$ language plpgsql
> ;
> -----------------
> ERRO:  erro de sintaxe em ou próximo a "\"
> LINE 6: return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );

You're ending the function declaration with the $$. Using $fun$ as For
example, the following works:

create or replace function sp_remove_small_words( ptext text ) returns
text immutable as
              $fun$
                          begin
                                      return regexp_replace( ptext,
$$\y\w{1,3}\y$$, '', 'g' );                          end;
                                                                 $fun$
language plpgsql;


For details on dollar quoting see:
http://www.postgresql.org/docs/current/interactive/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

Regards,
roppert
>                                          ^
>
> ********** Error **********
>
> ERRO: erro de sintaxe em ou próximo a "\"
> SQL state: 42601
> Character: 138
>
> Thanks again!
>
>
> -----Mensagem original-----
> De: Peter Eisentraut [mailto:peter_e@gmx.net]
> Enviada em: sexta-feira, 10 de dezembro de 2010 20:59
> Para: Henrique de Lima Trindade
> Cc: pgsql-general@postgresql.org
> Assunto: Re: [GENERAL] Using regexp_replace to remove small words
>
> On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote:
>> I'm trying to find a regular expression that removes all small (length<  N)
>> words from a string. But, until now I've not been successful.
>
> Here is a start:
>
> select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );
>
> If you want to normalize the spaces after the removal and handle
> beginning and end of the word, you will need to expand this to cover
> those cases, but the example should contain the key ingredients.
>
>
>

--
Regards,
Robert "roppert" Gravsjö

Re: Using regexp_replace to remove small words

From
Vick Khera
Date:
On Mon, Dec 13, 2010 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Flag i
>    specifies case-insensitive matching, while flag g specifies
>    replacement of each matching substring rather than only the first
>    one. Other supported flags are described in Table 9-19.

Thanks.  Quite well hidden in plain sight. :(

As a user, I was looking for a table similar to 9-19, since I
discounted it because it described the embedded flags.  Based on the
above text though it seems they are also valid for the flags
parameter.