Thread: Removing whitespace using regexp_replace

Removing whitespace using regexp_replace

From
Thomas Kellerer
Date:
Hi,

I have a column with the datatype "text" that may contain leading whitespace 
(tabs, spaces newlines, ...) and I would like to remove them all (ideally 
leading and trailing).

I tried

SELECT regexp_replace(myfield, '\A\s*', '')
FROM mytable;

(for leading whitespace, to start with)

But it does not remove anything. I replace my first attempt '^\s*' with '\A\s*' 
after reading the chapter about newline-sensitive matching, but that doesn't 
seem to do the trick either.

Just for a test I changed this to

SELECT regexp_replace(myfield, '\s*', '')
FROM mytable;

and expected *all* whitespace to be removed from my string, but only the leading 
ones were replaced. Which I don't understand at all. Why weren't other 
whitespace sequences not replaced with that expression?

What would be the correct RE to replace leading and trailing whitespace without 
affecting anything inbetween?

I'm pretty sure I'm missing someting very obvious...

Thanks in advance
Thomas




Re: Removing whitespace using regexp_replace

From
Andreas Kretschmer
Date:
Thomas Kellerer <spam_eater@gmx.net> schrieb:

> Hi,
> 
> I have a column with the datatype "text" that may contain leading 
> whitespace (tabs, spaces newlines, ...) and I would like to remove them all 
> (ideally leading and trailing).

You can use trim() for that:

select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar  ')) || 'x';

(for testing with 'x' around the result)


> 
> I tried
> 
> SELECT regexp_replace(myfield, '\A\s*', '')
> FROM mytable;

For regexp_replace() you need an extra parameter 'g' like below:

test=*# select 'x' || regexp_replace(regexp_replace(' \t\tfoo bar  ', '^[ \t]+','','g'),'[ \t]+$','','g') || 'x';
                                                                          ^^^               ^^^
 


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: Removing whitespace using regexp_replace

From
Thomas Kellerer
Date:
Andreas Kretschmer wrote on 28.10.2007 12:42:
>> I have a column with the datatype "text" that may contain leading 
>> whitespace (tabs, spaces newlines, ...) and I would like to remove them all 
>> (ideally leading and trailing).
> 
> You can use trim() for that:
> 
> select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar  ')) || 'x';
> 
> (for testing with 'x' around the result)
Yes I was thinking about a solution like that as well, but wouldn't that only 
work if the order in which spaces and tabs appear is always the same?
The above would replace ' \t' but not '\t ', right?


> For regexp_replace() you need an extra parameter 'g' like below:
Cool, works like a charm.
Didn't see that parameter when first reading that chapter.

But it seems my problem was actually caused by something else:

SELECT regexp_replace(myfield, '\s*', '', 'g')
FROM mytable;

does not replace anything, but

SELECT regexp_replace(myfield, '[ \t\n\r]*', '', 'g')
FROM mytable;

does replace all whitespaces (as I expected). And subsequently

SELECT regexp_replace(myfield, '^[ \t\n\r]*', '', 'g')
FROM mytable;

replaces only the whitespace at the beginning.

I thought \s is a "shortcut" for "whitespace", which in my understanding is the 
same as [ \t\r\n]. Am I wrong here?

Cheers
Thomas





Re: Removing whitespace using regexp_replace

From
Andreas Kretschmer
Date:
Thomas Kellerer <spam_eater@gmx.net> schrieb:

> Andreas Kretschmer wrote on 28.10.2007 12:42:
> >>I have a column with the datatype "text" that may contain leading 
> >>whitespace (tabs, spaces newlines, ...) and I would like to remove them 
> >>all (ideally leading and trailing).
> >You can use trim() for that:
> >select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar  ')) 
> >|| 'x';
> >(for testing with 'x' around the result)
> Yes I was thinking about a solution like that as well, but wouldn't that 
> only work if the order in which spaces and tabs appear is always the same?
> The above would replace ' \t' but not '\t ', right?

Oh, yes.


> 
> 
> >For regexp_replace() you need an extra parameter 'g' like below:
> Cool, works like a charm.

Nice, i'm glad to help you.


> But it seems my problem was actually caused by something else:
> 
> SELECT regexp_replace(myfield, '\s*', '', 'g')
> FROM mytable;

you should escape the \, change to ...'\\s*'...

But without anchors this replaces all whitespaces, also within the text
and not only at the beginning/end (^ and $)


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: Removing whitespace using regexp_replace

From
Thomas Kellerer
Date:
Andreas Kretschmer wrote on 28.10.2007 13:32:
>> But it seems my problem was actually caused by something else:
>>
>> SELECT regexp_replace(myfield, '\s*', '', 'g')
>> FROM mytable;
> 
> you should escape the \, change to ...'\\s*'...
Ah! Didn't think this was necessary, as \t or \n did not need to be escaped.

> But without anchors this replaces all whitespaces, also within the text
> and not only at the beginning/end (^ and $)
Yes of course, this was only for testing ;)

Thanks for your help!

Thomas