Thread: Removing whitespace using regexp_replace
Hi, I have a column with the datatype "text" that may contain leading whitespace (tabs, spaces newlines, ...) and I would like to remove them all (ideally leading and trailing). I tried SELECT regexp_replace(myfield, '\A\s*', '') FROM mytable; (for leading whitespace, to start with) But it does not remove anything. I replace my first attempt '^\s*' with '\A\s*' after reading the chapter about newline-sensitive matching, but that doesn't seem to do the trick either. Just for a test I changed this to SELECT regexp_replace(myfield, '\s*', '') FROM mytable; and expected *all* whitespace to be removed from my string, but only the leading ones were replaced. Which I don't understand at all. Why weren't other whitespace sequences not replaced with that expression? What would be the correct RE to replace leading and trailing whitespace without affecting anything inbetween? I'm pretty sure I'm missing someting very obvious... Thanks in advance Thomas
Thomas Kellerer <spam_eater@gmx.net> schrieb: > Hi, > > I have a column with the datatype "text" that may contain leading > whitespace (tabs, spaces newlines, ...) and I would like to remove them all > (ideally leading and trailing). You can use trim() for that: select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar ')) || 'x'; (for testing with 'x' around the result) > > I tried > > SELECT regexp_replace(myfield, '\A\s*', '') > FROM mytable; For regexp_replace() you need an extra parameter 'g' like below: test=*# select 'x' || regexp_replace(regexp_replace(' \t\tfoo bar ', '^[ \t]+','','g'),'[ \t]+$','','g') || 'x'; ^^^ ^^^ Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Andreas Kretschmer wrote on 28.10.2007 12:42: >> I have a column with the datatype "text" that may contain leading >> whitespace (tabs, spaces newlines, ...) and I would like to remove them all >> (ideally leading and trailing). > > You can use trim() for that: > > select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar ')) || 'x'; > > (for testing with 'x' around the result) Yes I was thinking about a solution like that as well, but wouldn't that only work if the order in which spaces and tabs appear is always the same? The above would replace ' \t' but not '\t ', right? > For regexp_replace() you need an extra parameter 'g' like below: Cool, works like a charm. Didn't see that parameter when first reading that chapter. But it seems my problem was actually caused by something else: SELECT regexp_replace(myfield, '\s*', '', 'g') FROM mytable; does not replace anything, but SELECT regexp_replace(myfield, '[ \t\n\r]*', '', 'g') FROM mytable; does replace all whitespaces (as I expected). And subsequently SELECT regexp_replace(myfield, '^[ \t\n\r]*', '', 'g') FROM mytable; replaces only the whitespace at the beginning. I thought \s is a "shortcut" for "whitespace", which in my understanding is the same as [ \t\r\n]. Am I wrong here? Cheers Thomas
Thomas Kellerer <spam_eater@gmx.net> schrieb: > Andreas Kretschmer wrote on 28.10.2007 12:42: > >>I have a column with the datatype "text" that may contain leading > >>whitespace (tabs, spaces newlines, ...) and I would like to remove them > >>all (ideally leading and trailing). > >You can use trim() for that: > >select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar ')) > >|| 'x'; > >(for testing with 'x' around the result) > Yes I was thinking about a solution like that as well, but wouldn't that > only work if the order in which spaces and tabs appear is always the same? > The above would replace ' \t' but not '\t ', right? Oh, yes. > > > >For regexp_replace() you need an extra parameter 'g' like below: > Cool, works like a charm. Nice, i'm glad to help you. > But it seems my problem was actually caused by something else: > > SELECT regexp_replace(myfield, '\s*', '', 'g') > FROM mytable; you should escape the \, change to ...'\\s*'... But without anchors this replaces all whitespaces, also within the text and not only at the beginning/end (^ and $) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Andreas Kretschmer wrote on 28.10.2007 13:32: >> But it seems my problem was actually caused by something else: >> >> SELECT regexp_replace(myfield, '\s*', '', 'g') >> FROM mytable; > > you should escape the \, change to ...'\\s*'... Ah! Didn't think this was necessary, as \t or \n did not need to be escaped. > But without anchors this replaces all whitespaces, also within the text > and not only at the beginning/end (^ and $) Yes of course, this was only for testing ;) Thanks for your help! Thomas