Thread: How to extract a substring using Regex
Hi, I'm new to Regex in Postgres. Can someone give me a quick pointer on how I'd SELECT the substring between '<name>' and '</name>' in a field? Sample field data: address city here <name>Rogers, Jim</name> zip code place and I'd like the SELECT to return only: Rogers, Jim Thanks!
On Aug 23, 2007, at 19:33 , Postgres User wrote: > I'm new to Regex in Postgres. Can someone give me a quick pointer on > how I'd SELECT the substring between '<name>' and '</name>' in > a field? Check out regexp_replace: http://www.postgresql.org/docs/8.2/interactive/functions- matching.html#FUNCTIONS-POSIX-REGEXP One of the forms of substring might work for you, too. Michael Glaesemann grzm seespotcode net
Yes, I read the manual. I think I had a problem because of the special chars (< / >) that I'm trying to search for... Still looking for the right syntax. On 8/23/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Aug 23, 2007, at 19:33 , Postgres User wrote: > > > I'm new to Regex in Postgres. Can someone give me a quick pointer on > > how I'd SELECT the substring between '<name>' and '</name>' in > > a field? > > Check out regexp_replace: > > http://www.postgresql.org/docs/8.2/interactive/functions- > matching.html#FUNCTIONS-POSIX-REGEXP > > One of the forms of substring might work for you, too. > > Michael Glaesemann > grzm seespotcode net > > >
[Please don't top post as it makes the discussion more difficult to follow, and please reply to the list so that others may benefit from and participate in the discussion.] On Aug 23, 2007, at 19:49 , Postgres User wrote: > Yes, I read the manual. I think I had a problem because of the > special chars (< / >) that I'm trying to search for... Still looking > for the right syntax. Why don't you show us what you've tried and the errors you're getting? That way we can help you figure out what you're doing wrong rather than just give you an answer. Michael Glaesemann grzm seespotcode net
> > Yes, I read the manual. I think I had a problem because of the > > special chars (< / >) that I'm trying to search for... Still looking > > for the right syntax. > > Why don't you show us what you've tried and the errors you're > getting? That way we can help you figure out what you're doing wrong > rather than just give you an answer. > > Michael Glaesemann SELECT substring(data_field from '<name>(.)</name>') FROM myTable
On Aug 23, 2007, at 20:01 , Postgres User wrote: >>> Yes, I read the manual. I think I had a problem because of the >>> special chars (< / >) that I'm trying to search for... Still >>> looking >>> for the right syntax. >> >> Why don't you show us what you've tried and the errors you're >> getting? That way we can help you figure out what you're doing wrong >> rather than just give you an answer. >> >> Michael Glaesemann > > SELECT substring(data_field from '<name>(.)</name>') > FROM myTable Looks like you might want to brush up on regular expressions in general. Your expression will match a single character between the <name> tags. You might want to try something like .+ instead. Michael Glaesemann grzm seespotcode net
On 8/23/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Aug 23, 2007, at 20:01 , Postgres User wrote: > > >>> Yes, I read the manual. I think I had a problem because of the > >>> special chars (< / >) that I'm trying to search for... Still > >>> looking > >>> for the right syntax. > >> > >> Why don't you show us what you've tried and the errors you're > >> getting? That way we can help you figure out what you're doing wrong > >> rather than just give you an answer. > >> > >> Michael Glaesemann > > > > SELECT substring(data_field from '<name>(.)</name>') > > FROM myTable > > Looks like you might want to brush up on regular expressions in > general. Your expression will match a single character between the > <name> tags. You might want to try something like .+ instead. > > Michael Glaesemann You're right, that was a typo, I didn't copy and paste. I found the problem, I was using 2 forward slashes instead of a backslash + forward slash when pattern matching. The correct regex to extract my substring: substring(data_field from '<name>(.+)<\/name>')
On Aug 23, 2007, at 21:08 , Postgres User wrote: > You're right, that was a typo, I didn't copy and paste. > I found the problem, I was using 2 forward slashes instead of a > backslash + forward slash when pattern matching. The correct regex to > extract my substring: > > substring(data_field from '<name>(.+)<\/name>') I don't think the backslash is actually doing anything, (though two forward slashes would definitely affect your result) as the slash doesn't have a special meaning in the regexp. test=# select substring('address city here <name>Rogers, Jim</name> zip code place' from '<name>(.+)<\/name>'); substring ------------- Rogers, Jim (1 row) test=# select substring('address city here <name>Rogers, Jim</name> zip code place' from '<name>(.+)</name>'); substring ------------- Rogers, Jim (1 row) Some scripting languages that use slashes to delimit regular expressions, and therefore require slashes to be escaped, because otherwise the slash would prematurely end the regexp. In past versions of PostgreSQL, a backslash was used to escape single quotes and enter other characters (e.g., \n). This is contrary to the SQL spec, so you can now turn off this behavior by turning on standard_conforming_strings. You'll see warnings if you use a backslash in 8.2 with standard_conforming_strings off. test=# show standard_conforming_strings; standard_conforming_strings ----------------------------- off (1 row) test=# select substring('address city here <name>Rogers, Jim</name> zip code place' from '<name>(.+)<\/name>'); WARNING: nonstandard use of escape in a string literal LINE 1: ...ere <name>Rogers, Jim</name> zip code place' from '<name>(. +... ^ HINT: Use the escape string syntax for escapes, e.g., E'\r\n'. substring ------------- Rogers, Jim (1 row) Michael Glaesemann grzm seespotcode net
On Thu, 2007-08-23 at 19:08 -0700, Postgres User wrote: > On 8/23/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > > > On Aug 23, 2007, at 20:01 , Postgres User wrote: > > > > >>> Yes, I read the manual. I think I had a problem because of the > > >>> special chars (< / >) that I'm trying to search for... Still > > >>> looking > > >>> for the right syntax. > > >> > > >> Why don't you show us what you've tried and the errors you're > > >> getting? That way we can help you figure out what you're doing wrong > > >> rather than just give you an answer. > > >> > > >> Michael Glaesemann > > > > > > SELECT substring(data_field from '<name>(.)</name>') > > > FROM myTable > > > > Looks like you might want to brush up on regular expressions in > > general. Your expression will match a single character between the > > <name> tags. You might want to try something like .+ instead. > > > > Michael Glaesemann > > You're right, that was a typo, I didn't copy and paste. > I found the problem, I was using 2 forward slashes instead of a > backslash + forward slash when pattern matching. The correct regex to > extract my substring: > > substring(data_field from '<name>(.+)<\/name>') FWIW, I find the following site extremely useful when trying to create moderately complex regular expressions: http://www.rexv.org. HTH, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk
On Fri, 24 Aug 2007 06:31:58 +0100 Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> wrote: > > substring(data_field from '<name>(.+)<\/name>') > > FWIW, I find the following site extremely useful when trying to > create moderately complex regular expressions: http://www.rexv.org. Nice site - here's another good one for an installed app that is excellent for even advanced regexp's: http://weitz.de/regex-coach/ It used to be cross-platform but now he only develops a Windows version. Thanks, Josh