Re: back references using regex - Mailing list pgsql-general
From | Michael Fuhr |
---|---|
Subject | Re: back references using regex |
Date | |
Msg-id | 20050908225459.GA63089@winnie.fuhr.org Whole thread Raw |
In response to | Re: back references using regex (Matthew Peter <survivedsushi@yahoo.com>) |
Responses |
Re: back references using regex
Re: back references using regex |
List | pgsql-general |
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote: > It's not a complex regex as I have wrote one that does > what I want, yet not at the database level. The docs > didn't help clarify anything. I'm still not clear on > how it determines where the back reference comes from > in the previous example you gave. And why digits > wouldn't work. Back references work as they usually do in regular expressions: they refer to the matched value of a previous parenthesized subexpression. If you have multiple open parentheses then you need to refer to the correct subexpression to get what you want. Example: CREATE TABLE foo (t text); INSERT INTO foo VALUES ('abc.foo.foo.xyz'); INSERT INTO foo VALUES ('12.00.00.34'); INSERT INTO foo VALUES ('abc.def.ghi'); SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$; t ----------------- abc.foo.foo.xyz 12.00.00.34 (2 rows) In the above query the regular expression has only one set of parentheses, so the back reference refers to \1. The result set contains the two rows that have one or more word characters followed by a dot followed by the same set of characters. In the following query, note the difference between the regular expression in the select list and the one in the where clause: SELECT substring(t FROM $$((\w+)\.\2)$$) FROM foo WHERE t ~ $$(\w+)\.\1$$; substring ----------- foo.foo 00.00 (2 rows) In the regular expression in the select list, we use the outermost set of parentheses for grouping, so the back reference needs to refer to the subexpression that begins with the second open parenthesis (i.e., we must use \2). In the regular expression in the where clause, we have only one set of parentheses so the back reference is \1. Regarding digits, you didn't post any output in your example, so we don't know if it really doesn't work or if it just doesn't do what you were expecting. Here's what I get from your examples: SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$); substring ----------- 00.00 (1 row) SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$); substring ----------- 00.00 (1 row) SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$); substring ----------- foo.foo (1 row) Do you get different results, or do these results surprise you? They all appear to be correct. > What I basically want to do is have a slice function > like Python, where I can slice out items from a \s, \. > or \n\n separated list. Where I'll just change the > delimiter for the query that it applies. > > Where I could start it at a certain point and end it > at another. Like slicing out paragraph 3-6 (delimiter > \n\n) or the 2nd-6th sentence in a article (delimiter > \.). That is what I am trying to do. You can use split_part() to get a single item or string_to_array() to build an array from which you can extract multiple items. CREATE TABLE foo (t text); INSERT INTO foo VALUES ('one.two.three.four.five.six.'); SELECT (string_to_array(t, '.'))[3:5] FROM foo; string_to_array ------------------- {three,four,five} (1 row) SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo; array_to_string ----------------- three.four.five (1 row) Is that what you're looking for? -- Michael Fuhr
pgsql-general by date: