Re: back references using regex - Mailing list pgsql-general
From | Matthew Peter |
---|---|
Subject | Re: back references using regex |
Date | |
Msg-id | 20050908235207.88561.qmail@web35214.mail.mud.yahoo.com Whole thread Raw |
In response to | Re: back references using regex (Michael Fuhr <mike@fuhr.org>) |
List | pgsql-general |
Thank you for your patience and such a complete answer. I'm not on the pgbox right now but those examples did help clarify how to reference the back references, which was my problem. I wasn't aware the 1st parenthesis must be counted as part of the regex, I assumed it was a wrapper. Thanks for helping me out and putting up with me. :) matt --- Michael Fuhr <mike@fuhr.org> wrote: > On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew > Peter wrote: > > It's not a complex regex as I have wrote one that > does > > what I want, yet not at the database level. The > docs > > didn't help clarify anything. I'm still not clear > on > > how it determines where the back reference comes > from > > in the previous example you gave. And why digits > > wouldn't work. > > Back references work as they usually do in regular > expressions: > they refer to the matched value of a previous > parenthesized > subexpression. If you have multiple open > parentheses then you > need to refer to the correct subexpression to get > what you want. > Example: > > CREATE TABLE foo (t text); > INSERT INTO foo VALUES ('abc.foo.foo.xyz'); > INSERT INTO foo VALUES ('12.00.00.34'); > INSERT INTO foo VALUES ('abc.def.ghi'); > > SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$; > t > ----------------- > abc.foo.foo.xyz > 12.00.00.34 > (2 rows) > > In the above query the regular expression has only > one set of > parentheses, so the back reference refers to \1. > The result set > contains the two rows that have one or more word > characters followed > by a dot followed by the same set of characters. > > In the following query, note the difference between > the regular > expression in the select list and the one in the > where clause: > > SELECT substring(t FROM $$((\w+)\.\2)$$) > FROM foo > WHERE t ~ $$(\w+)\.\1$$; > substring > ----------- > foo.foo > 00.00 > (2 rows) > > In the regular expression in the select list, we use > the outermost > set of parentheses for grouping, so the back > reference needs to > refer to the subexpression that begins with the > second open > parenthesis (i.e., we must use \2). In the regular > expression > in the where clause, we have only one set of > parentheses so the > back reference is \1. > > Regarding digits, you didn't post any output in your > example, so > we don't know if it really doesn't work or if it > just doesn't do > what you were expecting. Here's what I get from > your examples: > > SELECT substring('12.00.00.34' FROM > $$((\d+)\.\2)$$); > substring > ----------- > 00.00 > (1 row) > > SELECT substring('12.00.00.34' FROM > $$((\w+)\.\2)$$); > substring > ----------- > 00.00 > (1 row) > > SELECT substring('abc.foo.foo.xyz' FROM > $$((\w+)\.\2)$$); > substring > ----------- > foo.foo > (1 row) > > Do you get different results, or do these results > surprise you? > They all appear to be correct. > > > What I basically want to do is have a slice > function > > like Python, where I can slice out items from a > \s, \. > > or \n\n separated list. Where I'll just change the > > delimiter for the query that it applies. > > > > Where I could start it at a certain point and end > it > > at another. Like slicing out paragraph 3-6 > (delimiter > > \n\n) or the 2nd-6th sentence in a article > (delimiter > > \.). That is what I am trying to do. > > You can use split_part() to get a single item or > string_to_array() > to build an array from which you can extract > multiple items. > > CREATE TABLE foo (t text); > INSERT INTO foo VALUES > ('one.two.three.four.five.six.'); > > SELECT (string_to_array(t, '.'))[3:5] FROM foo; > string_to_array > ------------------- > {three,four,five} > (1 row) > > SELECT array_to_string((string_to_array(t, > '.'))[3:5], '.') FROM foo; > array_to_string > ----------------- > three.four.five > (1 row) > > Is that what you're looking for? > > -- > Michael Fuhr > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
pgsql-general by date: