Thread: substring and POSIX re's
Hi. First: PG version 7.4 and 8.0. I have a question regarding the following simplified query: gds2=# select substring('NE NE SE 2310 FSL 330 FEL' from '^([A-Z][A-Z] )+'); substring ----------- SE (1 row) The pg docs say that this form of substring uses POSIX re's, and my understanding of POSIX re's is they are always greedy. So, why do I get only SE instead of NE NE SE? Pilot error, probably, but would someone please enlighten me? Thank you very much. Don p.s. The target string can have from 1 to 6 of the 2 char strings, not just 3 as shown in this example.
Don Isgitt <djisgitt@soundenergy.com> writes: > gds2=# select substring('NE NE SE 2310 FSL 330 FEL' from '^([A-Z][A-Z] )+'); > substring > ----------- > SE > (1 row) > The pg docs say that this form of substring uses POSIX re's, and my > understanding of POSIX re's is they are always greedy. So, why do I get > only SE instead of NE NE SE? Pilot error, probably, but would someone > please enlighten me? Thank you very much. I think you want regression=# select substring('NE NE SE 2310 FSL 330 FEL' from '^(([A-Z][A-Z] )+)'); substring ----------- NE NE SE (1 row) ie, you need the "+" to be *inside* the capturing parentheses. When it's outside, I guess the engine chooses to consider the last match of the parenthesized subexpression as the thing to return. (I can't recall if this choice is specified in the docs or not.) regards, tom lane
Tom Lane wrote: >Don Isgitt <djisgitt@soundenergy.com> writes: > > >>gds2=# select substring('NE NE SE 2310 FSL 330 FEL' from '^([A-Z][A-Z] )+'); >> substring >>----------- >> SE >>(1 row) >> >> > > > >>The pg docs say that this form of substring uses POSIX re's, and my >>understanding of POSIX re's is they are always greedy. So, why do I get >>only SE instead of NE NE SE? Pilot error, probably, but would someone >>please enlighten me? Thank you very much. >> >> > >I think you want > >regression=# select substring('NE NE SE 2310 FSL 330 FEL' from '^(([A-Z][A-Z] )+)'); > substring >----------- > NE NE SE >(1 row) > >ie, you need the "+" to be *inside* the capturing parentheses. When >it's outside, I guess the engine chooses to consider the last match >of the parenthesized subexpression as the thing to return. (I can't >recall if this choice is specified in the docs or not.) > > regards, tom lane > > > > Thanks, Tom. Interestingly enough, neither my original query or your corrected one returns anything with pg 7.4--another good reason to upgrade to 8.* Don
On Tue, Apr 19, 2005 at 11:17:46AM -0500, Don Isgitt wrote: > > Thanks, Tom. Interestingly enough, neither my original query or your > corrected one returns anything with pg 7.4--another good reason to > upgrade to 8.* Hmmm...for me both queries give the results shown if I run them in 7.3.9, 7.4.7, 8.0.2, or 8.1devel. What's different about your 7.4 installation? -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Tue, Apr 19, 2005 at 10:03:45AM -0600, Michael Fuhr wrote: > On Tue, Apr 19, 2005 at 11:17:46AM -0500, Don Isgitt wrote: > > > > Thanks, Tom. Interestingly enough, neither my original query or your > > corrected one returns anything with pg 7.4--another good reason to > > upgrade to 8.* > > Hmmm...for me both queries give the results shown if I run them in > 7.3.9, 7.4.7, 8.0.2, or 8.1devel. What's different about your 7.4 > installation? Maybe the regex_flavor setting? (not sure of the exact name) -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh)
On Tue, Apr 19, 2005 at 12:39:52PM -0400, Alvaro Herrera wrote: > On Tue, Apr 19, 2005 at 10:03:45AM -0600, Michael Fuhr wrote: > > On Tue, Apr 19, 2005 at 11:17:46AM -0500, Don Isgitt wrote: > > > > > > Thanks, Tom. Interestingly enough, neither my original query or your > > > corrected one returns anything with pg 7.4--another good reason to > > > upgrade to 8.* > > > > Hmmm...for me both queries give the results shown if I run them in > > 7.3.9, 7.4.7, 8.0.2, or 8.1devel. What's different about your 7.4 > > installation? > > Maybe the regex_flavor setting? (not sure of the exact name) Ah yes, I forgot about that.... test=> SET regex_flavor TO basic; SET test=> SELECT substring('NE NE SE 2310 FSL 330 FEL' FROM '^(([A-Z][A-Z] )+)'); substring ----------- (1 row) test=> SET regex_flavor TO advanced; SET test=> SELECT substring('NE NE SE 2310 FSL 330 FEL' FROM '^(([A-Z][A-Z] )+)'); substring ----------- NE NE SE (1 row) test=> SELECT version(); version --------------------------------------------------------------------------- PostgreSQL 7.4.7 on sparc-sun-solaris2.9, compiled by GCC gcc (GCC) 3.4.2 (1 row) -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Michael Fuhr wrote: >On Tue, Apr 19, 2005 at 12:39:52PM -0400, Alvaro Herrera wrote: > > >>On Tue, Apr 19, 2005 at 10:03:45AM -0600, Michael Fuhr wrote: >> >> >>>On Tue, Apr 19, 2005 at 11:17:46AM -0500, Don Isgitt wrote: >>> >>> >>>>Thanks, Tom. Interestingly enough, neither my original query or your >>>>corrected one returns anything with pg 7.4--another good reason to >>>>upgrade to 8.* >>>> >>>> >>>Hmmm...for me both queries give the results shown if I run them in >>>7.3.9, 7.4.7, 8.0.2, or 8.1devel. What's different about your 7.4 >>>installation? >>> >>> >>Maybe the regex_flavor setting? (not sure of the exact name) >> >> > >Ah yes, I forgot about that.... > >test=> SET regex_flavor TO basic; >SET >test=> SELECT substring('NE NE SE 2310 FSL 330 FEL' FROM '^(([A-Z][A-Z] )+)'); > substring >----------- > >(1 row) > >test=> SET regex_flavor TO advanced; >SET >test=> SELECT substring('NE NE SE 2310 FSL 330 FEL' FROM '^(([A-Z][A-Z] )+)'); > substring >----------- > NE NE SE >(1 row) > >test=> SELECT version(); > version >--------------------------------------------------------------------------- > PostgreSQL 7.4.7 on sparc-sun-solaris2.9, compiled by GCC gcc (GCC) 3.4.2 >(1 row) > > > Thank you, Alvaro and Michael, The regex_flavor setting was the culprit; I never knew of such a creature in pg! Mystery solved. The members on this board are great. Don