Thread: New string functions; initdb required
I've just committed changes which implement three SQL99 functions and operators. OVERLAY() allows substituting a string into another string, SIMILAR TO is an operator for pattern matching, and a new variant of SUBSTRING() accepts a pattern to match. Regression tests have been augmented and pass. Docs have been updated. The system catalogs were updated, so it is initdb time. Details from the cvs log below... - Thomas Implement SQL99 OVERLAY(). Allows substitution of a substring in a string. Implement SQL99 SIMILAR TO as a synonym for our existing operator "~". Implement SQL99 regular expression SUBSTRING(string FROM pat FOR escape).Extend the definition to make the FOR clause optional.Define textregexsubstr() to actually implement this feature. Update the regression test to include these new string features.All tests pass. Rename the regular expression support routines from "pg95_xxx" to "pg_xxx". Define CREATE CHARACTER SET in the parser per SQL99. No implementation yet.
Thomas, > I've just committed changes which implement three SQL99 functions and > operators. OVERLAY() allows substituting a string into another string, > SIMILAR TO is an operator for pattern matching, and a new variant of > SUBSTRING() accepts a pattern to match. Way cool! Thank you ... this replaces several of my custom PL/pgSQL functions. How is SIMILAR TO different from ~ ? -- -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
On Tue, Jun 11, 2002 at 11:08:11AM -0700, Josh Berkus wrote: > Thomas, > > > I've just committed changes which implement three SQL99 functions and > > operators. OVERLAY() allows substituting a string into another string, > > SIMILAR TO is an operator for pattern matching, and a new variant of > > SUBSTRING() accepts a pattern to match. > > Way cool! Thank you ... this replaces several of my custom PL/pgSQL > functions. > > How is SIMILAR TO different from ~ ? From the part of Thomas's email you snipped: Implement SQL99 SIMILAR TO as a synonym for our existing operator "~". So the answer is "not at all" Ross
Thomas Lockhart wrote: > I've just committed changes which implement three SQL99 functions and > operators. OVERLAY() allows substituting a string into another string, > SIMILAR TO is an operator for pattern matching, and a new variant of TODO item marked as done: * -Add SIMILAR TO to allow character classes, 'pg_[a-c]%' -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> TODO item marked as done: > * -Add SIMILAR TO to allow character classes, 'pg_[a-c]%' Darn. Will have to be more careful next time ;) - Thomas
> > How is SIMILAR TO different from ~ ? > >From the part of Thomas's email you snipped: > Implement SQL99 SIMILAR TO as a synonym for our existing operator "~". > So the answer is "not at all" Right. I'm not certain about the regex syntax defined by SQL99; I used the syntax that we already have enabled and it looks like we have a couple of other variants available if we need them. If someone wants to research the *actual* syntax specified by SQL99 that would be good... - Thomas
Thomas Lockhart wrote: > Right. I'm not certain about the regex syntax defined by SQL99; I used > the syntax that we already have enabled and it looks like we have a > couple of other variants available if we need them. If someone wants to > research the *actual* syntax specified by SQL99 that would be good... As usual: ( ) + * [ ] |Instead of dot . there is underscore _There is % to mean .* just like LIKEThere is no ? or ^ or $ Regular expressions match the whole string, as if there were animplicit ^ before and $ after the pattern. You have to add% ifyou want to match anywhere in a string. As far as I can tell, there is no default escape character like \but you can specify one. 8.6 Similar predicate Function Specify a character string similarity by means of a regular expression. Format <similar predicate> ::= <character match value> [ NOT ] SIMILAR TO <similar pattern> [ ESCAPE <escape character>] <similar pattern> ::= <character value expression> <regular expression> ::= <regular term> | <regular expression> <vertical bar> <regular term> <regular term> ::= <regular factor> | <regular term> <regular factor> <regular factor> ::= <regular primary> | <regular primary> <asterisk> | <regular primary> <plus sign> <regular primary> ::= <character specifier> | <percent> | <regular character set> | <left paren><regular expression> <right paren> <character specifier> ::= <non-escaped character> | <escaped character> <non-escaped character> ::= !! See the Syntax Rules <escaped character> ::= !! See the Syntax Rules <regular character set> ::= <underscore> | <left bracket> <character enumeration>... <right bracket> |<left bracket> <circumflex> <character enumeration>... <right bracket> | <left bracket> <colon> <regularcharacter set identifier> <colon> <right bracket> <character enumeration> ::= <character specifier> | <character specifier> <minus sign> <character specifier> <regular character set identifier> ::= <identifier> *stuff omitted* 3) The value of the <identifier> that is a <regular character set identifier> shall be either ALPHA, UPPER, LOWER, DIGIT, or ALNUM. *collating stuff omitted* 5) A <non-escaped character> is any single character from the character set of the <similar pattern> that is not a <left bracket>, <right bracket>, <left paren>, <right paren>, <vertical bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>, <percent>, or the character specified by the result of the <character value expression> of <escape character>. A <character specifier> that is a <non-escaped character> represents itself. 6) An <escaped character> is a sequence of two characters: the character specified by the result of the <character value expression> of <escape character>, followed by a second character that is a <left bracket>, <right bracket>, <left paren>, <right paren>, <vertical bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>, <percent>, or the character specified by the result of the <character value expression> of <escape character>. A <character specifier> that is an <escaped character> represents its second character.
Thanks for the info! I have a question... > As usual: ( ) + * [ ] | > Instead of dot . there is underscore _ > There is % to mean .* just like LIKE > There is no ? or ^ or $ > Regular expressions match the whole string, as if there were an > implicit ^ before and $ after the pattern. You have to add % if > you want to match anywhere in a string. Hmm. So if there are no explicit anchors then there must be a slightly different syntax for the regular-expression version of the substring() function? Otherwise, substrings would always have to start from the first character, right? Percents and underscores carried over from LIKE are really annoying. I'll think about implementing an expression rewriter to convert SQL99 to our modern regexp syntax. - Thomas