Thread: New string functions; initdb required

New string functions; initdb required

From

Thomas Lockhart

Date:

11 June 2002, 12:51:53

I've just committed changes which implement three SQL99 functions and
operators. OVERLAY() allows substituting a string into another string,
SIMILAR TO is an operator for pattern matching, and a new variant of
SUBSTRING() accepts a pattern to match.

Regression tests have been augmented and pass. Docs have been updated.
The system catalogs were updated, so it is initdb time. Details from the
cvs log below...
                    - Thomas

Implement SQL99 OVERLAY(). Allows substitution of a substring in a
string.
Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
Implement SQL99 regular expression SUBSTRING(string FROM pat FOR
escape).Extend the definition to make the FOR clause optional.Define textregexsubstr() to actually implement this
feature.
Update the regression test to include these new string features.All tests pass.
Rename the regular expression support routines from "pg95_xxx" to
"pg_xxx".
Define CREATE CHARACTER SET in the parser per SQL99. No implementation
yet.

Re: New string functions; initdb required

From

Josh Berkus

Date:

11 June 2002, 14:32:50

Thomas,

> I've just committed changes which implement three SQL99 functions and
> operators. OVERLAY() allows substituting a string into another string,
> SIMILAR TO is an operator for pattern matching, and a new variant of
> SUBSTRING() accepts a pattern to match.

Way cool!  Thank you ... this replaces several of my custom PL/pgSQL
functions.

How is SIMILAR TO different from ~ ?

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology     josh@agliodbs.com   and data management solutions     (415) 565-7293  for law firms, small
businesses     fax 621-2533   and non-profit organizations.     San Francisco

Re: New string functions; initdb required

From

"Ross J. Reedstrom"

Date:

11 June 2002, 17:02:40

On Tue, Jun 11, 2002 at 11:08:11AM -0700, Josh Berkus wrote:
> Thomas,
> 
> > I've just committed changes which implement three SQL99 functions and
> > operators. OVERLAY() allows substituting a string into another string,
> > SIMILAR TO is an operator for pattern matching, and a new variant of
> > SUBSTRING() accepts a pattern to match.
> 
> Way cool!  Thank you ... this replaces several of my custom PL/pgSQL 
> functions.
> 
> How is SIMILAR TO different from ~ ?

From the part of Thomas's email you snipped:
  Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".

So the answer is "not at all"

Ross

Re: New string functions; initdb required

From

Bruce Momjian

Date:

11 June 2002, 17:27:43

Thomas Lockhart wrote:
> I've just committed changes which implement three SQL99 functions and
> operators. OVERLAY() allows substituting a string into another string,
> SIMILAR TO is an operator for pattern matching, and a new variant of

TODO item marked as done:
* -Add SIMILAR TO to allow character classes, 'pg_[a-c]%'

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: New string functions; initdb required

From

Thomas Lockhart

Date:

11 June 2002, 17:51:48

> TODO item marked as done:
>         * -Add SIMILAR TO to allow character classes, 'pg_[a-c]%'

Darn. Will have to be more careful next time ;)
               - Thomas

Re: New string functions; initdb required

From

Thomas Lockhart

Date:

11 June 2002, 17:53:37

> > How is SIMILAR TO different from ~ ?
> >From the part of Thomas's email you snipped:
>    Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
> So the answer is "not at all"

Right. I'm not certain about the regex syntax defined by SQL99; I used
the syntax that we already have enabled and it looks like we have a
couple of other variants available if we need them. If someone wants to
research the *actual* syntax specified by SQL99 that would be good...
                 - Thomas

Re: New string functions; initdb required

From

"Ken Hirsch"

Date:

12 June 2002, 19:08:36

Thomas Lockhart wrote:
> Right. I'm not certain about the regex syntax defined by SQL99; I used
> the syntax that we already have enabled and it looks like we have a
> couple of other variants available if we need them. If someone wants to
> research the *actual* syntax specified by SQL99 that would be good...
As usual: ( ) + * [ ] |Instead of dot . there is underscore _There is % to mean .* just like LIKEThere is no ? or ^ or
$
Regular expressions match the whole string, as if there were animplicit ^ before and $ after the pattern. You have to
add% ifyou want to match anywhere in a string.

As far as I can tell, there is no default escape character like \but you can specify one.

8.6 Similar predicate
Function
Specify a character string similarity by means of a regular expression.

Format
 <similar predicate> ::=       <character match value> [ NOT ] SIMILAR TO <similar pattern>       [ ESCAPE <escape
character>]        <similar pattern> ::= <character value expression>

 <regular expression> ::=       <regular term>       | <regular expression> <vertical bar> <regular term>
 <regular term> ::=       <regular factor>       | <regular term> <regular factor>
 <regular factor> ::=       <regular primary>       | <regular primary> <asterisk>       | <regular primary> <plus
sign>
 <regular primary> ::=       <character specifier>       | <percent>       | <regular character set>       | <left
paren><regular expression> <right paren>

 <character specifier> ::=       <non-escaped character>       | <escaped character>
 <non-escaped character> ::= !! See the Syntax Rules <escaped character> ::= !! See the Syntax Rules
 <regular character set> ::=       <underscore>       | <left bracket> <character enumeration>... <right bracket>
|<left bracket> <circumflex>            <character enumeration>... <right bracket>       | <left bracket> <colon>
<regularcharacter set identifier>             <colon> <right bracket>

 <character enumeration> ::=       <character specifier>       | <character specifier> <minus sign> <character
specifier>
 <regular character set identifier> ::= <identifier>

*stuff omitted*

3) The value of the <identifier> that is a <regular character set
identifier> shall be either ALPHA, UPPER, LOWER, DIGIT, or ALNUM.

*collating stuff omitted*

5) A <non-escaped character> is any single character from the
character set of the <similar pattern> that is not a <left bracket>,
<right bracket>, <left paren>, <right paren>, <vertical bar>,
<circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>,
<percent>, or the character specified by the result of the <character
value expression> of <escape character>. A <character specifier> that
is a <non-escaped character> represents itself.

6) An <escaped character> is a sequence of two characters: the
character specified by the result of the <character value expression>
of <escape character>, followed by a second character that is a <left
bracket>, <right bracket>, <left paren>, <right paren>, <vertical
bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>,
<underscore>, <percent>, or the character specified by the result of
the <character value expression> of <escape character>. A <character
specifier> that is an <escaped character> represents its second
character.

Re: New string functions; initdb required

From

Thomas Lockhart

Date:

13 June 2002, 09:01:15

Thanks for the info! I have a question...

>  As usual: ( ) + * [ ] |
>  Instead of dot . there is underscore _
>  There is % to mean .* just like LIKE
>  There is no ? or ^ or $
>  Regular expressions match the whole string, as if there were an
>  implicit ^ before and $ after the pattern. You have to add % if
>  you want to match anywhere in a string.

Hmm. So if there are no explicit anchors then there must be a slightly
different syntax for the regular-expression version of the substring()
function? Otherwise, substrings would always have to start from the
first character, right?

Percents and underscores carried over from LIKE are really annoying.
I'll think about implementing an expression rewriter to convert SQL99 to
our modern regexp syntax.
                    - Thomas