Re: Re: Re: Allow Reg Expressions in Position function - Mailing list pgsql-general

From Ken Tanzer
Subject Re: Re: Re: Allow Reg Expressions in Position function
Date
Msg-id CAD3a31UT7YO1oWNi8EUnYgxUKP-L4KXoeGemY0qQSpzoPSsqOg@mail.gmail.com
Whole thread Raw
List pgsql-general

Gives 5. It's wrong.

True.  Though your SO example didn't have the https in it.

 
For some reason, substring() returns the parenthesised subexpression rather than the top level..

The comment in testregexsubstr does say that it does this, but it's not clear from the documentation at all, unless I'm missing where it says it.

The description of substring (https://www.postgresql.org/docs/9.6/static/functions-string.html) says "See Section 9.7 for more information on pattern matching."  Section 9.7.3 ("POSIX Regular Expressions") says this:

The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception.

 
You can work around this by putting parentheses around the whole expression, because that way the first subexpression is the whole match.

db=# SELECT position(substring('https://www.webexample.com/s/help?' FROM '(/(s|b|t)/)') IN 'https://www.webexample.com/s/help?');
 position
----------
       27

Geoff

Thanks for the correction!

A less-fragile way to do this is to comparing length of whole string to length of matched string.  See below.
 
For the greater good I sent the email requesting to allow reg exp in the position functions.
Not sure if you will implement it... Just wanted to let you know that the limited capabilities of this function create overhead.

Not clear to me if you're talking CPU overhead or syntax complexity.  But if you really want this function for yourself, you can have it.

CREATE OR REPLACE FUNCTION position_regex (text, text) RETURNS INT AS $$

SELECT COALESCE(LENGTH($2)-LENGTH(SUBSTRING($2,'('||$1||'.*)$'))+1,0);

$$ LANGUAGE SQL IMMUTABLE;

Cheers,

Ken


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

pgsql-general by date:

Previous
From: Nicolas Paris
Date:
Subject: Re: Multiple COPY on the same table
Next
From: Vijaykumar Jain
Date:
Subject: Re: [External] Multiple COPY on the same table