BUG #6381: Incorrect greediness behavior in certain regular expressions - Mailing list pgsql-bugs

From code@phaedrusdeinus.org
Subject BUG #6381: Incorrect greediness behavior in certain regular expressions
Date
Msg-id E1RixjF-0006gP-Dt@wrigleys.postgresql.org
Responses Re: BUG #6381: Incorrect greediness behavior in certain regular expressions
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      6381
Logged by:          john melesky
Email address:      code@phaedrusdeinus.org
PostgreSQL version: 9.1.1
Operating system:   x86_64-pc-linux-gnu
Description:

This simple regexp returns the correct result (that is, (.*?) matches
'blahblah.com'):

=# select regexp_matches('http://blahblah.com/asdf',
'http://(.*?)(/|%2f|$)');
  regexp_matches
------------------
 {blahblah.com,/}

This more complete version matches greedily, which is incorrect:

=# select regexp_matches('http://blahblah.com/asdf',
'http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f|$)');
         regexp_matches
--------------------------------
 {"",:,//,blahblah.com/asdf,""}

(That is, (.*?) matches 'blahblah.com/asdf')

The problem appears to be the inclusion of '$' in the final paren group.
With '$' removed from that group, the match is non-greedy as expected:

select regexp_matches('http://blahblah.com/asdf',
'http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f)');
      regexp_matches
--------------------------
 {"",:,//,blahblah.com,/}
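
For comparison, here is a minimal sketch that runs the same three patterns
through a backtracking engine (Python's `re` module). It only illustrates the
non-greedy result this report expects. It says nothing about how PostgreSQL's
regex engine is implemented or why it behaves differently. The names `URL`,
`PATTERNS`, and `capture_groups` are made up for this example and are not part
of the original report.

```python
import re

# Test string from the report.
URL = "http://blahblah.com/asdf"

# The three patterns from the report, copied unchanged.
# The dictionary keys are illustrative labels only.
PATTERNS = {
    "simple": r"http://(.*?)(/|%2f|$)",
    "full_with_dollar": r"http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f|$)",
    "full_without_dollar": r"http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f)",
}


def capture_groups(pattern, text=URL):
    """Return the captured groups of the first match, or None if no match."""
    match = re.search(pattern, text)
    return match.groups() if match else None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, capture_groups(pattern))
```

Under Python's semantics, (.*?) captures 'blahblah.com' in all three cases,
including the pattern with '$' in the final group.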
