The following bug has been logged on the website:
Bug reference: 6381
Logged by: john melesky
Email address: code@phaedrusdeinus.org
PostgreSQL version: 9.1.1
Operating system: x86_64-pc-linux-gnu
Description:
This simple regexp returns correctly (that is, (.*?) matches
'blahblah.com'):
=# select regexp_matches('http://blahblah.com/asdf',
'http://(.*?)(/|%2f|$)');
regexp_matches
------------------
{blahblah.com,/}
This more complex, more complete version matches greedily, which is incorrect:
=# select regexp_matches('http://blahblah.com/asdf',
'http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f|$)');
regexp_matches
--------------------------------
{"",:,//,blahblah.com/asdf,""}
(That is, (.*?) matches 'blahblah.com/asdf')
The problem appears to be the inclusion of '$' in the final paren group.
Without it, the match works as expected:
select regexp_matches('http://blahblah.com/asdf',
'http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f)');
regexp_matches
--------------------------
{"",:,//,blahblah.com,/}
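
For comparison, here is a minimal sketch that runs the same three patterns
through Python's re module. Python uses a different (backtracking) regex
engine, so this says nothing about PostgreSQL internals; it only illustrates
the captures I expected from a non-greedy (.*?). The names URL, PATTERNS and
captures are made up for this illustration.

```python
import re

# Sample input taken from the report above.
URL = "http://blahblah.com/asdf"

# The three patterns from the report; the dictionary keys are illustrative labels.
PATTERNS = {
    "simple": r"http://(.*?)(/|%2f|$)",
    "full": r"http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f|$)",
    "full_without_dollar": r"http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f)",
}


def captures(pattern, text=URL):
    """Return the capture groups of the first match, or None if no match."""
    match = re.search(pattern, text)
    return match.groups() if match else None


# Collect the captures for each pattern so they can be compared side by side.
results = {name: captures(pattern) for name, pattern in PATTERNS.items()}

for name, groups in results.items():
    print(name, groups)
```

In Python, all three patterns capture 'blahblah.com' for (.*?), including the
one with '$' in the final group.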