Thread: Regexp confusion

Regexp confusion

From
Doug Gorley
Date:
Trying to match some numbers, and I'm having some regexp problems.  I've
boiled it down to the following:

/* (1) */   select '3.14' similar to E'^\\d+\\.\\d+$';       -- true
/* (2) */   select '3.14' similar to E'^\\d+(\\.\\d+)$';     -- true
/* (3) */   select '3.14' similar to E'^\\d+(\\.\\d+)*$';    -- true
/* (4) */   select '3.14' similar to E'^\\d+(\\.\\d+)?$';    -- false
/* (5) */   select '3.14' similar to E'^\\d+(\\.\\d+)+$';    -- true

So, based on (1) and (2), the pattern '\.\d+' occurs once.  So why does
(4) return false?  Between (3), (4), and (5), it appears as though the
group is matching multiple times.

Thanks,

--
------------------------------------------------------------------------
*Doug Gorley* | doug.gorley@gmail.com



Re: Regexp confusion

From
Alvaro Herrera
Date:
Doug Gorley wrote:
> Trying to match some numbers, and I'm having some regexp problems.
> I've boiled it down to the following:
>
> /* (1) */   select '3.14' similar to E'^\\d+\\.\\d+$';       -- true
> /* (2) */   select '3.14' similar to E'^\\d+(\\.\\d+)$';     -- true
> /* (3) */   select '3.14' similar to E'^\\d+(\\.\\d+)*$';    -- true
> /* (4) */   select '3.14' similar to E'^\\d+(\\.\\d+)?$';    -- false
> /* (5) */   select '3.14' similar to E'^\\d+(\\.\\d+)+$';    -- true
>
> So, based on (1) and (2), the pattern '\.\d+' occurs once.  So why
> does (4) return false?  Between (3), (4), and (5), it appears as
> though the group is matching multiple times.

I think the confusion is about what SIMILAR TO supports.  It doesn't support ?.
See here:
http://www.postgresql.org/docs/8.4/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP

You probably want to use ~ instead of SIMILAR TO.

(SIMILAR TO is a weird beast that the SQL committee came up with,
vaguely based on regular expressions.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
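
As a rough illustration of the point about ordinary regular expressions, the five patterns from the original post can be tried under conventional regex semantics. The sketch below uses Python's `re` module as a stand-in for the `~` operator. That substitution is an assumption: the two engines differ in details, and the helper name `matches` is made up for this example. For patterns this simple, `\d`, grouping and the `?`, `*`, `+` quantifiers behave alike.

```python
import re

# The five patterns from the original post, written as ordinary regexes
# (single backslashes, since these are Python raw strings, not E'' literals).
PATTERNS = [
    r"^\d+\.\d+$",      # (1)
    r"^\d+(\.\d+)$",    # (2)
    r"^\d+(\.\d+)*$",   # (3)
    r"^\d+(\.\d+)?$",   # (4)
    r"^\d+(\.\d+)+$",   # (5)
]


def matches(pattern: str, text: str) -> bool:
    """Return True if the regex matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


results = [matches(p, "3.14") for p in PATTERNS]
print(results)  # [True, True, True, True, True]
```

Under plain regex semantics pattern (4) matches, which is what the original post expected.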

Re: Regexp confusion

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Doug Gorley wrote:
>> Trying to match some numbers, and I'm having some regexp problems.
>> I've boiled it down to the following:
>>
>> /* (1) */   select '3.14' similar to E'^\\d+\\.\\d+$';       -- true
>> /* (2) */   select '3.14' similar to E'^\\d+(\\.\\d+)$';     -- true
>> /* (3) */   select '3.14' similar to E'^\\d+(\\.\\d+)*$';    -- true
>> /* (4) */   select '3.14' similar to E'^\\d+(\\.\\d+)?$';    -- false
>> /* (5) */   select '3.14' similar to E'^\\d+(\\.\\d+)+$';    -- true
>>
>> So, based on (1) and (2), the pattern '\.\d+' occurs once.  So why
>> does (4) return false?  Between (3), (4), and (5), it appears as
>> though the group is matching multiple times.

> I think the confusion is about what SIMILAR TO supports.  It doesn't support ?.
> See here:
> http://www.postgresql.org/docs/8.4/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP

> You probably want to use ~ instead of SIMILAR TO.

> (SIMILAR TO is a weird beast that the SQL committee came up with,
> vaguely based on regular expressions.)

Hmm ... actually I think *none* of those should have succeeded, because
^ and $ are not supposed to be metacharacters in SIMILAR TO.  We are
failing to quote them, but apparently we need to --- it looks like the
regexp engine processes ^^ at the start of the pattern the same as ^,
and likewise for $$ at the end.

            regards, tom lane
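
The behaviour described above can be modelled with a small sketch. This is a simplified Python model, not PostgreSQL's actual C code. It assumes backslash is the escape character, ignores bracket expressions and custom ESCAPE clauses, and uses Python's `re` in place of PostgreSQL's regex engine. The names `similar_to_regex_old` and `similar_to_old` are hypothetical. The model treats `?` and `{` as literals and passes `^` and `$` through unquoted, then wraps the result in its own anchors.

```python
import re


def similar_to_regex_old(pattern: str) -> str:
    """Simplified model of the pre-fix SIMILAR TO translation (hypothetical)."""
    out = []
    after_escape = False
    for ch in pattern:
        if after_escape:
            out.append("\\" + ch)    # escaped character passes through as \ch
            after_escape = False
        elif ch == "\\":
            after_escape = True      # assumed escape character: backslash
        elif ch == "%":
            out.append(".*")         # SQL wildcard: any string
        elif ch == "_":
            out.append(".")          # SQL wildcard: any single character
        elif ch in ".?{":
            out.append("\\" + ch)    # quoted, so they match literally
        else:
            out.append(ch)           # note: ^ and $ are NOT quoted here
    return "^(?:" + "".join(out) + ")$"


def similar_to_old(pattern: str, text: str) -> bool:
    """Apply the modelled translation and test the text against it."""
    return re.search(similar_to_regex_old(pattern), text) is not None


PATTERNS = [
    r"^\d+\.\d+$",      # (1)
    r"^\d+(\.\d+)$",    # (2)
    r"^\d+(\.\d+)*$",   # (3)
    r"^\d+(\.\d+)?$",   # (4)
    r"^\d+(\.\d+)+$",   # (5)
]

print(similar_to_regex_old(PATTERNS[3]))  # ^(?:^\d+(\.\d+)\?$)$
print([similar_to_old(p, "3.14") for p in PATTERNS])
# [True, True, True, False, True]
```

In this model the doubled anchors are harmless, which is why (1), (2), (3) and (5) succeed, while (4) fails because the quoted `?` demands a literal question mark in the input.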

Re: Regexp confusion

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think the confusion is about what SIMILAR TO supports.  It doesn't support ?.

Actually, upon looking into SQL:2008, it seems it's supposed to support
? now, and also {m,n} style bounds.  Those weren't there in SQL99 ...

I've changed the similar_escape code to not escape ? and {, so that
those things will work now, and to escape ^ and $ instead.

            regards, tom lane
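
The effect of that change can be sketched in the same spirit. This is a simplified Python model, not the real similar_escape code. It assumes backslash is the escape character, skips bracket expressions and custom ESCAPE clauses, and uses Python's `re` in place of PostgreSQL's regex engine. The names `similar_to_regex_new` and `similar_to_new` are hypothetical. In this model `?` and `{m,n}` pass through as quantifiers, while `^` and `$` are quoted and so match only literally.

```python
import re


def similar_to_regex_new(pattern: str) -> str:
    """Simplified model of the post-fix SIMILAR TO translation (hypothetical)."""
    out = []
    after_escape = False
    for ch in pattern:
        if after_escape:
            out.append("\\" + ch)    # escaped character passes through as \ch
            after_escape = False
        elif ch == "\\":
            after_escape = True      # assumed escape character: backslash
        elif ch == "%":
            out.append(".*")         # SQL wildcard: any string
        elif ch == "_":
            out.append(".")          # SQL wildcard: any single character
        elif ch in ".^$":
            out.append("\\" + ch)    # quoted: ^ and $ are no longer anchors
        else:
            out.append(ch)           # ? and { now act as quantifiers
    return "^(?:" + "".join(out) + ")$"


def similar_to_new(pattern: str, text: str) -> bool:
    """Apply the modelled translation and test the text against it."""
    return re.search(similar_to_regex_new(pattern), text) is not None


print(similar_to_new(r"\d+(\.\d+)?", "3.14"))    # True
print(similar_to_new(r"\d+(\.\d+)?", "3"))       # True
print(similar_to_new(r"\d{1,3}", "3141"))        # False
print(similar_to_new(r"^\d+(\.\d+)?$", "3.14"))  # False
```

Since SIMILAR TO always matches the whole string, the explicit `^` and `$` can simply be dropped from the pattern.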