Hi David:
On Sun, Oct 18, 2015 at 7:49 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Other implementations of regular expressions handle "newline" mechanics
> related to "^" and "$" semantically instead of literally. By that I mean
> that both "\r\n" and "\n" are considered "newlines" instead of just "\n".
Which ones? AFAIK this kind of thing is usually done by C (and
related) runtimes when reading text files.
At least on my machine, Perl does not do it:
censored:~$ perl -e 'print( ("A\r\n" =~ /A$/) ? "matched\n" : "NO MATCH\n");'
NO MATCH
censored:~$ perl -e 'print( ("A\r\n" =~ /A.$/) ? "matched\n" : "NO MATCH\n");'
matched
censored:~$ perl -e 'print( ("A\r\n" =~ /A\s$/) ? "matched\n" : "NO MATCH\n");'
matched
Normally, when reading lines on CP/M and related systems (MS-DOS,
Windows), the CRT does collapse \r\n into \n (sometimes it just zaps
\r, or collapses any run, or treats [\r*]\n[\r*] as one newline, or
...). But I do not normally see that behaviour in regexes.
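For comparison, Python's `re` module appears to follow the same rule as the Perl runs above: without flags, `$` matches at the end of the string or just before a final "\n", and "\r" is treated as an ordinary character. A minimal sketch; Python is chosen only for illustration, and the helper `matches` is hypothetical:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None

# `$` only matches at end of string or just before a final "\n",
# so the "\r" sitting between "A" and "\n" prevents a match.
print(matches(r"A$", "A\r\n"))    # False
# "." consumes the "\r"; `$` then matches just before the final "\n".
print(matches(r"A.$", "A\r\n"))   # True
# "\s" also consumes the "\r", with the same result.
print(matches(r"A\s$", "A\r\n"))  # True
```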
> If changing behavior is not desirable, I would be content with another flag
> that would toggle such behavior.
> In code: both of these subqueries should match, whereas presently only the
> first one does.
> SELECT regexp_matches(E'123\n', E'123$', 'w');
> SELECT regexp_matches(E'123\r\n', E'123$', 'w');
> I don't know if this is server O/S dependent... but I would not expect it to
> be so.
Neither do I (expect it to be OS dependent), but I find the current
behaviour correct. I mean, newline handling is OS dependent, and you
should convert when ingesting data; by the time you match, the data
should already have been converted to whatever the language uses for
newlines (in C and Perl that means \n, which need not be \012, BTW. On
Unix \n is \012 on disk, on CP/M it's \015\012, and when I worked with
the Mac (before the Unix-like OS X they use now) it was \015. I cannot
think what they might use on EBCDIC machines).
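The "convert when ingesting" approach can be sketched in Python, whose text I/O layer has a universal-newlines mode (`newline=None` on `io.TextIOWrapper`) that translates "\r\n" and a lone "\r" into "\n" when reading. This is an illustration of the idea only, not of PostgreSQL's behaviour; the helper `ingest` is hypothetical, and the byte strings stand in for data read from a file:

```python
import io
import re

def ingest(raw: bytes) -> str:
    """Hypothetical helper: decode bytes, mapping \r\n and \r to \n."""
    # newline=None turns on universal newlines mode for reading.
    with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None) as f:
        return f.read()

# Unix, CP/M-style and classic-Mac line endings all come out as "\n",
# so a plain `$` (here in MULTILINE mode) matches every variant.
for raw in (b"123\n", b"123\r\n", b"123\r"):
    text = ingest(raw)
    print(repr(text), re.search(r"123$", text, re.MULTILINE) is not None)
```

If converting the data is not an option, a pattern-level alternative in Python is a lookahead such as `123(?=\r?$)` with `re.MULTILINE`, which tolerates an optional "\r" before the newline without including it in the match.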
Francisco Olarte.