Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes? - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?
Msg-id CA+bJJbzNHEqufUh=SUGJ_zSXU5TEAgdTgHqpzv_UZ9SVgg6KUg@mail.gmail.com
In response to Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Can we make regexp processing more friendly by recognizing "\r\n" as a "newline" for "^$" purposes?
List pgsql-general
Hi David:

On Sun, Oct 18, 2015 at 7:49 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Other implementation of regular expressions handle "newline" mechanics
> related to "^" and "$" semantically instead of literally.  By that I mean
> that both "\r\n" and "\n" are considered "newlines" instead of just "\n".

Which ones? AFAIK this kind of thing is usually done by the C (and
related) runtimes when reading text files.

At least on my machine, Perl does not do it:

censored:~$ perl -e 'print( ("A\r\n" =~ /A$/) ? "matched\n" : "NO MATCH\n");'
NO MATCH
censored:~$ perl -e 'print( ("A\r\n" =~ /A.$/) ? "matched\n" : "NO MATCH\n");'
matched
censored:~$ perl -e 'print( ("A\r\n" =~ /A\s$/) ? "matched\n" : "NO MATCH\n");'
matched
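
For comparison, here is a minimal sketch of the same three checks in Python's re module (Python is used here only as an illustration; it is not part of the original test, and the helper name `matches` is hypothetical). As far as I know, Python's "$" behaves like Perl's in this case: it matches at the very end of the string or just before a final "\n", and a preceding "\r" is treated as an ordinary character.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


SAMPLE = "A\r\n"

# Same three patterns as the Perl one-liners above.
results = {
    r"A$": matches(r"A$", SAMPLE),      # "$" does not skip over "\r"
    r"A.$": matches(r"A.$", SAMPLE),    # "." consumes the "\r"
    r"A\s$": matches(r"A\s$", SAMPLE),  # "\s" consumes the "\r"
}

for pattern, ok in results.items():
    print(pattern, "matched" if ok else "NO MATCH")
```

So the "\r" has to be consumed explicitly by the pattern, which is the same thing the Perl output shows.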

Normally, when reading lines on CP/M and related systems (MS-DOS, Windows), the
C runtime collapses "\r\n" into "\n" (and sometimes it just zaps \r, or collapses
any run, or considers [\r*]\n[\r*] or...). But I do not normally see that
behaviour in regexes.

> If changing behavior is not desirable I would be content with another flag
> that would toggle such behavior.
> In code - both of these subqueries should match whereas presently only the
> first one does.
> SELECT regexp_matches(E'123\n',   E'123$', 'w');
> SELECT regexp_matches(E'123\r\n', E'123$', 'w');
> I don't know if this is server O/S dependent...but I would not expect it to
> be so.

Neither do I (expect it to be OS-dependent), but I find the current
behaviour correct. I mean, newline handling is OS-dependent, and you
should convert when ingesting data; by the time you match, the data should
already have been converted to whatever the language uses for newlines
(in C and Perl that means \n, which need not be \012, BTW. On Unix
\n is \012 on disk, on CP/M it is \015\012, and when I worked with the Mac,
before the Unix-based OS X they use now, it was \015; I cannot think of
what they use on EBCDIC machines).
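
To illustrate the "convert when ingesting" idea, here is a small sketch in Python (again only an illustration; `normalize_newlines` and `ends_line_with_123` are hypothetical helper names, and mapping a lone "\r" to "\n" is an assumption that only makes sense if old-Mac-style data can show up). The line endings are normalized once, and the regex then only ever sees "\n".

```python
import re


def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF.

    CRLF must be replaced first, otherwise each CRLF would
    turn into two LF characters.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


def ends_line_with_123(text: str) -> bool:
    """True if some line in the text ends with '123'."""
    return re.search(r"123$", text, flags=re.MULTILINE) is not None


raw = "123\r\n"
print(ends_line_with_123(raw))                      # raw CRLF data
print(ends_line_with_123(normalize_newlines(raw)))  # after normalization
```

With the raw CRLF input the pattern does not match, and after normalization it does, without touching the regex. If the data cannot be converted, writing the pattern as 123\r?$ is the other obvious option.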

Francisco Olarte.

