Thread: PL/Python patch for Universal Newline Support

PL/Python patch for Universal Newline Support

From
Michael Fuhr
Date:
The attached patch mimics Universal Newline Support for PL/Python
functions.  For background, see PEP 278 and the recent "plpython
function problem workaround" thread in pgsql-general:

http://www.python.org/peps/pep-0278.html
http://archives.postgresql.org/pgsql-general/2005-03/msg00823.php

In short, embedded Python code must have lines that end in \n, not
\r\n or \r.  This patch adds logic to PLy_procedure_munge_source()
to skip \r followed by \n, otherwise to convert \r to \n.  I tested
it in HEAD but REL8_0_STABLE has the same version of plpython.c (1.58),
so it should work there as well.

Should the PL/Python documentation mention this behavior?  If so
then I can resubmit with a documentation patch.

How should I submit regression tests?  Here's what I did:

CREATE FUNCTION test_lf() RETURNS integer AS
'x=100\ny=23\nreturn x+y\n'
LANGUAGE plpythonu;

CREATE FUNCTION test_crlf() RETURNS integer AS
'x=100\r\ny=23\r\nreturn x+y\r\n'
LANGUAGE plpythonu;

CREATE FUNCTION test_cr() RETURNS integer AS
'x=100\ry=23\rreturn x+y\r'
LANGUAGE plpythonu;

Here's what an unpatched system does:

SELECT test_lf();
 test_lf
---------
     123
(1 row)

SELECT test_cr();
ERROR:  plpython: could not compile function "test_cr"
DETAIL:  exceptions.SyntaxError: invalid syntax (line 2)

SELECT test_crlf();
ERROR:  plpython: could not compile function "test_crlf"
DETAIL:  exceptions.SyntaxError: invalid syntax (line 2)

Here's a patched system:

SELECT test_lf();
 test_lf
---------
     123
(1 row)

SELECT test_cr();
 test_cr
---------
     123
(1 row)

SELECT test_crlf();
 test_crlf
-----------
       123
(1 row)

This patch doesn't address the indentation problem mentioned here:

http://archives.postgresql.org/pgsql-general/2005-03/msg00762.php

The solution to that problem is still being discussed.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Attachment

Re: PL/Python patch for Universal Newline Support

From
Neil Conway
Date:
Michael Fuhr wrote:
> Should the PL/Python documentation mention this behavior?

Isn't this the behavior the user would expect? If so, I guess it's okay
not to document it.

> How should I submit regression tests?

Yes, please.

> *** src/pl/plpython/plpython.c    17 Dec 2004 02:14:48 -0000    1.58
> --- src/pl/plpython/plpython.c    19 Mar 2005 04:29:55 -0000
> ***************
> *** 1206,1215 ****
>
>       while (*sp != '\0')
>       {
> !         if (*sp == '\n')
>           {
> !             *mp++ = *sp++;
>               *mp++ = '\t';
>           }
>           else
>               *mp++ = *sp++;
> --- 1206,1219 ----
>
>       while (*sp != '\0')
>       {
> !         if (*sp == '\r' && *(sp + 1) == '\n')
> !             sp++;
> !
> !         if (*sp == '\n' || *sp == '\r')
>           {
> !             *mp++ = '\n';
>               *mp++ = '\t';
> +             sp++;
>           }
>           else
>               *mp++ = *sp++;

Does this work for "\r\n" embedded in string literals?

-Neil

Re: PL/Python patch for Universal Newline Support

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Does this work for "\r\n" embedded in string literals?

I believe we'd concluded that Python will unconditionally convert all
\r\n to \n when reading any text file --- including script files ---
and therefore that's what Python programmers will expect to have happen
to scripts.  In other words we should deliberately be non-syntax-aware
when stripping \r.

            regards, tom lane

Re: PL/Python patch for Universal Newline Support

From
Michael Fuhr
Date:
On Mon, Mar 21, 2005 at 10:28:28AM +1100, Neil Conway wrote:
> Michael Fuhr wrote:
>
> >How should I submit regression tests?
>
> Yes, please.

The operative word there was "how" :-)  I don't see anything testing
PL/{Python,Perl,Tcl} under src/test/regress -- should I put something
there?  Can regression tests be run conditionally?  If so, what's
the preferred way to do that?  If regression tests for this kind
of patch need to be done another way, how should I submit them?

> Does this work for "\r\n" embedded in string literals?

Literal carriage returns (ASCII character 13) will be translated,
just as Python would do when reading a script from a file if Python
is built with Universal Newline Support (enabled by default, at
least in recent versions).  Escape sequences won't be touched if
they're stored in prosrc as escape sequences (backslash-r) instead
of actual carriage returns.  For example:

CREATE FUNCTION foo() RETURNS text AS $$
return "\r\n"
$$ LANGUAGE plpythonu IMMUTABLE;

SELECT prosrc FROM pg_proc WHERE proname = 'foo';
     prosrc
-----------------

return "\r\n"

(1 row)

SELECT length(foo()), ascii(foo()), ascii(substr(foo(), 2));
 length | ascii | ascii
--------+-------+-------
      2 |    13 |    10
(1 row)

But the following fails (the function is in single quotes instead
of dollar quotes, so \r\n becomes an actual CRLF):

CREATE OR REPLACE FUNCTION foo() RETURNS text AS '
return "\r\n"
' LANGUAGE plpythonu IMMUTABLE;

SELECT prosrc FROM pg_proc WHERE proname = 'foo';
    prosrc
---------------

return "
"

(1 row)

SELECT foo();
ERROR:  plpython: could not compile function "foo"
DETAIL:  exceptions.SyntaxError: EOL while scanning single-quoted string (line 3)

An ordinary Python script that looked like that would fail as well:

% cat -v foo.py
print "^M
"

% python foo.py
  File "foo.py", line 1
    print "
          ^
SyntaxError: EOL while scanning single-quoted string

Note Python's translation in this case:

% cat -v foo.py
print """a^M
b^M
c^M
"""

% python foo.py | od -tx1
0000000    61  0a  62  0a  63  0a  0a
0000007

(The extra newline is appended by "print".)

I was thinking that the patch could be applied to HEAD and hopefully
somebody could do some additional testing with Windows clients and
servers.  If there are no problems, and especially if the patch
solves the problem it's intended to solve, then the patch could be
applied to REL8_0_STABLE so it would be in 8.0.2 whenever that comes
out.  Tom, that sounded reasonable to you, didn't it?

http://archives.postgresql.org/pgsql-general/2005-03/msg00842.php

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: PL/Python patch for Universal Newline Support

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> The operative word there was "how" :-)  I don't see anything testing
> PL/{Python,Perl,Tcl} under src/test/regress -- should I put something
> there?

No.

The PLs have their own regression tests in their individual src/pl
directories; feel free to hack on those (most are pretty lame :-()

            regards, tom lane

Re: PL/Python patch for Universal Newline Support

From
Michael Fuhr
Date:
On Mon, Mar 21, 2005 at 02:50:47AM -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > The operative word there was "how" :-)  I don't see anything testing
> > PL/{Python,Perl,Tcl} under src/test/regress -- should I put something
> > there?
>
> No.
>
> The PLs have their own regression tests in their individual src/pl
> directories; feel free to hack on those (most are pretty lame :-()

Thanks -- I'll add some tests there and resubmit.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: PL/Python patch for Universal Newline Support

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> On Mon, Mar 21, 2005 at 02:50:47AM -0500, Tom Lane wrote:
>> The PLs have their own regression tests in their individual src/pl
>> directories; feel free to hack on those (most are pretty lame :-()

> Thanks -- I'll add some tests there and resubmit.

One thing that needs some thought is how you are going to test this
robustly.  I'd not feel any great deal of confidence in a test that
expects that we can push \r\n sequences into CVS and expect them to
survive unmodified into distributed filesets.  Every so often someone
commits a DOS-newlines file into CVS, and they hear about it PDQ...

            regards, tom lane

Re: PL/Python patch for Universal Newline Support

From
Michael Fuhr
Date:
On Mon, Mar 21, 2005 at 03:02:57AM -0500, Tom Lane wrote:
>
> One thing that needs some thought is how you are going to test this
> robustly.  I'd not feel any great deal of confidence in a test that
> expects that we can push \r\n sequences into CVS and expect them to
> survive unmodified into distributed filesets.  Every so often someone
> commits a DOS-newlines file into CVS, and they hear about it PDQ...

We're testing what happens when \r gets into prosrc, right?  Shouldn't
creating the function with \r in single quotes instead of dollar
quotes work, as in the tests I showed in my original submission?

http://archives.postgresql.org/pgsql-patches/2005-03/msg00186.php

The functions with \r failed as expected in an unpatched system,
and they worked in a patched system.  The CREATE FUNCTION statements
used \r escape sequences, so that shouldn't be a problem with CVS
since they're not actual carriage returns in the SQL, right?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: PL/Python patch for Universal Newline Support

From
Michael Fuhr
Date:
On Mon, Mar 21, 2005 at 01:19:35AM -0700, Michael Fuhr wrote:

> We're testing what happens when \r gets into prosrc, right?  Shouldn't
> creating the function with \r in single quotes instead of dollar
> quotes work, as in the tests I showed in my original submission?

...or what about a little script or C program that generates CREATE
FUNCTION statements with various line endings?  I suppose that would
more accurately simulate what client apps are doing.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: PL/Python patch for Universal Newline Support

From
"Andrew Dunstan"
Date:
Michael Fuhr said:
> On Mon, Mar 21, 2005 at 03:02:57AM -0500, Tom Lane wrote:
>>
>> One thing that needs some thought is how you are going to test this
>> robustly.  I'd not feel any great deal of confidence in a test that
>> expects that we can push \r\n sequences into CVS and expect them to
>> survive unmodified into distributed filesets.  Every so often someone
>> commits a DOS-newlines file into CVS, and they hear about it PDQ...
>
> We're testing what happens when \r gets into prosrc, right?  Shouldn't
> creating the function with \r in single quotes instead of dollar
> quotes work, as in the tests I showed in my original submission?
>

Yes - also worked in testing CSV multiline support - see copy regression
test.
cheers

andrew



Re: PL/Python patch for Universal Newline Support

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> We're testing what happens when \r gets into prosrc, right?  Shouldn't
> creating the function with \r in single quotes instead of dollar
> quotes work, as in the tests I showed in my original submission?

Yeah, that should work.  I was too tired to think of that last night ;-)

            regards, tom lane

Re: PL/Python patch for Universal Newline Support

From
Michael Fuhr
Date:
Here's a PL/Python patch that includes a few tests for Universal
Newline Support.  "gmake installcheck" in src/pl/plpython is
successful in HEAD on my Solaris 9 box running Python 2.4.1c2.

I ran the tests against an unpatched 8.0.1 server and they failed
as expected.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Attachment

Re: PL/Python patch for Universal Newline Support

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> Here's a PL/Python patch that includes a few tests for Universal
> Newline Support.

Applied to HEAD and 8.0 branch.

            regards, tom lane