Thread: PL/Python patch for Universal Newline Support
The attached patch mimics Universal Newline Support for PL/Python functions. For background, see PEP 278 and the recent "plpython function problem workaround" thread in pgsql-general: http://www.python.org/peps/pep-0278.html http://archives.postgresql.org/pgsql-general/2005-03/msg00823.php In short, embedded Python code must have lines that end in \n, not \r\n or \r. This patch adds logic to PLy_procedure_munge_source() to skip \r followed by \n, otherwise to convert \r to \n. I tested it in HEAD but REL8_0_STABLE has the same version of plpython.c (1.58), so it should work there as well. Should the PL/Python documentation mention this behavior? If so then I can resubmit with a documentation patch. How should I submit regression tests? Here's what I did: CREATE FUNCTION test_lf() RETURNS integer AS 'x=100\ny=23\nreturn x+y\n' LANGUAGE plpythonu; CREATE FUNCTION test_crlf() RETURNS integer AS 'x=100\r\ny=23\r\nreturn x+y\r\n' LANGUAGE plpythonu; CREATE FUNCTION test_cr() RETURNS integer AS 'x=100\ry=23\rreturn x+y\r' LANGUAGE plpythonu; Here's what an unpatched system does: SELECT test_lf(); test_lf --------- 123 (1 row) SELECT test_cr(); ERROR: plpython: could not compile function "test_cr" DETAIL: exceptions.SyntaxError: invalid syntax (line 2) SELECT test_crlf(); ERROR: plpython: could not compile function "test_crlf" DETAIL: exceptions.SyntaxError: invalid syntax (line 2) Here's a patched system: SELECT test_lf(); test_lf --------- 123 (1 row) SELECT test_cr(); test_cr --------- 123 (1 row) SELECT test_crlf(); test_crlf ----------- 123 (1 row) This patch doesn't address the indentation problem mentioned here: http://archives.postgresql.org/pgsql-general/2005-03/msg00762.php The solution to that problem is still being discussed. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Attachment
Michael Fuhr wrote: > Should the PL/Python documentation mention this behavior? Isn't this the behavior the user would expect? If so, I guess it's okay not to document it. > How should I submit regression tests? Yes, please. > *** src/pl/plpython/plpython.c 17 Dec 2004 02:14:48 -0000 1.58 > --- src/pl/plpython/plpython.c 19 Mar 2005 04:29:55 -0000 > *************** > *** 1206,1215 **** > > while (*sp != '\0') > { > ! if (*sp == '\n') > { > ! *mp++ = *sp++; > *mp++ = '\t'; > } > else > *mp++ = *sp++; > --- 1206,1219 ---- > > while (*sp != '\0') > { > ! if (*sp == '\r' && *(sp + 1) == '\n') > ! sp++; > ! > ! if (*sp == '\n' || *sp == '\r') > { > ! *mp++ = '\n'; > *mp++ = '\t'; > + sp++; > } > else > *mp++ = *sp++; Does this work for "\r\n" embedded in string literals? -Neil
Neil Conway <neilc@samurai.com> writes: > Does this work for "\r\n" embedded in string literals? I believe we'd concluded that Python will unconditionally convert all \r\n to \n when reading any text file --- including script files --- and therefore that's what Python programmers will expect to have happen to scripts. In other words we should deliberately be non-syntax-aware when stripping \r. regards, tom lane
On Mon, Mar 21, 2005 at 10:28:28AM +1100, Neil Conway wrote: > Michael Fuhr wrote: > > >How should I submit regression tests? > > Yes, please. The operative word there was "how" :-) I don't see anything testing PL/{Python,Perl,Tcl} under src/test/regress -- should I put something there? Can regression tests be run conditionally? If so, what's the preferred way to do that? If regression tests for this kind of patch need to be done another way, how should I submit them? > Does this work for "\r\n" embedded in string literals? Literal carriage returns (ASCII character 13) will be translated, just as Python would do when reading a script from a file if Python is built with Universal Newline Support (enabled by default, at least in recent versions). Escape sequences won't be touched if they're stored in prosrc as escape sequences (backslash-r) instead of actual carriage returns. For example: CREATE FUNCTION foo() RETURNS text AS $$ return "\r\n" $$ LANGUAGE plpythonu IMMUTABLE; SELECT prosrc FROM pg_proc WHERE proname = 'foo'; prosrc ----------------- return "\r\n" (1 row) SELECT length(foo()), ascii(foo()), ascii(substr(foo(), 2)); length | ascii | ascii --------+-------+------- 2 | 13 | 10 (1 row) But the following fails (the function is in single quotes instead of dollar quotes, so \r\n becomes an actual CRLF): CREATE OR REPLACE FUNCTION foo() RETURNS text AS ' return "\r\n" ' LANGUAGE plpythonu IMMUTABLE; SELECT prosrc FROM pg_proc WHERE proname = 'foo'; prosrc --------------- return " " (1 row) SELECT foo(); ERROR: plpython: could not compile function "foo" DETAIL: exceptions.SyntaxError: EOL while scanning single-quoted string (line 3) An ordinary Python script that looked like that would fail as well: % cat -v foo.py print "^M " % python foo.py File "foo.py", line 1 print " ^ SyntaxError: EOL while scanning single-quoted string Note Python's translation in this case: % cat -v foo.py print """a^M b^M c^M """ % python foo.py | od -tx1 0000000 61 0a 62 0a 63 0a 0a 0000007 (The extra newline is appended by "print".) I was thinking that the patch could be applied to HEAD and hopefully somebody could do some additional testing with Windows clients and servers. If there are no problems, and especially if the patch solves the problem it's intended to solve, then the patch could be applied to REL8_0_STABLE so it would be in 8.0.2 whenever that comes out. Tom, that sounded reasonable to you, didn't it? http://archives.postgresql.org/pgsql-general/2005-03/msg00842.php -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Michael Fuhr <mike@fuhr.org> writes: > The operative word there was "how" :-) I don't see anything testing > PL/{Python,Perl,Tcl} under src/test/regress -- should I put something > there? No. The PLs have their own regression tests in their individual src/pl directories; feel free to hack on those (most are pretty lame :-() regards, tom lane
On Mon, Mar 21, 2005 at 02:50:47AM -0500, Tom Lane wrote: > Michael Fuhr <mike@fuhr.org> writes: > > The operative word there was "how" :-) I don't see anything testing > > PL/{Python,Perl,Tcl} under src/test/regress -- should I put something > > there? > > No. > > The PLs have their own regression tests in their individual src/pl > directories; feel free to hack on those (most are pretty lame :-() Thanks -- I'll add some tests there and resubmit. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Michael Fuhr <mike@fuhr.org> writes: > On Mon, Mar 21, 2005 at 02:50:47AM -0500, Tom Lane wrote: >> The PLs have their own regression tests in their individual src/pl >> directories; feel free to hack on those (most are pretty lame :-() > Thanks -- I'll add some tests there and resubmit. One thing that needs some thought is how you are going to test this robustly. I'd not feel any great deal of confidence in a test that expects that we can push \r\n sequences into CVS and expect them to survive unmodified into distributed filesets. Every so often someone commits a DOS-newlines file into CVS, and they hear about it PDQ... regards, tom lane
On Mon, Mar 21, 2005 at 03:02:57AM -0500, Tom Lane wrote: > > One thing that needs some thought is how you are going to test this > robustly. I'd not feel any great deal of confidence in a test that > expects that we can push \r\n sequences into CVS and expect them to > survive unmodified into distributed filesets. Every so often someone > commits a DOS-newlines file into CVS, and they hear about it PDQ... We're testing what happens when \r gets into prosrc, right? Shouldn't creating the function with \r in single quotes instead of dollar quotes work, as in the tests I showed in my original submission? http://archives.postgresql.org/pgsql-patches/2005-03/msg00186.php The functions with \r failed as expected in an unpatched system, and they worked in a patched system. The CREATE FUNCTION statements used \r escape sequences, so that shouldn't be a problem with CVS since they're not actual carriage returns in the SQL, right? -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Mon, Mar 21, 2005 at 01:19:35AM -0700, Michael Fuhr wrote: > We're testing what happens when \r gets into prosrc, right? Shouldn't > creating the function with \r in single quotes instead of dollar > quotes work, as in the tests I showed in my original submission? ...or what about a little script or C program that generates CREATE FUNCTION statements with various line endings? I suppose that would more accurately simulate what client apps are doing. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Michael Fuhr said: > On Mon, Mar 21, 2005 at 03:02:57AM -0500, Tom Lane wrote: >> >> One thing that needs some thought is how you are going to test this >> robustly. I'd not feel any great deal of confidence in a test that >> expects that we can push \r\n sequences into CVS and expect them to >> survive unmodified into distributed filesets. Every so often someone >> commits a DOS-newlines file into CVS, and they hear about it PDQ... > > We're testing what happens when \r gets into prosrc, right? Shouldn't > creating the function with \r in single quotes instead of dollar > quotes work, as in the tests I showed in my original submission? > Yes - also worked in testing CSV multiline support - see copy regression test. cheers andrew
Michael Fuhr <mike@fuhr.org> writes: > We're testing what happens when \r gets into prosrc, right? Shouldn't > creating the function with \r in single quotes instead of dollar > quotes work, as in the tests I showed in my original submission? Yeah, that should work. I was too tired to think of that last night ;-) regards, tom lane
Here's a PL/Python patch that includes a few tests for Universal Newline Support. "gmake installcheck" in src/pl/plpython is successful in HEAD on my Solaris 9 box running Python 2.4.1c2. I ran the tests against an unpatched 8.0.1 server and they failed as expected. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Attachment
Michael Fuhr <mike@fuhr.org> writes: > Here's a PL/Python patch that includes a few tests for Universal > Newline Support. Applied to HEAD and 8.0 branch. regards, tom lane