Thread: Multiline plpython procedure

Multiline plpython procedure

From
Hong Yuan
Date:
Hi,

I am biwildered at how to create a multi-line plpython function in
Postgres. When I create the function in one line like this:

CREATE or REPLACE FUNCTION circ (float)
RETURNS float AS 'from math import pi; return 2*pi*args[0]' LANGUAGE
plpythonu;

and then use SELECT circ(1) to test it, it runs well.

But if I try to make the code looks better by separating it into
mulitple lines, like this:

CREATE or REPLACE FUNCTION circ (float)
RETURNS float AS '
from math import pi
return 2*pi*args[0]' LANGUAGE plpythonu;

I got an error message:

ERROR:  plpython: could not compile function "circ"
DETAIL:  exceptions.SyntaxError: invalid syntax (line 2)

How to get Postgres to accept a normal looking python function?

Greetings

--
HONG Yuan
Homemaster Trading Co., Ltd.
No. 601, Bldg. 41, 288 Shuangyang Rd. (N)
Shanghai 200433, P.R.C.
Tel: +86 21 55056553
Fax: +86 21 55067325
E-mail: hongyuan@homemaster.cn


Re: Multiline plpython procedure

From
Adrian Klaver
Date:
On Monday 17 January 2005 01:54 am, Hong Yuan wrote:
I entered the multilineversion of this function exactly as written here and it
ran properly. This was with version 8.0 of Postgres.  You might want to do
a /df+ circ in psql to see if your editor is putting a space at the beginning
of line 2.
> Hi,
>
> I am biwildered at how to create a multi-line plpython function in
> Postgres. When I create the function in one line like this:
>
> CREATE or REPLACE FUNCTION circ (float)
> RETURNS float AS 'from math import pi; return 2*pi*args[0]' LANGUAGE
> plpythonu;
>
> and then use SELECT circ(1) to test it, it runs well.
>
> But if I try to make the code looks better by separating it into
> mulitple lines, like this:
>
> CREATE or REPLACE FUNCTION circ (float)
> RETURNS float AS '
> from math import pi
> return 2*pi*args[0]' LANGUAGE plpythonu;
>
> I got an error message:
>
> ERROR:  plpython: could not compile function "circ"
> DETAIL:  exceptions.SyntaxError: invalid syntax (line 2)
>
> How to get Postgres to accept a normal looking python function?
>
> Greetings

--
Adrian Klaver
aklaver@comcast.net

Re: Multiline plpython procedure

From
Hong Yuan
Date:
Adrian Klaver wrote:

>On Monday 17 January 2005 01:54 am, Hong Yuan wrote:
>I entered the multilineversion of this function exactly as written here and it
>ran properly. This was with version 8.0 of Postgres.  You might want to do
>a /df+ circ in psql to see if your editor is putting a space at the beginning
>of line 2.
>
>
The strange thing is that if I create the function in psql, the function
works. But if I create the function from PgAdmin III, there was an
error, although with \df+ circ I can not identify any visual difference
between the two versions of function. See below:


.... I create the function with psql ....

homemaster=# \df+ circ
                                            List of functions
 Result data type | Schema | Name | Argument data types |  Owner  |
Language  | Source code | Description
------------------+--------+------+---------------------+---------+-----------+-------------+-------------
 double precision | public | circ | double precision    | zopeapp |
plpythonu |
from math import pi
return 2*pi*args[0]            |
(1 row)

homemaster=# select circ(1);
     circ
---------------
 6.28318530718
(1 row)

.... Now I recreate the function with pgAdmin ....

homemaster=# \df+ circ
                                            List of functions
 Result data type | Schema | Name | Argument data types |  Owner  |
Language  | Source code | Description
------------------+--------+------+---------------------+---------+-----------+-------------+-------------
 double precision | public | circ | double precision    | zopeapp |
plpythonu |
from math import pi
return 2*pi*args[0]            |
(1 row)

homemaster=# select circ(1);
ERROR:  plpython: could not compile function "circ"
DETAIL:  exceptions.SyntaxError: invalid syntax (line 2)

I suppose there are some invisible characters inserted into the function
body by pgAdmin. I am using pgAdmin III Version 1.2.0 under Chinese
Windows XP, while the database is 7.4.6 under Linux.

Should be a bug of pgAdmin with encoding or something. At least I know
the workaround now. Thanks.

--
HONG Yuan
Homemaster Trading Co., Ltd.
No. 601, Bldg. 41, 288 Shuangyang Rd. (N)
Shanghai 200433, P.R.C.
Tel: +86 21 55056553
Fax: +86 21 55067325
E-mail: hongyuan@homemaster.cn


Re: Multiline plpython procedure

From
Tom Lane
Date:
Hong Yuan <hongyuan@homemaster.cn> writes:
> I suppose there are some invisible characters inserted into the function
> body by pgAdmin.

Seems like it must be.  You could try pg_dump'ing both versions of the
function and comparing the output files byte-by-byte.

Be sure to let the pgAdmin guys know what you find out.  I'm sure they
are not intentionally trying to break plpython.  (Python is, however,
very far out in left field in considering the amount of whitespace to
be semantically significant.  I personally won't use it voluntarily
because of that one reason...)

            regards, tom lane

Re: Multiline plpython procedure

From
Hong Yuan
Date:
Tom Lane wrote:

>Hong Yuan <hongyuan@homemaster.cn> writes:
>
>
>>I suppose there are some invisible characters inserted into the function
>>body by pgAdmin.
>>
>>
>
>Seems like it must be.  You could try pg_dump'ing both versions of the
>function and comparing the output files byte-by-byte.
>
>Be sure to let the pgAdmin guys know what you find out.  I'm sure they
>are not intentionally trying to break plpython.  (Python is, however,
>very far out in left field in considering the amount of whitespace to
>be semantically significant.  I personally won't use it voluntarily
>because of that one reason...)
>
>            regards, tom lane
>
>
Following your direction, I did a byte-by-byte comparasion of the dump
files of the two different version. The sole difference is that in the
working version, the line break is represented the unix way as one 0x0a,
while in the non-working version the line break is the Windows/Dos
version of 0x0d, 0x0a.

I am no longer sure whether this is a problem of Postgres or of pgAdmin.
Maybe it is just one database configuration?

Best regards

--
HONG Yuan
Homemaster Trading Co., Ltd.
No. 601, Bldg. 41, 288 Shuangyang Rd. (N)
Shanghai 200433, P.R.C.
Tel: +86 21 55056553
Fax: +86 21 55067325
E-mail: hongyuan@homemaster.cn


Re: Multiline plpython procedure

From
Tom Lane
Date:
Hong Yuan <hongyuan@homemaster.cn> writes:
> Following your direction, I did a byte-by-byte comparasion of the dump
> files of the two different version. The sole difference is that in the
> working version, the line break is represented the unix way as one 0x0a,
> while in the non-working version the line break is the Windows/Dos
> version of 0x0d, 0x0a.

Now that you say that, I seem to recall that this has been reported
before.  It seems odd that in today's climate the Python interpreter
would not cope well with Windows-style newlines.  Maybe there is some
configuration issue with Python itself?

> I am no longer sure whether this is a problem of Postgres or of pgAdmin.

Postgres is just passing the bytes from point A to point B.  You could
maybe argue that it's a bad idea for pgAdmin to be using a
Windows-centric notion of newline, but I doubt you'll be able to sell
those guys on it.  Really this seems like a robustness issue for Python.

            regards, tom lane

Re: Multiline plpython procedure

From
Hong Yuan
Date:
Tom Lane wrote:

>Now that you say that, I seem to recall that this has been reported
>before.  It seems odd that in today's climate the Python interpreter
>would not cope well with Windows-style newlines.  Maybe there is some
>configuration issue with Python itself?
>
>
I found a thread on exactly the same problem in the archive
http://archives.postgresql.org/pgadmin-support/2004-08/msg00078.php.
However the discussion seemed to have died down.

>>I am no longer sure whether this is a problem of Postgres or of pgAdmin.
>>
>>
>
>Postgres is just passing the bytes from point A to point B.  You could
>maybe argue that it's a bad idea for pgAdmin to be using a
>Windows-centric notion of newline, but I doubt you'll be able to sell
>those guys on it.  Really this seems like a robustness issue for Python.
>
>
>
I can not find any problem with the python configuration itself. It runs
program with Windows-styled newlines perfectly. I suspect the problem
might lie in the module pl/python, which receives functions from
Postgres and passes them to python.

Maybe I should file a bug report.

--
HONG Yuan
Homemaster Trading Co., Ltd.
No. 601, Bldg. 41, 288 Shuangyang Rd. (N)
Shanghai 200433, P.R.C.
Tel: +86 21 55056553
Fax: +86 21 55067325
E-mail: hongyuan@homemaster.cn


Re: Multiline plpython procedure

From
Michael Fuhr
Date:
On Tue, Jan 18, 2005 at 01:24:31AM -0500, Tom Lane wrote:
>
> Now that you say that, I seem to recall that this has been reported
> before.  It seems odd that in today's climate the Python interpreter
> would not cope well with Windows-style newlines.  Maybe there is some
> configuration issue with Python itself?

Hmmmm....

#include <Python.h>

int
main(void)
{
    Py_Initialize();
    PyRun_SimpleString("print 'What hath'\n"
                       "print 'Guido wrought?'\n");
    Py_Finalize();
    return 0;
}

As written this program works, at least with Python 2.4 on FreeBSD
4.11 and Solaris 9:

  % ./foo
  What hath
  Guido wrought?

But if you change LF to CRLF like this:

    PyRun_SimpleString("print 'What hath'\r\n"
                       "print 'Guido wrought?'\r\n");

then you get this:

  % ./foo
    File "<string>", line 1
      print 'What hath'
                     ^
  SyntaxError: invalid syntax

I don't know if that behavior is configurable or not.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Multiline plpython procedure

From
Michael Fuhr
Date:
On Tue, Jan 18, 2005 at 01:24:31AM -0500, Tom Lane wrote:
>
> It seems odd that in today's climate the Python interpreter
> would not cope well with Windows-style newlines.  Maybe there is some
> configuration issue with Python itself?

http://docs.python.org/ref/physical.html

"A physical line ends in whatever the current platform's convention
is for terminating lines.  On Unix, this is the ASCII LF (linefeed)
character.  On Windows, it is the ASCII sequence CR LF (return
followed by linefeed).  On Macintosh, it is the ASCII CR (return)
character."

And yet simple tests show that normal scripts run on Unix-like
platforms regardless of whether their lines end in LF, CRLF, or CR.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Multiline plpython procedure

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> http://docs.python.org/ref/physical.html

> "A physical line ends in whatever the current platform's convention
> is for terminating lines.  On Unix, this is the ASCII LF (linefeed)
> character.  On Windows, it is the ASCII sequence CR LF (return
> followed by linefeed).  On Macintosh, it is the ASCII CR (return)
> character."

Seems like Guido has missed a bet here: namely the case of a script
generated on one platform and fed to an interpreter running on another.
If I were designing it, I would say that any Python interpreter should
take all three variants no matter which platform the interpreter itself
is sitting on.  Or is cross-platform support not a Python goal?

In short, any bug report on this ought to go to the Python project.

            regards, tom lane

Re: Multiline plpython procedure

From
Adrian Klaver
Date:
Actually universal newline support seems to be covered by the following PEP
and is present in the version of Python(2.3) I am running.
http://www.python.org/peps/pep-0278.txt
I would tend to agree with Hong Yuan that the problem exists in plpythonu's
handling of newlines.

On Tuesday 18 January 2005 05:19 am, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > http://docs.python.org/ref/physical.html
> >
> > "A physical line ends in whatever the current platform's convention
> > is for terminating lines.  On Unix, this is the ASCII LF (linefeed)
> > character.  On Windows, it is the ASCII sequence CR LF (return
> > followed by linefeed).  On Macintosh, it is the ASCII CR (return)
> > character."
>
> Seems like Guido has missed a bet here: namely the case of a script
> generated on one platform and fed to an interpreter running on another.
> If I were designing it, I would say that any Python interpreter should
> take all three variants no matter which platform the interpreter itself
> is sitting on.  Or is cross-platform support not a Python goal?
>
> In short, any bug report on this ought to go to the Python project.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

--
Adrian Klaver
aklaver@comcast.net

Re: Multiline plpython procedure

From
Michael Fuhr
Date:
On Tue, Jan 18, 2005 at 07:34:59PM -0800, Adrian Klaver wrote:

> Actually universal newline support seems to be covered by the following PEP
> and is present in the version of Python(2.3) I am running.
> http://www.python.org/peps/pep-0278.txt

I see the following in the PEP:

  There is no support for universal newlines in strings passed to
  eval() or exec.  It is envisioned that such strings always have the
  standard \n line feed, if the strings come from a file that file can
  be read with universal newlines.

Does the above mean that the PyRun_*() family doesn't have universal
newline support?  Or at least that some members of the family don't?
That would explain why the simple C program I tested failed.

http://archives.postgresql.org/pgsql-general/2005-01/msg00876.php

> I would tend to agree with Hong Yuan that the problem exists in plpythonu's
> handling of newlines.

If Python's behavior is intentional then the newline burden would
seem to be on the user or on plpythonu.  I think Tom's point is
that that's just silly....

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Multiline plpython procedure

From
Stuart Bishop
Date:
Michael Fuhr wrote:
> On Tue, Jan 18, 2005 at 07:34:59PM -0800, Adrian Klaver wrote:
>
>
>>Actually universal newline support seems to be covered by the following PEP
>>and is present in the version of Python(2.3) I am running.
>>http://www.python.org/peps/pep-0278.txt
>
>
> I see the following in the PEP:
>
>   There is no support for universal newlines in strings passed to
>   eval() or exec.  It is envisioned that such strings always have the
>   standard \n line feed, if the strings come from a file that file can
>   be read with universal newlines.
>
> Does the above mean that the PyRun_*() family doesn't have universal
> newline support?  Or at least that some members of the family don't?
> That would explain why the simple C program I tested failed.
>
> http://archives.postgresql.org/pgsql-general/2005-01/msg00876.php
>
>
>>I would tend to agree with Hong Yuan that the problem exists in plpythonu's
>>handling of newlines.
>
>
> If Python's behavior is intentional then the newline burden would
> seem to be on the user or on plpythonu.  I think Tom's point is
> that that's just silly....

Changing this behavior in Python would break backwards compatibility. In
particular, the exec() function accepts strings that have already been
unescaped:

 >>> exec('print """\n\r\n\r\n"""')

In the above example, the exec function is being passed a string
containing carridge returns and line feeds - not '\n' and '\r' character
sequences.

It is too late for the Python 2.3 series anyway - 2.3.5 is being
released Jan 26th and there won't be a 2.3.6. If it was championed and
it decided that the above example is a bug and not a feature and a patch
produced, it could get into 2.4.1 due April and 2.5+

I suspect this means fixing this problem in plpythonu for 8.1.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Multiline plpython procedure

From
Martijn van Oosterhout
Date:
On Wed, Jan 19, 2005 at 06:28:25PM +1100, Stuart Bishop wrote:
> Michael Fuhr wrote:
> >If Python's behavior is intentional then the newline burden would
> >seem to be on the user or on plpythonu.  I think Tom's point is
> >that that's just silly....
>
> Changing this behavior in Python would break backwards compatibility. In
> particular, the exec() function accepts strings that have already been
> unescaped:
>
> >>> exec('print """\n\r\n\r\n"""')
>
> In the above example, the exec function is being passed a string
> containing carridge returns and line feeds - not '\n' and '\r' character
> sequences.

Ofcourse, if the \r is within a literal string, then ofcourse you can't
ignore it. Other languages like C and Perl also maintain any character
within a string. The point is that outside of character strings, there
is no need to consider a \n different form a \r (unless there is a
place in Python where an extra newline changes the meaning).

Sure, you can't just run dos2unix over the code, but within the parser
this is a simple change.

> It is too late for the Python 2.3 series anyway - 2.3.5 is being
> released Jan 26th and there won't be a 2.3.6. If it was championed and
> it decided that the above example is a bug and not a feature and a patch
> produced, it could get into 2.4.1 due April and 2.5+
>
> I suspect this means fixing this problem in plpythonu for 8.1.

I suggest adding to the Release Notes:

  User defined functions using the Python language must use the newline
  delimiter of the server OS. There is currently no standard way of
  determining the newline delimiter of the server. Note this also
  affects the portability of pg_dump output.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Multiline plpython procedure

From
Marco Colombo
Date:
On Tue, 18 Jan 2005, Tom Lane wrote:

> Michael Fuhr <mike@fuhr.org> writes:
>> http://docs.python.org/ref/physical.html
>
>> "A physical line ends in whatever the current platform's convention
>> is for terminating lines.  On Unix, this is the ASCII LF (linefeed)
>> character.  On Windows, it is the ASCII sequence CR LF (return
>> followed by linefeed).  On Macintosh, it is the ASCII CR (return)
>> character."
>
> Seems like Guido has missed a bet here: namely the case of a script
> generated on one platform and fed to an interpreter running on another.

I think you're missing that vendors define what a 'text file' is on their
platform, not Guido. Guido just says that a Python program is a text file,
which is a very sound decision, since it makes perfectlty sense to be able
to edit it with native tools (text editors which do not support alien
textfile formats).
What you seem to be missing is that before scripts are "fed to interpreter
running on another [platform]" they need to be transferred there!
Conversion must happen (if necessary) at that point. That's why the
2000 years old protocol FTP (well, maybe not 2000 years but it _is_ old)
has an ASCII transfer mode. Is this situation unfortunate? Yes. Every
(programming) language is (or should be) affected by the same problem,
since I expect the source file being a _text_ file, everywhere.
A \n line-terminated file is not a text file under Windows, per specs.
A \r\n line-terminated is not a text file under Unix, per specs.
A \r line-terminated is not a text file neither Win or Unix, per specs.
(I'm not sure what the specs are under Mac).
Those are facts of life you have to deal with everytime you move a text
file (such as the source of a program) from platform X to platform Y.
It may affect a Cobol or Lisp or C compiler as well.

> If I were designing it, I would say that any Python interpreter should
> take all three variants no matter which platform the interpreter itself
> is sitting on.  Or is cross-platform support not a Python goal?

Changing what a text file is under all platforms Python aims to run on
is not. (I can't speak for Guido of course, but I'm pretty sure it isn't).
I'm not against your suggestion, but that won't help with the simple fact
that text files need to be converted to what the platform they sit on
defines a text file to be. Otherwise, many other native tools fail in
treating them as text file.

> In short, any bug report on this ought to go to the Python project.

Definitely not a bug report for Python. It seems to me is works as expected
(that is, as documented). The bug is on the application that transferred
the text file over the wire from platform X to platform Y.

OTOH, it's true that on the client-server "wire" no text file is tranferred,
strictly speaking. Just a string which happens to be a valid python
program. Moreover, Python is used more like an embedded scripting language
(not as a standalone programming language). So you're right when you expect
it to be more tolerant.

This is a grey area. It is pretty clear that a text file is a sequence
of lines: the separator is platform specific but the user/application
becomes aware of it only when the text file is accessed as in binary mode
(with some quirks... most native unix tool will precisely that,
think of md5sum, since there's no way to recognise a "text file").
It happens that when a text file is _correctly_ accessed, the platform
hides the separator (or should do).

For "multiline strings" (which is the right data type for a python
embedded script), everything is just worse (there's nothing even close
to a vague definition). IMHO, everytime such a string is handed to a
native tool, it should be converted to the platform specific multiline
format (that is, with the right separator). You shouldn't expect the
external tool to be able to cope with alien line formats.
Alternatively, you should _define_ what the separator is for python
embedded script in PostgreSQL, and have the interpreter accept it on
every platform unconditionally (I'm not sure whether this is easy or not).

Just my 0.03 eurocents.
.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Multiline plpython procedure

From
Martijn van Oosterhout
Date:
On Wed, Jan 19, 2005 at 12:20:23PM +0100, Marco Colombo wrote:
> On Tue, 18 Jan 2005, Tom Lane wrote:
> >>followed by linefeed).  On Macintosh, it is the ASCII CR (return)
> >>character."
> >
> >Seems like Guido has missed a bet here: namely the case of a script
> >generated on one platform and fed to an interpreter running on another.
>
> I think you're missing that vendors define what a 'text file' is on their
> platform, not Guido. Guido just says that a Python program is a text file,
> which is a very sound decision, since it makes perfectlty sense to be able
> to edit it with native tools (text editors which do not support alien
> textfile formats).

Sure, some text editors don't. Some text editors do. But the C compiler
accepts programs in any of these formats. And consider multiple
machines working off the same file server. There is no "standard" text
format and everyone should just get along.

The C standard explicitly defines \r and \n as whitespace, thus neatly
avoiding the entire issue. Many other languages do the same. The fact
is the python is the odd one out.

Be liberal in what you receive. After, what's the benefit of having
python source that's not runnable on every computer. Without
conversion.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Multiline plpython procedure

From
Stuart Bishop
Date:
Martijn van Oosterhout wrote:
> On Wed, Jan 19, 2005 at 06:28:25PM +1100, Stuart Bishop wrote:
>
>>Michael Fuhr wrote:
>>
>>>If Python's behavior is intentional then the newline burden would
>>>seem to be on the user or on plpythonu.  I think Tom's point is
>>>that that's just silly....
>>
>>Changing this behavior in Python would break backwards compatibility. In
>>particular, the exec() function accepts strings that have already been
>>unescaped:
>>
>>
>>>>>exec('print """\n\r\n\r\n"""')
>>
>>In the above example, the exec function is being passed a string
>>containing carridge returns and line feeds - not '\n' and '\r' character
>>sequences.
>
>
> Ofcourse, if the \r is within a literal string, then ofcourse you can't
> ignore it. Other languages like C and Perl also maintain any character
> within a string. The point is that outside of character strings, there
> is no need to consider a \n different form a \r (unless there is a
> place in Python where an extra newline changes the meaning).
>
> Sure, you can't just run dos2unix over the code, but within the parser
> this is a simple change.

Oh - I had confused myself. Your point about dos2unix shows that
plpythonu might not be able to do this 'correctly' unless it understands
a good chunk of Python syntax. It could do it 'good enough' if that is
acceptible.

I'll take this to python-dev, but unfortunately I think my comment below
about the 2.3 series still stands unless higher powers believe this is a
show stopper.

>>It is too late for the Python 2.3 series anyway - 2.3.5 is being
>>released Jan 26th and there won't be a 2.3.6. If it was championed and
>>it decided that the above example is a bug and not a feature and a patch
>>produced, it could get into 2.4.1 due April and 2.5+
>>
>>I suspect this means fixing this problem in plpythonu for 8.1.
>
>
> I suggest adding to the Release Notes:
>
>   User defined functions using the Python language must use the newline
>   delimiter of the server OS. There is currently no standard way of
>   determining the newline delimiter of the server. Note this also
>   affects the portability of pg_dump output.

I don't see how it affects the portability of pg_dump. If you have a
working Python function (with unix line endings), won't pg_dump dump the
source with unix line endings?

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Multiline plpython procedure

From
Alvaro Herrera
Date:
On Thu, Jan 20, 2005 at 09:50:43AM +1100, Stuart Bishop wrote:
> Martijn van Oosterhout wrote:

> >  User defined functions using the Python language must use the newline
> >  delimiter of the server OS. There is currently no standard way of
> >  determining the newline delimiter of the server. Note this also
> >  affects the portability of pg_dump output.
>
> I don't see how it affects the portability of pg_dump. If you have a
> working Python function (with unix line endings), won't pg_dump dump the
> source with unix line endings?

It will ... so it won't work on Windows (maybe Mac OS X) servers.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)

Re: Multiline plpython procedure

From
"Frank D. Engel, Jr."
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Uh, does the Python doc specify "platform" line endings, or "normal
(\n)" line endings?  It sounded to me like it always wanted the
UNIX-style \n line endings, so that using those would result in
portability...

On Jan 19, 2005, at 6:03 PM, Alvaro Herrera wrote:

> On Thu, Jan 20, 2005 at 09:50:43AM +1100, Stuart Bishop wrote:
>> Martijn van Oosterhout wrote:
>
>>>  User defined functions using the Python language must use the
>>> newline
>>>  delimiter of the server OS. There is currently no standard way of
>>>  determining the newline delimiter of the server. Note this also
>>>  affects the portability of pg_dump output.
>>
>> I don't see how it affects the portability of pg_dump. If you have a
>> working Python function (with unix line endings), won't pg_dump dump
>> the
>> source with unix line endings?
>
> It will ... so it won't work on Windows (maybe Mac OS X) servers.
>
> --
> Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
> "La grandeza es una experiencia transitoria.  Nunca es consistente.
> Depende en gran parte de la imaginación humana creadora de mitos"
> (Irulan)
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>       joining column's datatypes do not match
>
>
- -----------------------------------------------------------
Frank D. Engel, Jr.  <fde101@fjrhome.net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFB7vYZ7aqtWrR9cZoRAszxAKCJpwXuWU/icjj8YSyKej/daEhQyQCfRB4I
mk5qcMxWeDRa5RHIxP+9lfw=
=UUu8
-----END PGP SIGNATURE-----



___________________________________________________________
$0 Web Hosting with up to 120MB web space, 1000 MB Transfer
10 Personalized POP and Web E-mail Accounts, and much more.
Signup at www.doteasy.com


Re: Multiline plpython procedure

From
Martijn van Oosterhout
Date:
On Wed, Jan 19, 2005 at 07:06:49PM -0500, Frank D. Engel, Jr. wrote:
> Uh, does the Python doc specify "platform" line endings, or "normal
> (\n)" line endings?  It sounded to me like it always wanted the
> UNIX-style \n line endings, so that using those would result in
> portability...

That's the problem, Python specifies "platform" line endings. See this
note:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66434
> When working between platforms, it is often necessary to convert the
> line endings on files for them to work, especially when it comes to
> code. Pass Unix Python code with \r and it goes nowhere. Same on Mac
> Python with \n. This code simply and easily fixes the problem.

Note, one of these links provides a possible solution:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/59867
> This code eliminates the need to convert line endings when moving .py
> modules between OSes. Put in your sitecustomize.py, anywhere on
> sys.path, and you'll be able to import Python modules with any of
> Unix, Mac, or Windows line endings, on any OS.

I don't understand how it works, but people claim it solves their
problem...

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Multiline plpython procedure

From
Marco Colombo
Date:
On Wed, 19 Jan 2005, Martijn van Oosterhout wrote:

> On Wed, Jan 19, 2005 at 12:20:23PM +0100, Marco Colombo wrote:
>> I think you're missing that vendors define what a 'text file' is on their
>> platform, not Guido. Guido just says that a Python program is a text file,
>> which is a very sound decision, since it makes perfectlty sense to be able
>> to edit it with native tools (text editors which do not support alien
>> textfile formats).
>
> Sure, some text editors don't. Some text editors do. But the C compiler
> accepts programs in any of these formats. And consider multiple
> machines working off the same file server. There is no "standard" text
> format and everyone should just get along.

Exaclty. Or, one could say: the "standard" text format is the one the
platform you are running on dictates. Which is what python does.
Multiple machine from a file server had better to agree on what a text
file is. Or do runtime conversions. Or let the server do that.
The issue affects _any_ text file (this email to name one) not only
python programs. [aside note: for e-mail there actually is a well
defined "on the wire" format, and applications are expected to make
the conversion when needed]

> The C standard explicitly defines \r and \n as whitespace, thus neatly
> avoiding the entire issue. Many other languages do the same. The fact
> is the python is the odd one out.

You're missing the point. The C source file is not a text file, it's
a binary sequence of bytes (which is quite unfortunate, you may
d/l a .c file and be not able to see/read it on your platform, while
the C compliler groks it happily). There's no _line_ separator in C.
If you've ever heard of obfuscated-C contexts, you know that you
can write a complete C program that actually does something in one
line, since C uses a _statements_ separator (';') and not a line
separator. So C is precisely an example of what you should not do:
use a binary file as source, pretending it's a text file. This may
actually make sense, historically, but definitely it's against
python attitude. Python source files are, like it or not, well formed
text file, and the parser even requires correct indentation.

"Be very picky in what you accept"... after all, you're a _formal_
language. You already put a thousand requirements (a whole grammar)
in what you receive, why not adding also a few ones that force an
improved readability. Think of how hard it is for newbies to spot
a missing ; in C. Compare to how easy is to spot a missing line
break (actually, I think any newbie gets line breaking naturally
right from the start). Having the source of your programs be
line-oriented (opposed to statement oriented) is big win for a
language designer. And correclty indended from the start is even
better.
You may not agree with the last statements, but that's the python
way, a design (and general attitude) decision. There's no point
in sending a bug report about it.

> Be liberal in what you receive. After, what's the benefit of having
> python source that's not runnable on every computer. Without
> conversion.

Python source of course is runnable on every computer, provided that
the source file is a real text file for that platform.
If you downloaded any text file (not just python source files) by
the _wrong_ mean (e.g. FTP binary mode from a Unix server) on Windows
you'll have problems in handling it. You cannot view it (notepad)
you - very likely - cannot print it. (Yeah your <insert favorite 3rd
party editor> may be able perform both operations, but that's not the
point). Are you expecting your python interpreter on windows to be able
to handle it? Why? It's not a text file, it's binary garbage, the same
you see with notepad or on your printer when you try and print it.
See the point? (It's subtle: python somehow requires a program to be
human readable, and that means it has to be a text file, correctly
formatted for the platform).

I can see only two ways to address the issue:

- convert the string that represents the python program to a correct
   multi-line string (according to the rules of the platform we're
   running on) before we pass it to the python interpreter;

- explicitly set one format as the right one for our purpose
   ("embedded python in PostgreSQL"), and have the python interpreter
   we use comply, no matter of the platform we're running on.

Of course, setting the rule:

- python scripts should be correctly formatted multi-line strings
   according to _server_ platform,

will work as well, but places extra burden on the clients (and/or users).

Note that an option or env. variable like:

$ python -T dos file.py

$ export PYTHONTEXTFORMAT=dos
$ python file.py

would be great to have, of course (and that can be suggested).

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Multiline plpython procedure

From
Greg Stark
Date:
Marco Colombo <marco@esi.it> writes:

> Exaclty. Or, one could say: the "standard" text format is the one the
> platform you are running on dictates. Which is what python does.

Egads. So the set of valid Python programs is different depending on what
platform you're on? That's just, uhm, insane. So essentially Python isn't a
single language, it's a set of languages, Python-NL, Python-NLCR, Python-CR,
(and in theory others).

So if I generate a database with a Python-CRNL function on windows, then
pg_dump it and load it on Unix the function won't run because it's the wrong
language, Unix only supports Python-NL.

I don't think it's reasonable for pg_dump to think about converting data from
one language to another. It's important for pg_dump to restore an identical
database. Having it start with special case data conversation from one flavour
to another seems too dangerous.


Incidentally, are we sure we've diagnosed this correctly? I'm discussing this
with some Python developers and they're expressing skepticism. One just tried
a quick test with a Python program containing a mixture of all three newline
flavours and it ran fine.


--
greg

Re: Multiline plpython procedure

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Egads. So the set of valid Python programs is different depending on what
> platform you're on? That's just, uhm, insane.

No quibble here.

> Incidentally, are we sure we've diagnosed this correctly? I'm discussing this
> with some Python developers and they're expressing skepticism. One just tried
> a quick test with a Python program containing a mixture of all three newline
> flavours and it ran fine.

He tried reading 'em from files, using Python's text-format-converting
file reader, no?  See the test case posted up-thread, which demonstrates
that feeding a string directly to PyExec (or whatever its called)
produces newline-sensitive results.

            regards, tom lane

Re: Multiline plpython procedure

From
Stuart Bishop
Date:
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
>
>>Egads. So the set of valid Python programs is different depending on what
>>platform you're on? That's just, uhm, insane.
>
>
> No quibble here.
>
>
>>Incidentally, are we sure we've diagnosed this correctly? I'm discussing this
>>with some Python developers and they're expressing skepticism. One just tried
>>a quick test with a Python program containing a mixture of all three newline
>>flavours and it ran fine.
>
>
> He tried reading 'em from files, using Python's text-format-converting
> file reader, no?  See the test case posted up-thread, which demonstrates
> that feeding a string directly to PyExec (or whatever its called)
> produces newline-sensitive results.

This is currently being discussed on python-dev:

http://mail.python.org/pipermail/python-dev/2005-January/051203.html

It looks like my earlier concerns were unwarrented - current consensus
seems to be to transform line endings in the string to the
one-true-format expected by Python's guts:

http://mail.python.org/pipermail/python-dev/2005-January/051214.html

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Multiline plpython procedure

From
Tino Wildenhain
Date:
On Fri, 2005-01-21 at 01:30 -0500, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > Egads. So the set of valid Python programs is different depending on what
> > platform you're on? That's just, uhm, insane.
>
> No quibble here.

Funny discussion. Did you ever try a shell script written on
dos/windows, transferred binary to *nix, with for example bash?
Strange errors at the #! line? Yes. Here you are. You cant
blame python for it.

Python bytecode however is in fact running on every platform.

I think an approach where plpython compiles just like plpgsql
should be perfectly valid.
--
Tino Wildenhain <tino@wildenhain.de>


Re: Multiline plpython procedure

From
Marco Colombo
Date:
On Fri, 21 Jan 2005, Greg Stark wrote:

>
> Marco Colombo <marco@esi.it> writes:
>
>> Exaclty. Or, one could say: the "standard" text format is the one the
>> platform you are running on dictates. Which is what python does.
>
> Egads. So the set of valid Python programs is different depending on what
> platform you're on? That's just, uhm, insane. So essentially Python isn't a
> single language, it's a set of languages, Python-NL, Python-NLCR, Python-CR,
> (and in theory others).

No. Just any other application that reads text files, it reads text files.
That simple. It's unfortunate that 'textfile' means different things
on different platforms.

> So if I generate a database with a Python-CRNL function on windows, then
> pg_dump it and load it on Unix the function won't run because it's the wrong
> language, Unix only supports Python-NL.
>
> I don't think it's reasonable for pg_dump to think about converting data from
> one language to another. It's important for pg_dump to restore an identical
> database. Having it start with special case data conversation from one flavour
> to another seems too dangerous.

Makes no sense. pg_dump already make a lot of conversions: from internal
representation (which may be platform dependent) to some common format,
say text. It's just multi-line text which is a hard to deal with, because
there _no_ single format for it. pg_dump may just choose one format, and
stick with it. Every dump/restore will work. You may have trouble editing
a text dump, but that's another matter. BTW, what pg_dump does on windows?
I mean with -F p. Does it produce a text file with CRNL line seperator?
What happens if you feed that file to psql on a Unix box?
I've tried (adding spurious CRs) on Unix, and I think SQL treats CR as
whitespace so it's no issue. But what for opposite? Is psql on Windows
able to recognize SQL scripts made on Unix? (I can't try this).

Anyway, think of floats. If you want do to FP maths fast, you need to use
the native format supported by the CPU. When you dump, you get a text
form of the FP number, and when you restore on a different platform you
may get a _different_ number. And you have to live with it. Kiss goodbye
to your "indentical database".

> Incidentally, are we sure we've diagnosed this correctly? I'm discussing this
> with some Python developers and they're expressing skepticism. One just tried
> a quick test with a Python program containing a mixture of all three newline
> flavours and it ran fine.

Recent python has universal newline support. It works for files, and it's
enabled by default when it read source files. But it's NOT part of the parser,
AFAIK, and the source file gets converted to UNIX format before being
fed to the parser (lexxer). Problem is that I'm not sure that's the way
python is used by PostgreSQL. It works only when the program is read from
a file. That's what the guy tested, probably. If you build a program, put
it in a string, and invoke the parser, the string must be in Unix format.

I'm for defining a format used by PostgreSQL, and force the python parser
into accepting it on all platforms. That is, let's set the rule that
python programs to be embedded into PostgreSQL use \n as line termination.

Think of this: tomorrow we meet people from Mars. One of them really likes
PostgreSQL, and ports it to their platform. Being a martian platform, it
uses a different text file format. Line separator there is the first 1000
binary digits of PI. When he writes a small python function on his client
and tries to have it run on a server on Earth, it fails, cause the python
parser here won't handle PI-terminated lines correctly. What would you do?
Bug python developers because "python it's a set of languages, Python-Earth,
Python-Mars, Python-Venus (and in theory others)"? (BTW, in that situation,
I bet Perl would fail as well). Or would you ask the martian guy to add,
as part of his port effort, support for the martian line format to PostgreSQL,
so that the server can convert the python program to Earth format before
feeding it to python? Or, alternatively, just tell him: python programs
in PostgreSQL are \n terminated? Which one is the simplest?

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Multiline plpython procedure

From
Marco Colombo
Date:
On Fri, 21 Jan 2005, Stuart Bishop wrote:

> This is currently being discussed on python-dev:
>
> http://mail.python.org/pipermail/python-dev/2005-January/051203.html
>
> It looks like my earlier concerns were unwarrented - current consensus seems
> to be to transform line endings in the string to the one-true-format expected
> by Python's guts:
>
> http://mail.python.org/pipermail/python-dev/2005-January/051214.html

I'm not sure that exec expects strings in Unix format:

http://docs.python.org/ref/physical.html
"A physical line ends in whatever the current platform's convention is for
  terminating lines. On Unix, this is the ASCII LF (linefeed) character.
  On Windows, it is the ASCII sequence CR LF (return followed by linefeed).
  On Macintosh, it is the ASCII CR (return) character."

Reading the following message from Guido, and the above paragraph,
I think that the lexxer always uses platform conventions to split lines.

http://mail.python.org/pipermail/python-dev/2002-March/021741.html
"I still think that this PEP is a big hack -- but as big hacks go, it
  seems to have a pretty good payback.

  I'm hoping that eventually the parser (really the lexer) will be able
  to open the file in binary mode and recognize all three newline styles
  directly.  That would solve the problems with exec, eval, and compile."

So I think you can't just pass exec a string with \n as line terminator.
That would work only under Unix. You should use os.linesep instead.
E.g.:

import os
import re

# a small program with mixed line ending conventions
a = "print 1\rprint 2\r\nprint 3\nprint 4\n"

try:
         exec a
except SyntaxError:
         print "SyntaxError"
         pass

# transform it according to local line ending convention
b = os.linesep.join(re.split("\r\n|\r|\n", a))

try:
         exec b
except SyntaxError:
         print "SyntaxError"
         pass

This produces, under Unix:
SyntaxError
1
2
3
4

Anyone can try it under Windows and under some old Mac?
I think we could do the same before passing the string to the interpreter
(I'm not familiar with how the interpreter is called in PostgreSQL, tho).

For those not fluent in python, this line:

os.linesep.join(re.split("\r\n|\r|\n", a))

first splits the string into a list for lines, using this pattern: \r\n|\r|\n
as separator. Then joins the lines again using os.linesep as separator instead.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Multiline plpython procedure

From
Martijn van Oosterhout
Date:
On Fri, Jan 21, 2005 at 12:02:09PM +0100, Marco Colombo wrote:
> On Fri, 21 Jan 2005, Greg Stark wrote:
> >I don't think it's reasonable for pg_dump to think about converting
> >data from one language to another. It's important for pg_dump to
> >restore an identical database. Having it start with special case
> >data conversation from one flavour to another seems too dangerous.
>
> Makes no sense. pg_dump already make a lot of conversions: from internal
> representation (which may be platform dependent) to some common format,
> say text. It's just multi-line text which is a hard to deal with, because
> there _no_ single format for it. pg_dump may just choose one format, and
> stick with it. Every dump/restore will work. You may have trouble editing
> a text dump, but that's another matter. BTW, what pg_dump does on windows?
> I mean with -F p. Does it produce a text file with CRNL line seperator?
> What happens if you feed that file to psql on a Unix box?

Ah, but you see, looking at it from your point of view, pg_dump doesn't
interpret text strings. For example, the python script in a function is
a opaque string. Not multiline, nothing. All postgresql does is pass
that block of opaque data to the interpreter for that language. pg_dump
dumps that opaque data into the output, and the CREATE FUNCTION dumps
that opaque data back into the system tables. Postgresql doesn't
understand python any more or less than perl, tcl, R or any other
language.

The argument here is that basically this opaque data has different
meanings for Python on windows and Python on unix. You can't make any
special cases because I can rename plperl.so to plpython.so (or
vice-versa) the opaque data won't be passed to the interpreter that
you'd expect from looking at the definition.

> I'm for defining a format used by PostgreSQL, and force the python parser
> into accepting it on all platforms. That is, let's set the rule that
> python programs to be embedded into PostgreSQL use \n as line termination.

Wouldn't that disadvantage non-unix pl/python users, whose python
functions would have to be converted at run-time to conform to the
local text format. With the extra bummer that the resulting string may
not be the same size either. Remember, postgresql uses the standard
shared library for the language on the platform, it doesn't build its
own.

But sure, preprocessing the source at run-time seems to be the only
realistic solution without a change to the interpreter.

> Think of this: tomorrow we meet people from Mars. One of them really likes
> PostgreSQL, and ports it to their platform. Being a martian platform, it
> uses a different text file format. Line separator there is the first 1000

<snip>

Spurious argument. You're assuming Martians would use ASCII to write
programs without using one of the two defined line-ending characters.
If they were smart they'd simply use a character set which doesn't have
the ambiguity. If they even use 8-bit bytes. An ASCII C compiler won't
compile EBCDIC source code either, but nobody thinks that's
unreasonable, probably because nobody uses EBCDIC anymore :).

No-one is complaining about the use of line-ending characters, they
could have said that you need a semi-colon to seperate "lines". The
problem is that it's *not consistant* across platforms.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Multiline plpython procedure

From
Bruno Wolff III
Date:
On Fri, Jan 21, 2005 at 12:02:09 +0100,
  Marco Colombo <pgsql@esiway.net> wrote:
> On Fri, 21 Jan 2005, Greg Stark wrote:
>
> Anyway, think of floats. If you want do to FP maths fast, you need to use
> the native format supported by the CPU. When you dump, you get a text
> form of the FP number, and when you restore on a different platform you
> may get a _different_ number. And you have to live with it. Kiss goodbye
> to your "indentical database".

If you are going to another system that uses the same floating point
representation, you should get the same number. pg_dump writes out
enough digits that the exact number can be recovered when the dump
has been reloaded. This has been the case since 7.3.

If you move the data to a machine with a different floating point
representation you might get a different number even if the original number
could be represented exactly in the new representation.

Re: Multiline plpython procedure

From
Marco Colombo
Date:
On Fri, 21 Jan 2005, Bruno Wolff III wrote:

> On Fri, Jan 21, 2005 at 12:02:09 +0100,
> If you are going to another system that uses the same floating point
> representation, you should get the same number. pg_dump writes out
> enough digits that the exact number can be recovered when the dump
> has been reloaded. This has been the case since 7.3.
>
> If you move the data to a machine with a different floating point
> representation you might get a different number even if the original number
> could be represented exactly in the new representation.

So the same pg_dump file _may_ lead to different databases on different
platforms, even right now. So the issue of 'identical' databases is
not serious.

Note that the float case is worse than the multiline text one. With multiline
text there is always a way to convert it w/o loss or change of information.
All you need is to treat it as a "sequence of lines".
"sequences of lines" are differently represented on different platforms,
just like floats. Whenever you use it on a platform, use the platform
dependent format. When you export it, use a defined external format.
On import, convert the defined external format to the internal one.
Does this lead to different internal formats on different platforms?
Yeah but what's the problem?

The problem originated from a Windows application storing a multiline
text (python function body) and this text being handed to a unix
program that expect a multiline text as input (the python interpreter).
This is a particular case only. _Any_ windows client that inserts
a multiline text is likely to use \r\n as separator, while any unix
client is likely to insert text with \n. For the same input (same
sequence of lines typed by the user), the result is different.
There's no way to write a server-side application that handles that
correctly, right now. Of course 'Hello\r\nWorld\r\n' is different
from 'Hello\n\World\n', as far as the server is concerned. But if
you think of what the users typed, you realize they should be equal.
It _is_ the same data (line 1 is 'Hello' and line 2 is 'World'), just
in different formats. Either the client library should handle that
transparently (converting to an on-the-wire format), or the server should
be aware of what convention the client is using.

Right now the application developer should take care of it, since
PostgreSQL (including client library) treats text as opaque binary data.

(I'm not arguing we should change that. I'm just saying it's not a python bug.)

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Multiline plpython procedure

From
Tom Lane
Date:
Stuart Bishop <stuart@stuartbishop.net> writes:
> Changing this behavior in Python would break backwards compatibility.

Nonsense.  It would not affect the behavior of any script that was
accepted before.

> In particular, the exec() function accepts strings that have already been
> unescaped:

> exec('print """\n\r\n\r\n"""')

So?  That has nothing to do with the treatment of \r and \n outside a
quoted string.

> I suspect this means fixing this problem in plpythonu for 8.1.

plpython does not have a charter to work around Python bugs.  If the
Python community feels that this is correct behavior for Python,
then so be it.

            regards, tom lane

Re: Multiline plpython procedure

From
Marco Colombo
Date:
On Fri, 21 Jan 2005, Martijn van Oosterhout wrote:

> On Fri, Jan 21, 2005 at 12:02:09PM +0100, Marco Colombo wrote:
>> On Fri, 21 Jan 2005, Greg Stark wrote:
>>> I don't think it's reasonable for pg_dump to think about converting
>>> data from one language to another. It's important for pg_dump to
>>> restore an identical database. Having it start with special case
>>> data conversation from one flavour to another seems too dangerous.
>>
>> Makes no sense. pg_dump already make a lot of conversions: from internal
>> representation (which may be platform dependent) to some common format,
>> say text. It's just multi-line text which is a hard to deal with, because
>> there _no_ single format for it. pg_dump may just choose one format, and
>> stick with it. Every dump/restore will work. You may have trouble editing
>> a text dump, but that's another matter. BTW, what pg_dump does on windows?
>> I mean with -F p. Does it produce a text file with CRNL line seperator?
>> What happens if you feed that file to psql on a Unix box?
>
> Ah, but you see, looking at it from your point of view, pg_dump doesn't
> interpret text strings. For example, the python script in a function is
> a opaque string. Not multiline, nothing. All postgresql does is pass
> that block of opaque data to the interpreter for that language. pg_dump
> dumps that opaque data into the output, and the CREATE FUNCTION dumps
> that opaque data back into the system tables. Postgresql doesn't
> understand python any more or less than perl, tcl, R or any other
> language.

I was referring to psql output in general.
E.g. (comments stripped):
CREATE TABLE t2 (
     f1 text
);

COPY t2 (f1) FROM stdin;
test1
test2
test3
\.

This dump, produced on Unix, will have lines separated by \n. What does the
same dump produced on Windows look like? If it's \n separated, it's not
editable (natively) on Windows. Which is fine to me, we just defined pg_dump
textual output to be \n terminated, always. Or, it's \r\n terminated. If so,
how would it be to restore it on a Unix box (with psql -f). Now, if the
data contains a \r I think it shows like that, escaped. Whether intended
or not, that's the only thing that saves us (note that there's no need
to escape a bare \r in Unix).

> The argument here is that basically this opaque data has different
> meanings for Python on windows and Python on unix. You can't make any
> special cases because I can rename plperl.so to plpython.so (or
> vice-versa) the opaque data won't be passed to the interpreter that
> you'd expect from looking at the definition.
>
>> I'm for defining a format used by PostgreSQL, and force the python parser
>> into accepting it on all platforms. That is, let's set the rule that
>> python programs to be embedded into PostgreSQL use \n as line termination.
>
> Wouldn't that disadvantage non-unix pl/python users, whose python
> functions would have to be converted at run-time to conform to the
> local text format. With the extra bummer that the resulting string may
> not be the same size either. Remember, postgresql uses the standard
> shared library for the language on the platform, it doesn't build its
> own.
> But sure, preprocessing the source at run-time seems to be the only
> realistic solution without a change to the interpreter.

Yeah. My fav. solution is to convert the string to platform format before
passing it to the parser. See the martian example.

>> Think of this: tomorrow we meet people from Mars. One of them really likes
>> PostgreSQL, and ports it to their platform. Being a martian platform, it
>> uses a different text file format. Line separator there is the first 1000
>
> <snip>
>
> Spurious argument. You're assuming Martians would use ASCII to write
> programs without using one of the two defined line-ending characters.
> If they were smart they'd simply use a character set which doesn't have
> the ambiguity. If they even use 8-bit bytes. An ASCII C compiler won't
> compile EBCDIC source code either, but nobody thinks that's
> unreasonable, probably because nobody uses EBCDIC anymore :).

You missed the point. Charset has nothing to do with the issue.
While you can handle both at the same time, they are unrelated.
Line separator is not dictated by the charset, only by the platform.
\r\n or \n or \r for line termination is _not_ defined by ASCII.
The _same_ ASCII textfile looks differently when looked in binary mode
on various platforms. The point was: what if someone introduces
another platform with yet-another-line-termination-standard?
It's unlikely, just like martians. But it makes you realize that
conversion is the job of the software that handles inter-platform
communication (much like FTP).

> No-one is complaining about the use of line-ending characters, they
> could have said that you need a semi-colon to seperate "lines". The
> problem is that it's *not consistant* across platforms.
>
> Have a nice day,

What about C? How about fopen("afile", "r") in C? Is it "portable"?
Or should you use: fopen("afile", "rb")? Define "consistant across
platforms" here. If you use "rb", your program will be consistant
in that with the same _binary_ input, produces the same _binary_
output. But if it's supposed to handle text files, it will fail.
That is, it is consistant if it is supposed to handle binary data,
it is not if it is supposed to handle text files. If you use "r",
it's the opposite. No matter what, your program will never be
completely consistant! You have to decide if it handles text file
_or_ binary data (unless you make runtime detection, that is, of
course - but that's another matter. Under Windows you can assume
a .txt file is "text". Under Unix things are not that simple).

Think of the meaning of '$' in a regular expression. What (binary)
character(s) does it match? I expect it to match \n under Unix and
the sequence \r\n under Windows. What is the usage scope of '$'?
A multiline text. If you look at the data you're using it on as
_binary_ data, it's behaviour it's inconsistant.

Face it, _every_ time you're handling multiline text data, you
should know in advance what separator it uses. If handling includes
moving across platforms, you should take care of conversion, _before_
you pass it to an external program that expects textual input.

Try and read the binmode() entry in the Perl manual. In particular:
" For the sake of portability it is a good idea to always
   use it when appropriate, and to never use it when it isn't
   appropriate."
That is, you should be well aware of that type of data you're
handling, and handle it correctly. Burying your head in the sand
and say "well I treat it as binary opaque data, so I'm fine" is
calling for problems. Expecially when you're moving it across platform.

Otherwise, you _define_ it to be binary data (and users may have
problems in reading it as text).

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it