Thread: Multiline plpython procedure
Hi, I am biwildered at how to create a multi-line plpython function in Postgres. When I create the function in one line like this: CREATE or REPLACE FUNCTION circ (float) RETURNS float AS 'from math import pi; return 2*pi*args[0]' LANGUAGE plpythonu; and then use SELECT circ(1) to test it, it runs well. But if I try to make the code looks better by separating it into mulitple lines, like this: CREATE or REPLACE FUNCTION circ (float) RETURNS float AS ' from math import pi return 2*pi*args[0]' LANGUAGE plpythonu; I got an error message: ERROR: plpython: could not compile function "circ" DETAIL: exceptions.SyntaxError: invalid syntax (line 2) How to get Postgres to accept a normal looking python function? Greetings -- HONG Yuan Homemaster Trading Co., Ltd. No. 601, Bldg. 41, 288 Shuangyang Rd. (N) Shanghai 200433, P.R.C. Tel: +86 21 55056553 Fax: +86 21 55067325 E-mail: hongyuan@homemaster.cn
On Monday 17 January 2005 01:54 am, Hong Yuan wrote: I entered the multilineversion of this function exactly as written here and it ran properly. This was with version 8.0 of Postgres. You might want to do a /df+ circ in psql to see if your editor is putting a space at the beginning of line 2. > Hi, > > I am biwildered at how to create a multi-line plpython function in > Postgres. When I create the function in one line like this: > > CREATE or REPLACE FUNCTION circ (float) > RETURNS float AS 'from math import pi; return 2*pi*args[0]' LANGUAGE > plpythonu; > > and then use SELECT circ(1) to test it, it runs well. > > But if I try to make the code looks better by separating it into > mulitple lines, like this: > > CREATE or REPLACE FUNCTION circ (float) > RETURNS float AS ' > from math import pi > return 2*pi*args[0]' LANGUAGE plpythonu; > > I got an error message: > > ERROR: plpython: could not compile function "circ" > DETAIL: exceptions.SyntaxError: invalid syntax (line 2) > > How to get Postgres to accept a normal looking python function? > > Greetings -- Adrian Klaver aklaver@comcast.net
Adrian Klaver wrote: >On Monday 17 January 2005 01:54 am, Hong Yuan wrote: >I entered the multilineversion of this function exactly as written here and it >ran properly. This was with version 8.0 of Postgres. You might want to do >a /df+ circ in psql to see if your editor is putting a space at the beginning >of line 2. > > The strange thing is that if I create the function in psql, the function works. But if I create the function from PgAdmin III, there was an error, although with \df+ circ I can not identify any visual difference between the two versions of function. See below: .... I create the function with psql .... homemaster=# \df+ circ List of functions Result data type | Schema | Name | Argument data types | Owner | Language | Source code | Description ------------------+--------+------+---------------------+---------+-----------+-------------+------------- double precision | public | circ | double precision | zopeapp | plpythonu | from math import pi return 2*pi*args[0] | (1 row) homemaster=# select circ(1); circ --------------- 6.28318530718 (1 row) .... Now I recreate the function with pgAdmin .... homemaster=# \df+ circ List of functions Result data type | Schema | Name | Argument data types | Owner | Language | Source code | Description ------------------+--------+------+---------------------+---------+-----------+-------------+------------- double precision | public | circ | double precision | zopeapp | plpythonu | from math import pi return 2*pi*args[0] | (1 row) homemaster=# select circ(1); ERROR: plpython: could not compile function "circ" DETAIL: exceptions.SyntaxError: invalid syntax (line 2) I suppose there are some invisible characters inserted into the function body by pgAdmin. I am using pgAdmin III Version 1.2.0 under Chinese Windows XP, while the database is 7.4.6 under Linux. Should be a bug of pgAdmin with encoding or something. At least I know the workaround now. Thanks. -- HONG Yuan Homemaster Trading Co., Ltd. No. 601, Bldg. 41, 288 Shuangyang Rd. (N) Shanghai 200433, P.R.C. Tel: +86 21 55056553 Fax: +86 21 55067325 E-mail: hongyuan@homemaster.cn
Hong Yuan <hongyuan@homemaster.cn> writes: > I suppose there are some invisible characters inserted into the function > body by pgAdmin. Seems like it must be. You could try pg_dump'ing both versions of the function and comparing the output files byte-by-byte. Be sure to let the pgAdmin guys know what you find out. I'm sure they are not intentionally trying to break plpython. (Python is, however, very far out in left field in considering the amount of whitespace to be semantically significant. I personally won't use it voluntarily because of that one reason...) regards, tom lane
Tom Lane wrote: >Hong Yuan <hongyuan@homemaster.cn> writes: > > >>I suppose there are some invisible characters inserted into the function >>body by pgAdmin. >> >> > >Seems like it must be. You could try pg_dump'ing both versions of the >function and comparing the output files byte-by-byte. > >Be sure to let the pgAdmin guys know what you find out. I'm sure they >are not intentionally trying to break plpython. (Python is, however, >very far out in left field in considering the amount of whitespace to >be semantically significant. I personally won't use it voluntarily >because of that one reason...) > > regards, tom lane > > Following your direction, I did a byte-by-byte comparasion of the dump files of the two different version. The sole difference is that in the working version, the line break is represented the unix way as one 0x0a, while in the non-working version the line break is the Windows/Dos version of 0x0d, 0x0a. I am no longer sure whether this is a problem of Postgres or of pgAdmin. Maybe it is just one database configuration? Best regards -- HONG Yuan Homemaster Trading Co., Ltd. No. 601, Bldg. 41, 288 Shuangyang Rd. (N) Shanghai 200433, P.R.C. Tel: +86 21 55056553 Fax: +86 21 55067325 E-mail: hongyuan@homemaster.cn
Hong Yuan <hongyuan@homemaster.cn> writes: > Following your direction, I did a byte-by-byte comparasion of the dump > files of the two different version. The sole difference is that in the > working version, the line break is represented the unix way as one 0x0a, > while in the non-working version the line break is the Windows/Dos > version of 0x0d, 0x0a. Now that you say that, I seem to recall that this has been reported before. It seems odd that in today's climate the Python interpreter would not cope well with Windows-style newlines. Maybe there is some configuration issue with Python itself? > I am no longer sure whether this is a problem of Postgres or of pgAdmin. Postgres is just passing the bytes from point A to point B. You could maybe argue that it's a bad idea for pgAdmin to be using a Windows-centric notion of newline, but I doubt you'll be able to sell those guys on it. Really this seems like a robustness issue for Python. regards, tom lane
Tom Lane wrote: >Now that you say that, I seem to recall that this has been reported >before. It seems odd that in today's climate the Python interpreter >would not cope well with Windows-style newlines. Maybe there is some >configuration issue with Python itself? > > I found a thread on exactly the same problem in the archive http://archives.postgresql.org/pgadmin-support/2004-08/msg00078.php. However the discussion seemed to have died down. >>I am no longer sure whether this is a problem of Postgres or of pgAdmin. >> >> > >Postgres is just passing the bytes from point A to point B. You could >maybe argue that it's a bad idea for pgAdmin to be using a >Windows-centric notion of newline, but I doubt you'll be able to sell >those guys on it. Really this seems like a robustness issue for Python. > > > I can not find any problem with the python configuration itself. It runs program with Windows-styled newlines perfectly. I suspect the problem might lie in the module pl/python, which receives functions from Postgres and passes them to python. Maybe I should file a bug report. -- HONG Yuan Homemaster Trading Co., Ltd. No. 601, Bldg. 41, 288 Shuangyang Rd. (N) Shanghai 200433, P.R.C. Tel: +86 21 55056553 Fax: +86 21 55067325 E-mail: hongyuan@homemaster.cn
On Tue, Jan 18, 2005 at 01:24:31AM -0500, Tom Lane wrote: > > Now that you say that, I seem to recall that this has been reported > before. It seems odd that in today's climate the Python interpreter > would not cope well with Windows-style newlines. Maybe there is some > configuration issue with Python itself? Hmmmm.... #include <Python.h> int main(void) { Py_Initialize(); PyRun_SimpleString("print 'What hath'\n" "print 'Guido wrought?'\n"); Py_Finalize(); return 0; } As written this program works, at least with Python 2.4 on FreeBSD 4.11 and Solaris 9: % ./foo What hath Guido wrought? But if you change LF to CRLF like this: PyRun_SimpleString("print 'What hath'\r\n" "print 'Guido wrought?'\r\n"); then you get this: % ./foo File "<string>", line 1 print 'What hath' ^ SyntaxError: invalid syntax I don't know if that behavior is configurable or not. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Tue, Jan 18, 2005 at 01:24:31AM -0500, Tom Lane wrote: > > It seems odd that in today's climate the Python interpreter > would not cope well with Windows-style newlines. Maybe there is some > configuration issue with Python itself? http://docs.python.org/ref/physical.html "A physical line ends in whatever the current platform's convention is for terminating lines. On Unix, this is the ASCII LF (linefeed) character. On Windows, it is the ASCII sequence CR LF (return followed by linefeed). On Macintosh, it is the ASCII CR (return) character." And yet simple tests show that normal scripts run on Unix-like platforms regardless of whether their lines end in LF, CRLF, or CR. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Michael Fuhr <mike@fuhr.org> writes: > http://docs.python.org/ref/physical.html > "A physical line ends in whatever the current platform's convention > is for terminating lines. On Unix, this is the ASCII LF (linefeed) > character. On Windows, it is the ASCII sequence CR LF (return > followed by linefeed). On Macintosh, it is the ASCII CR (return) > character." Seems like Guido has missed a bet here: namely the case of a script generated on one platform and fed to an interpreter running on another. If I were designing it, I would say that any Python interpreter should take all three variants no matter which platform the interpreter itself is sitting on. Or is cross-platform support not a Python goal? In short, any bug report on this ought to go to the Python project. regards, tom lane
Actually universal newline support seems to be covered by the following PEP and is present in the version of Python(2.3) I am running. http://www.python.org/peps/pep-0278.txt I would tend to agree with Hong Yuan that the problem exists in plpythonu's handling of newlines. On Tuesday 18 January 2005 05:19 am, Tom Lane wrote: > Michael Fuhr <mike@fuhr.org> writes: > > http://docs.python.org/ref/physical.html > > > > "A physical line ends in whatever the current platform's convention > > is for terminating lines. On Unix, this is the ASCII LF (linefeed) > > character. On Windows, it is the ASCII sequence CR LF (return > > followed by linefeed). On Macintosh, it is the ASCII CR (return) > > character." > > Seems like Guido has missed a bet here: namely the case of a script > generated on one platform and fed to an interpreter running on another. > If I were designing it, I would say that any Python interpreter should > take all three variants no matter which platform the interpreter itself > is sitting on. Or is cross-platform support not a Python goal? > > In short, any bug report on this ought to go to the Python project. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings -- Adrian Klaver aklaver@comcast.net
On Tue, Jan 18, 2005 at 07:34:59PM -0800, Adrian Klaver wrote: > Actually universal newline support seems to be covered by the following PEP > and is present in the version of Python(2.3) I am running. > http://www.python.org/peps/pep-0278.txt I see the following in the PEP: There is no support for universal newlines in strings passed to eval() or exec. It is envisioned that such strings always have the standard \n line feed, if the strings come from a file that file can be read with universal newlines. Does the above mean that the PyRun_*() family doesn't have universal newline support? Or at least that some members of the family don't? That would explain why the simple C program I tested failed. http://archives.postgresql.org/pgsql-general/2005-01/msg00876.php > I would tend to agree with Hong Yuan that the problem exists in plpythonu's > handling of newlines. If Python's behavior is intentional then the newline burden would seem to be on the user or on plpythonu. I think Tom's point is that that's just silly.... -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Michael Fuhr wrote: > On Tue, Jan 18, 2005 at 07:34:59PM -0800, Adrian Klaver wrote: > > >>Actually universal newline support seems to be covered by the following PEP >>and is present in the version of Python(2.3) I am running. >>http://www.python.org/peps/pep-0278.txt > > > I see the following in the PEP: > > There is no support for universal newlines in strings passed to > eval() or exec. It is envisioned that such strings always have the > standard \n line feed, if the strings come from a file that file can > be read with universal newlines. > > Does the above mean that the PyRun_*() family doesn't have universal > newline support? Or at least that some members of the family don't? > That would explain why the simple C program I tested failed. > > http://archives.postgresql.org/pgsql-general/2005-01/msg00876.php > > >>I would tend to agree with Hong Yuan that the problem exists in plpythonu's >>handling of newlines. > > > If Python's behavior is intentional then the newline burden would > seem to be on the user or on plpythonu. I think Tom's point is > that that's just silly.... Changing this behavior in Python would break backwards compatibility. In particular, the exec() function accepts strings that have already been unescaped: >>> exec('print """\n\r\n\r\n"""') In the above example, the exec function is being passed a string containing carridge returns and line feeds - not '\n' and '\r' character sequences. It is too late for the Python 2.3 series anyway - 2.3.5 is being released Jan 26th and there won't be a 2.3.6. If it was championed and it decided that the above example is a bug and not a feature and a patch produced, it could get into 2.4.1 due April and 2.5+ I suspect this means fixing this problem in plpythonu for 8.1. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Wed, Jan 19, 2005 at 06:28:25PM +1100, Stuart Bishop wrote: > Michael Fuhr wrote: > >If Python's behavior is intentional then the newline burden would > >seem to be on the user or on plpythonu. I think Tom's point is > >that that's just silly.... > > Changing this behavior in Python would break backwards compatibility. In > particular, the exec() function accepts strings that have already been > unescaped: > > >>> exec('print """\n\r\n\r\n"""') > > In the above example, the exec function is being passed a string > containing carridge returns and line feeds - not '\n' and '\r' character > sequences. Ofcourse, if the \r is within a literal string, then ofcourse you can't ignore it. Other languages like C and Perl also maintain any character within a string. The point is that outside of character strings, there is no need to consider a \n different form a \r (unless there is a place in Python where an extra newline changes the meaning). Sure, you can't just run dos2unix over the code, but within the parser this is a simple change. > It is too late for the Python 2.3 series anyway - 2.3.5 is being > released Jan 26th and there won't be a 2.3.6. If it was championed and > it decided that the above example is a bug and not a feature and a patch > produced, it could get into 2.4.1 due April and 2.5+ > > I suspect this means fixing this problem in plpythonu for 8.1. I suggest adding to the Release Notes: User defined functions using the Python language must use the newline delimiter of the server OS. There is currently no standard way of determining the newline delimiter of the server. Note this also affects the portability of pg_dump output. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Tue, 18 Jan 2005, Tom Lane wrote: > Michael Fuhr <mike@fuhr.org> writes: >> http://docs.python.org/ref/physical.html > >> "A physical line ends in whatever the current platform's convention >> is for terminating lines. On Unix, this is the ASCII LF (linefeed) >> character. On Windows, it is the ASCII sequence CR LF (return >> followed by linefeed). On Macintosh, it is the ASCII CR (return) >> character." > > Seems like Guido has missed a bet here: namely the case of a script > generated on one platform and fed to an interpreter running on another. I think you're missing that vendors define what a 'text file' is on their platform, not Guido. Guido just says that a Python program is a text file, which is a very sound decision, since it makes perfectlty sense to be able to edit it with native tools (text editors which do not support alien textfile formats). What you seem to be missing is that before scripts are "fed to interpreter running on another [platform]" they need to be transferred there! Conversion must happen (if necessary) at that point. That's why the 2000 years old protocol FTP (well, maybe not 2000 years but it _is_ old) has an ASCII transfer mode. Is this situation unfortunate? Yes. Every (programming) language is (or should be) affected by the same problem, since I expect the source file being a _text_ file, everywhere. A \n line-terminated file is not a text file under Windows, per specs. A \r\n line-terminated is not a text file under Unix, per specs. A \r line-terminated is not a text file neither Win or Unix, per specs. (I'm not sure what the specs are under Mac). Those are facts of life you have to deal with everytime you move a text file (such as the source of a program) from platform X to platform Y. It may affect a Cobol or Lisp or C compiler as well. > If I were designing it, I would say that any Python interpreter should > take all three variants no matter which platform the interpreter itself > is sitting on. Or is cross-platform support not a Python goal? Changing what a text file is under all platforms Python aims to run on is not. (I can't speak for Guido of course, but I'm pretty sure it isn't). I'm not against your suggestion, but that won't help with the simple fact that text files need to be converted to what the platform they sit on defines a text file to be. Otherwise, many other native tools fail in treating them as text file. > In short, any bug report on this ought to go to the Python project. Definitely not a bug report for Python. It seems to me is works as expected (that is, as documented). The bug is on the application that transferred the text file over the wire from platform X to platform Y. OTOH, it's true that on the client-server "wire" no text file is tranferred, strictly speaking. Just a string which happens to be a valid python program. Moreover, Python is used more like an embedded scripting language (not as a standalone programming language). So you're right when you expect it to be more tolerant. This is a grey area. It is pretty clear that a text file is a sequence of lines: the separator is platform specific but the user/application becomes aware of it only when the text file is accessed as in binary mode (with some quirks... most native unix tool will precisely that, think of md5sum, since there's no way to recognise a "text file"). It happens that when a text file is _correctly_ accessed, the platform hides the separator (or should do). For "multiline strings" (which is the right data type for a python embedded script), everything is just worse (there's nothing even close to a vague definition). IMHO, everytime such a string is handed to a native tool, it should be converted to the platform specific multiline format (that is, with the right separator). You shouldn't expect the external tool to be able to cope with alien line formats. Alternatively, you should _define_ what the separator is for python embedded script in PostgreSQL, and have the interpreter accept it on every platform unconditionally (I'm not sure whether this is easy or not). Just my 0.03 eurocents. .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
On Wed, Jan 19, 2005 at 12:20:23PM +0100, Marco Colombo wrote: > On Tue, 18 Jan 2005, Tom Lane wrote: > >>followed by linefeed). On Macintosh, it is the ASCII CR (return) > >>character." > > > >Seems like Guido has missed a bet here: namely the case of a script > >generated on one platform and fed to an interpreter running on another. > > I think you're missing that vendors define what a 'text file' is on their > platform, not Guido. Guido just says that a Python program is a text file, > which is a very sound decision, since it makes perfectlty sense to be able > to edit it with native tools (text editors which do not support alien > textfile formats). Sure, some text editors don't. Some text editors do. But the C compiler accepts programs in any of these formats. And consider multiple machines working off the same file server. There is no "standard" text format and everyone should just get along. The C standard explicitly defines \r and \n as whitespace, thus neatly avoiding the entire issue. Many other languages do the same. The fact is the python is the odd one out. Be liberal in what you receive. After, what's the benefit of having python source that's not runnable on every computer. Without conversion. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn van Oosterhout wrote: > On Wed, Jan 19, 2005 at 06:28:25PM +1100, Stuart Bishop wrote: > >>Michael Fuhr wrote: >> >>>If Python's behavior is intentional then the newline burden would >>>seem to be on the user or on plpythonu. I think Tom's point is >>>that that's just silly.... >> >>Changing this behavior in Python would break backwards compatibility. In >>particular, the exec() function accepts strings that have already been >>unescaped: >> >> >>>>>exec('print """\n\r\n\r\n"""') >> >>In the above example, the exec function is being passed a string >>containing carridge returns and line feeds - not '\n' and '\r' character >>sequences. > > > Ofcourse, if the \r is within a literal string, then ofcourse you can't > ignore it. Other languages like C and Perl also maintain any character > within a string. The point is that outside of character strings, there > is no need to consider a \n different form a \r (unless there is a > place in Python where an extra newline changes the meaning). > > Sure, you can't just run dos2unix over the code, but within the parser > this is a simple change. Oh - I had confused myself. Your point about dos2unix shows that plpythonu might not be able to do this 'correctly' unless it understands a good chunk of Python syntax. It could do it 'good enough' if that is acceptible. I'll take this to python-dev, but unfortunately I think my comment below about the 2.3 series still stands unless higher powers believe this is a show stopper. >>It is too late for the Python 2.3 series anyway - 2.3.5 is being >>released Jan 26th and there won't be a 2.3.6. If it was championed and >>it decided that the above example is a bug and not a feature and a patch >>produced, it could get into 2.4.1 due April and 2.5+ >> >>I suspect this means fixing this problem in plpythonu for 8.1. > > > I suggest adding to the Release Notes: > > User defined functions using the Python language must use the newline > delimiter of the server OS. There is currently no standard way of > determining the newline delimiter of the server. Note this also > affects the portability of pg_dump output. I don't see how it affects the portability of pg_dump. If you have a working Python function (with unix line endings), won't pg_dump dump the source with unix line endings? -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Thu, Jan 20, 2005 at 09:50:43AM +1100, Stuart Bishop wrote: > Martijn van Oosterhout wrote: > > User defined functions using the Python language must use the newline > > delimiter of the server OS. There is currently no standard way of > > determining the newline delimiter of the server. Note this also > > affects the portability of pg_dump output. > > I don't see how it affects the portability of pg_dump. If you have a > working Python function (with unix line endings), won't pg_dump dump the > source with unix line endings? It will ... so it won't work on Windows (maybe Mac OS X) servers. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "La grandeza es una experiencia transitoria. Nunca es consistente. Depende en gran parte de la imaginación humana creadora de mitos" (Irulan)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Uh, does the Python doc specify "platform" line endings, or "normal (\n)" line endings? It sounded to me like it always wanted the UNIX-style \n line endings, so that using those would result in portability... On Jan 19, 2005, at 6:03 PM, Alvaro Herrera wrote: > On Thu, Jan 20, 2005 at 09:50:43AM +1100, Stuart Bishop wrote: >> Martijn van Oosterhout wrote: > >>> User defined functions using the Python language must use the >>> newline >>> delimiter of the server OS. There is currently no standard way of >>> determining the newline delimiter of the server. Note this also >>> affects the portability of pg_dump output. >> >> I don't see how it affects the portability of pg_dump. If you have a >> working Python function (with unix line endings), won't pg_dump dump >> the >> source with unix line endings? > > It will ... so it won't work on Windows (maybe Mac OS X) servers. > > -- > Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) > "La grandeza es una experiencia transitoria. Nunca es consistente. > Depende en gran parte de la imaginación humana creadora de mitos" > (Irulan) > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match > > - ----------------------------------------------------------- Frank D. Engel, Jr. <fde101@fjrhome.net> $ ln -s /usr/share/kjvbible /usr/manual $ true | cat /usr/manual | grep "John 3:16" John 3:16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. $ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (Darwin) iD8DBQFB7vYZ7aqtWrR9cZoRAszxAKCJpwXuWU/icjj8YSyKej/daEhQyQCfRB4I mk5qcMxWeDRa5RHIxP+9lfw= =UUu8 -----END PGP SIGNATURE----- ___________________________________________________________ $0 Web Hosting with up to 120MB web space, 1000 MB Transfer 10 Personalized POP and Web E-mail Accounts, and much more. Signup at www.doteasy.com
On Wed, Jan 19, 2005 at 07:06:49PM -0500, Frank D. Engel, Jr. wrote: > Uh, does the Python doc specify "platform" line endings, or "normal > (\n)" line endings? It sounded to me like it always wanted the > UNIX-style \n line endings, so that using those would result in > portability... That's the problem, Python specifies "platform" line endings. See this note: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66434 > When working between platforms, it is often necessary to convert the > line endings on files for them to work, especially when it comes to > code. Pass Unix Python code with \r and it goes nowhere. Same on Mac > Python with \n. This code simply and easily fixes the problem. Note, one of these links provides a possible solution: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/59867 > This code eliminates the need to convert line endings when moving .py > modules between OSes. Put in your sitecustomize.py, anywhere on > sys.path, and you'll be able to import Python modules with any of > Unix, Mac, or Windows line endings, on any OS. I don't understand how it works, but people claim it solves their problem... Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Wed, 19 Jan 2005, Martijn van Oosterhout wrote: > On Wed, Jan 19, 2005 at 12:20:23PM +0100, Marco Colombo wrote: >> I think you're missing that vendors define what a 'text file' is on their >> platform, not Guido. Guido just says that a Python program is a text file, >> which is a very sound decision, since it makes perfectlty sense to be able >> to edit it with native tools (text editors which do not support alien >> textfile formats). > > Sure, some text editors don't. Some text editors do. But the C compiler > accepts programs in any of these formats. And consider multiple > machines working off the same file server. There is no "standard" text > format and everyone should just get along. Exaclty. Or, one could say: the "standard" text format is the one the platform you are running on dictates. Which is what python does. Multiple machine from a file server had better to agree on what a text file is. Or do runtime conversions. Or let the server do that. The issue affects _any_ text file (this email to name one) not only python programs. [aside note: for e-mail there actually is a well defined "on the wire" format, and applications are expected to make the conversion when needed] > The C standard explicitly defines \r and \n as whitespace, thus neatly > avoiding the entire issue. Many other languages do the same. The fact > is the python is the odd one out. You're missing the point. The C source file is not a text file, it's a binary sequence of bytes (which is quite unfortunate, you may d/l a .c file and be not able to see/read it on your platform, while the C compliler groks it happily). There's no _line_ separator in C. If you've ever heard of obfuscated-C contexts, you know that you can write a complete C program that actually does something in one line, since C uses a _statements_ separator (';') and not a line separator. So C is precisely an example of what you should not do: use a binary file as source, pretending it's a text file. This may actually make sense, historically, but definitely it's against python attitude. Python source files are, like it or not, well formed text file, and the parser even requires correct indentation. "Be very picky in what you accept"... after all, you're a _formal_ language. You already put a thousand requirements (a whole grammar) in what you receive, why not adding also a few ones that force an improved readability. Think of how hard it is for newbies to spot a missing ; in C. Compare to how easy is to spot a missing line break (actually, I think any newbie gets line breaking naturally right from the start). Having the source of your programs be line-oriented (opposed to statement oriented) is big win for a language designer. And correclty indended from the start is even better. You may not agree with the last statements, but that's the python way, a design (and general attitude) decision. There's no point in sending a bug report about it. > Be liberal in what you receive. After, what's the benefit of having > python source that's not runnable on every computer. Without > conversion. Python source of course is runnable on every computer, provided that the source file is a real text file for that platform. If you downloaded any text file (not just python source files) by the _wrong_ mean (e.g. FTP binary mode from a Unix server) on Windows you'll have problems in handling it. You cannot view it (notepad) you - very likely - cannot print it. (Yeah your <insert favorite 3rd party editor> may be able perform both operations, but that's not the point). Are you expecting your python interpreter on windows to be able to handle it? Why? It's not a text file, it's binary garbage, the same you see with notepad or on your printer when you try and print it. See the point? (It's subtle: python somehow requires a program to be human readable, and that means it has to be a text file, correctly formatted for the platform). I can see only two ways to address the issue: - convert the string that represents the python program to a correct multi-line string (according to the rules of the platform we're running on) before we pass it to the python interpreter; - explicitly set one format as the right one for our purpose ("embedded python in PostgreSQL"), and have the python interpreter we use comply, no matter of the platform we're running on. Of course, setting the rule: - python scripts should be correctly formatted multi-line strings according to _server_ platform, will work as well, but places extra burden on the clients (and/or users). Note that an option or env. variable like: $ python -T dos file.py $ export PYTHONTEXTFORMAT=dos $ python file.py would be great to have, of course (and that can be suggested). .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
Marco Colombo <marco@esi.it> writes: > Exaclty. Or, one could say: the "standard" text format is the one the > platform you are running on dictates. Which is what python does. Egads. So the set of valid Python programs is different depending on what platform you're on? That's just, uhm, insane. So essentially Python isn't a single language, it's a set of languages, Python-NL, Python-NLCR, Python-CR, (and in theory others). So if I generate a database with a Python-CRNL function on windows, then pg_dump it and load it on Unix the function won't run because it's the wrong language, Unix only supports Python-NL. I don't think it's reasonable for pg_dump to think about converting data from one language to another. It's important for pg_dump to restore an identical database. Having it start with special case data conversation from one flavour to another seems too dangerous. Incidentally, are we sure we've diagnosed this correctly? I'm discussing this with some Python developers and they're expressing skepticism. One just tried a quick test with a Python program containing a mixture of all three newline flavours and it ran fine. -- greg
Greg Stark <gsstark@mit.edu> writes: > Egads. So the set of valid Python programs is different depending on what > platform you're on? That's just, uhm, insane. No quibble here. > Incidentally, are we sure we've diagnosed this correctly? I'm discussing this > with some Python developers and they're expressing skepticism. One just tried > a quick test with a Python program containing a mixture of all three newline > flavours and it ran fine. He tried reading 'em from files, using Python's text-format-converting file reader, no? See the test case posted up-thread, which demonstrates that feeding a string directly to PyExec (or whatever its called) produces newline-sensitive results. regards, tom lane
Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > >>Egads. So the set of valid Python programs is different depending on what >>platform you're on? That's just, uhm, insane. > > > No quibble here. > > >>Incidentally, are we sure we've diagnosed this correctly? I'm discussing this >>with some Python developers and they're expressing skepticism. One just tried >>a quick test with a Python program containing a mixture of all three newline >>flavours and it ran fine. > > > He tried reading 'em from files, using Python's text-format-converting > file reader, no? See the test case posted up-thread, which demonstrates > that feeding a string directly to PyExec (or whatever its called) > produces newline-sensitive results. This is currently being discussed on python-dev: http://mail.python.org/pipermail/python-dev/2005-January/051203.html It looks like my earlier concerns were unwarrented - current consensus seems to be to transform line endings in the string to the one-true-format expected by Python's guts: http://mail.python.org/pipermail/python-dev/2005-January/051214.html -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Fri, 2005-01-21 at 01:30 -0500, Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > > Egads. So the set of valid Python programs is different depending on what > > platform you're on? That's just, uhm, insane. > > No quibble here. Funny discussion. Did you ever try a shell script written on dos/windows, transferred binary to *nix, with for example bash? Strange errors at the #! line? Yes. Here you are. You cant blame python for it. Python bytecode however is in fact running on every platform. I think an approach where plpython compiles just like plpgsql should be perfectly valid. -- Tino Wildenhain <tino@wildenhain.de>
On Fri, 21 Jan 2005, Greg Stark wrote: > > Marco Colombo <marco@esi.it> writes: > >> Exaclty. Or, one could say: the "standard" text format is the one the >> platform you are running on dictates. Which is what python does. > > Egads. So the set of valid Python programs is different depending on what > platform you're on? That's just, uhm, insane. So essentially Python isn't a > single language, it's a set of languages, Python-NL, Python-NLCR, Python-CR, > (and in theory others). No. Just any other application that reads text files, it reads text files. That simple. It's unfortunate that 'textfile' means different things on different platforms. > So if I generate a database with a Python-CRNL function on windows, then > pg_dump it and load it on Unix the function won't run because it's the wrong > language, Unix only supports Python-NL. > > I don't think it's reasonable for pg_dump to think about converting data from > one language to another. It's important for pg_dump to restore an identical > database. Having it start with special case data conversation from one flavour > to another seems too dangerous. Makes no sense. pg_dump already make a lot of conversions: from internal representation (which may be platform dependent) to some common format, say text. It's just multi-line text which is a hard to deal with, because there _no_ single format for it. pg_dump may just choose one format, and stick with it. Every dump/restore will work. You may have trouble editing a text dump, but that's another matter. BTW, what pg_dump does on windows? I mean with -F p. Does it produce a text file with CRNL line seperator? What happens if you feed that file to psql on a Unix box? I've tried (adding spurious CRs) on Unix, and I think SQL treats CR as whitespace so it's no issue. But what for opposite? Is psql on Windows able to recognize SQL scripts made on Unix? (I can't try this). Anyway, think of floats. If you want do to FP maths fast, you need to use the native format supported by the CPU. When you dump, you get a text form of the FP number, and when you restore on a different platform you may get a _different_ number. And you have to live with it. Kiss goodbye to your "indentical database". > Incidentally, are we sure we've diagnosed this correctly? I'm discussing this > with some Python developers and they're expressing skepticism. One just tried > a quick test with a Python program containing a mixture of all three newline > flavours and it ran fine. Recent python has universal newline support. It works for files, and it's enabled by default when it read source files. But it's NOT part of the parser, AFAIK, and the source file gets converted to UNIX format before being fed to the parser (lexxer). Problem is that I'm not sure that's the way python is used by PostgreSQL. It works only when the program is read from a file. That's what the guy tested, probably. If you build a program, put it in a string, and invoke the parser, the string must be in Unix format. I'm for defining a format used by PostgreSQL, and force the python parser into accepting it on all platforms. That is, let's set the rule that python programs to be embedded into PostgreSQL use \n as line termination. Think of this: tomorrow we meet people from Mars. One of them really likes PostgreSQL, and ports it to their platform. Being a martian platform, it uses a different text file format. Line separator there is the first 1000 binary digits of PI. When he writes a small python function on his client and tries to have it run on a server on Earth, it fails, cause the python parser here won't handle PI-terminated lines correctly. What would you do? Bug python developers because "python it's a set of languages, Python-Earth, Python-Mars, Python-Venus (and in theory others)"? (BTW, in that situation, I bet Perl would fail as well). Or would you ask the martian guy to add, as part of his port effort, support for the martian line format to PostgreSQL, so that the server can convert the python program to Earth format before feeding it to python? Or, alternatively, just tell him: python programs in PostgreSQL are \n terminated? Which one is the simplest? .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
On Fri, 21 Jan 2005, Stuart Bishop wrote: > This is currently being discussed on python-dev: > > http://mail.python.org/pipermail/python-dev/2005-January/051203.html > > It looks like my earlier concerns were unwarrented - current consensus seems > to be to transform line endings in the string to the one-true-format expected > by Python's guts: > > http://mail.python.org/pipermail/python-dev/2005-January/051214.html I'm not sure that exec expects strings in Unix format: http://docs.python.org/ref/physical.html "A physical line ends in whatever the current platform's convention is for terminating lines. On Unix, this is the ASCII LF (linefeed) character. On Windows, it is the ASCII sequence CR LF (return followed by linefeed). On Macintosh, it is the ASCII CR (return) character." Reading the following message from Guido, and the above paragraph, I think that the lexxer always uses platform conventions to split lines. http://mail.python.org/pipermail/python-dev/2002-March/021741.html "I still think that this PEP is a big hack -- but as big hacks go, it seems to have a pretty good payback. I'm hoping that eventually the parser (really the lexer) will be able to open the file in binary mode and recognize all three newline styles directly. That would solve the problems with exec, eval, and compile." So I think you can't just pass exec a string with \n as line terminator. That would work only under Unix. You should use os.linesep instead. E.g.: import os import re # a small program with mixed line ending conventions a = "print 1\rprint 2\r\nprint 3\nprint 4\n" try: exec a except SyntaxError: print "SyntaxError" pass # transform it according to local line ending convention b = os.linesep.join(re.split("\r\n|\r|\n", a)) try: exec b except SyntaxError: print "SyntaxError" pass This produces, under Unix: SyntaxError 1 2 3 4 Anyone can try it under Windows and under some old Mac? I think we could do the same before passing the string to the interpreter (I'm not familiar with how the interpreter is called in PostgreSQL, tho). For those not fluent in python, this line: os.linesep.join(re.split("\r\n|\r|\n", a)) first splits the string into a list for lines, using this pattern: \r\n|\r|\n as separator. Then joins the lines again using os.linesep as separator instead. .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
On Fri, Jan 21, 2005 at 12:02:09PM +0100, Marco Colombo wrote: > On Fri, 21 Jan 2005, Greg Stark wrote: > >I don't think it's reasonable for pg_dump to think about converting > >data from one language to another. It's important for pg_dump to > >restore an identical database. Having it start with special case > >data conversation from one flavour to another seems too dangerous. > > Makes no sense. pg_dump already make a lot of conversions: from internal > representation (which may be platform dependent) to some common format, > say text. It's just multi-line text which is a hard to deal with, because > there _no_ single format for it. pg_dump may just choose one format, and > stick with it. Every dump/restore will work. You may have trouble editing > a text dump, but that's another matter. BTW, what pg_dump does on windows? > I mean with -F p. Does it produce a text file with CRNL line seperator? > What happens if you feed that file to psql on a Unix box? Ah, but you see, looking at it from your point of view, pg_dump doesn't interpret text strings. For example, the python script in a function is a opaque string. Not multiline, nothing. All postgresql does is pass that block of opaque data to the interpreter for that language. pg_dump dumps that opaque data into the output, and the CREATE FUNCTION dumps that opaque data back into the system tables. Postgresql doesn't understand python any more or less than perl, tcl, R or any other language. The argument here is that basically this opaque data has different meanings for Python on windows and Python on unix. You can't make any special cases because I can rename plperl.so to plpython.so (or vice-versa) the opaque data won't be passed to the interpreter that you'd expect from looking at the definition. > I'm for defining a format used by PostgreSQL, and force the python parser > into accepting it on all platforms. That is, let's set the rule that > python programs to be embedded into PostgreSQL use \n as line termination. Wouldn't that disadvantage non-unix pl/python users, whose python functions would have to be converted at run-time to conform to the local text format. With the extra bummer that the resulting string may not be the same size either. Remember, postgresql uses the standard shared library for the language on the platform, it doesn't build its own. But sure, preprocessing the source at run-time seems to be the only realistic solution without a change to the interpreter. > Think of this: tomorrow we meet people from Mars. One of them really likes > PostgreSQL, and ports it to their platform. Being a martian platform, it > uses a different text file format. Line separator there is the first 1000 <snip> Spurious argument. You're assuming Martians would use ASCII to write programs without using one of the two defined line-ending characters. If they were smart they'd simply use a character set which doesn't have the ambiguity. If they even use 8-bit bytes. An ASCII C compiler won't compile EBCDIC source code either, but nobody thinks that's unreasonable, probably because nobody uses EBCDIC anymore :). No-one is complaining about the use of line-ending characters, they could have said that you need a semi-colon to seperate "lines". The problem is that it's *not consistant* across platforms. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Fri, Jan 21, 2005 at 12:02:09 +0100, Marco Colombo <pgsql@esiway.net> wrote: > On Fri, 21 Jan 2005, Greg Stark wrote: > > Anyway, think of floats. If you want do to FP maths fast, you need to use > the native format supported by the CPU. When you dump, you get a text > form of the FP number, and when you restore on a different platform you > may get a _different_ number. And you have to live with it. Kiss goodbye > to your "indentical database". If you are going to another system that uses the same floating point representation, you should get the same number. pg_dump writes out enough digits that the exact number can be recovered when the dump has been reloaded. This has been the case since 7.3. If you move the data to a machine with a different floating point representation you might get a different number even if the original number could be represented exactly in the new representation.
On Fri, 21 Jan 2005, Bruno Wolff III wrote: > On Fri, Jan 21, 2005 at 12:02:09 +0100, > If you are going to another system that uses the same floating point > representation, you should get the same number. pg_dump writes out > enough digits that the exact number can be recovered when the dump > has been reloaded. This has been the case since 7.3. > > If you move the data to a machine with a different floating point > representation you might get a different number even if the original number > could be represented exactly in the new representation. So the same pg_dump file _may_ lead to different databases on different platforms, even right now. So the issue of 'identical' databases is not serious. Note that the float case is worse than the multiline text one. With multiline text there is always a way to convert it w/o loss or change of information. All you need is to treat it as a "sequence of lines". "sequences of lines" are differently represented on different platforms, just like floats. Whenever you use it on a platform, use the platform dependent format. When you export it, use a defined external format. On import, convert the defined external format to the internal one. Does this lead to different internal formats on different platforms? Yeah but what's the problem? The problem originated from a Windows application storing a multiline text (python function body) and this text being handed to a unix program that expect a multiline text as input (the python interpreter). This is a particular case only. _Any_ windows client that inserts a multiline text is likely to use \r\n as separator, while any unix client is likely to insert text with \n. For the same input (same sequence of lines typed by the user), the result is different. There's no way to write a server-side application that handles that correctly, right now. Of course 'Hello\r\nWorld\r\n' is different from 'Hello\n\World\n', as far as the server is concerned. But if you think of what the users typed, you realize they should be equal. It _is_ the same data (line 1 is 'Hello' and line 2 is 'World'), just in different formats. Either the client library should handle that transparently (converting to an on-the-wire format), or the server should be aware of what convention the client is using. Right now the application developer should take care of it, since PostgreSQL (including client library) treats text as opaque binary data. (I'm not arguing we should change that. I'm just saying it's not a python bug.) .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
Stuart Bishop <stuart@stuartbishop.net> writes: > Changing this behavior in Python would break backwards compatibility. Nonsense. It would not affect the behavior of any script that was accepted before. > In particular, the exec() function accepts strings that have already been > unescaped: > exec('print """\n\r\n\r\n"""') So? That has nothing to do with the treatment of \r and \n outside a quoted string. > I suspect this means fixing this problem in plpythonu for 8.1. plpython does not have a charter to work around Python bugs. If the Python community feels that this is correct behavior for Python, then so be it. regards, tom lane
On Fri, 21 Jan 2005, Martijn van Oosterhout wrote: > On Fri, Jan 21, 2005 at 12:02:09PM +0100, Marco Colombo wrote: >> On Fri, 21 Jan 2005, Greg Stark wrote: >>> I don't think it's reasonable for pg_dump to think about converting >>> data from one language to another. It's important for pg_dump to >>> restore an identical database. Having it start with special case >>> data conversation from one flavour to another seems too dangerous. >> >> Makes no sense. pg_dump already make a lot of conversions: from internal >> representation (which may be platform dependent) to some common format, >> say text. It's just multi-line text which is a hard to deal with, because >> there _no_ single format for it. pg_dump may just choose one format, and >> stick with it. Every dump/restore will work. You may have trouble editing >> a text dump, but that's another matter. BTW, what pg_dump does on windows? >> I mean with -F p. Does it produce a text file with CRNL line seperator? >> What happens if you feed that file to psql on a Unix box? > > Ah, but you see, looking at it from your point of view, pg_dump doesn't > interpret text strings. For example, the python script in a function is > a opaque string. Not multiline, nothing. All postgresql does is pass > that block of opaque data to the interpreter for that language. pg_dump > dumps that opaque data into the output, and the CREATE FUNCTION dumps > that opaque data back into the system tables. Postgresql doesn't > understand python any more or less than perl, tcl, R or any other > language. I was referring to psql output in general. E.g. (comments stripped): CREATE TABLE t2 ( f1 text ); COPY t2 (f1) FROM stdin; test1 test2 test3 \. This dump, produced on Unix, will have lines separated by \n. What does the same dump produced on Windows look like? If it's \n separated, it's not editable (natively) on Windows. Which is fine to me, we just defined pg_dump textual output to be \n terminated, always. Or, it's \r\n terminated. If so, how would it be to restore it on a Unix box (with psql -f). Now, if the data contains a \r I think it shows like that, escaped. Whether intended or not, that's the only thing that saves us (note that there's no need to escape a bare \r in Unix). > The argument here is that basically this opaque data has different > meanings for Python on windows and Python on unix. You can't make any > special cases because I can rename plperl.so to plpython.so (or > vice-versa) the opaque data won't be passed to the interpreter that > you'd expect from looking at the definition. > >> I'm for defining a format used by PostgreSQL, and force the python parser >> into accepting it on all platforms. That is, let's set the rule that >> python programs to be embedded into PostgreSQL use \n as line termination. > > Wouldn't that disadvantage non-unix pl/python users, whose python > functions would have to be converted at run-time to conform to the > local text format. With the extra bummer that the resulting string may > not be the same size either. Remember, postgresql uses the standard > shared library for the language on the platform, it doesn't build its > own. > But sure, preprocessing the source at run-time seems to be the only > realistic solution without a change to the interpreter. Yeah. My fav. solution is to convert the string to platform format before passing it to the parser. See the martian example. >> Think of this: tomorrow we meet people from Mars. One of them really likes >> PostgreSQL, and ports it to their platform. Being a martian platform, it >> uses a different text file format. Line separator there is the first 1000 > > <snip> > > Spurious argument. You're assuming Martians would use ASCII to write > programs without using one of the two defined line-ending characters. > If they were smart they'd simply use a character set which doesn't have > the ambiguity. If they even use 8-bit bytes. An ASCII C compiler won't > compile EBCDIC source code either, but nobody thinks that's > unreasonable, probably because nobody uses EBCDIC anymore :). You missed the point. Charset has nothing to do with the issue. While you can handle both at the same time, they are unrelated. Line separator is not dictated by the charset, only by the platform. \r\n or \n or \r for line termination is _not_ defined by ASCII. The _same_ ASCII textfile looks differently when looked in binary mode on various platforms. The point was: what if someone introduces another platform with yet-another-line-termination-standard? It's unlikely, just like martians. But it makes you realize that conversion is the job of the software that handles inter-platform communication (much like FTP). > No-one is complaining about the use of line-ending characters, they > could have said that you need a semi-colon to seperate "lines". The > problem is that it's *not consistant* across platforms. > > Have a nice day, What about C? How about fopen("afile", "r") in C? Is it "portable"? Or should you use: fopen("afile", "rb")? Define "consistant across platforms" here. If you use "rb", your program will be consistant in that with the same _binary_ input, produces the same _binary_ output. But if it's supposed to handle text files, it will fail. That is, it is consistant if it is supposed to handle binary data, it is not if it is supposed to handle text files. If you use "r", it's the opposite. No matter what, your program will never be completely consistant! You have to decide if it handles text file _or_ binary data (unless you make runtime detection, that is, of course - but that's another matter. Under Windows you can assume a .txt file is "text". Under Unix things are not that simple). Think of the meaning of '$' in a regular expression. What (binary) character(s) does it match? I expect it to match \n under Unix and the sequence \r\n under Windows. What is the usage scope of '$'? A multiline text. If you look at the data you're using it on as _binary_ data, it's behaviour it's inconsistant. Face it, _every_ time you're handling multiline text data, you should know in advance what separator it uses. If handling includes moving across platforms, you should take care of conversion, _before_ you pass it to an external program that expects textual input. Try and read the binmode() entry in the Perl manual. In particular: " For the sake of portability it is a good idea to always use it when appropriate, and to never use it when it isn't appropriate." That is, you should be well aware of that type of data you're handling, and handle it correctly. Burying your head in the sand and say "well I treat it as binary opaque data, so I'm fine" is calling for problems. Expecially when you're moving it across platform. Otherwise, you _define_ it to be binary data (and users may have problems in reading it as text). .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it