Re: Multiline plpython procedure - Mailing list pgsql-general

From Marco Colombo
Subject Re: Multiline plpython procedure
Date
Msg-id Pine.LNX.4.61.0501191642360.27562@Megathlon.ESI
In response to Re: Multiline plpython procedure  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Multiline plpython procedure  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
On Wed, 19 Jan 2005, Martijn van Oosterhout wrote:

> On Wed, Jan 19, 2005 at 12:20:23PM +0100, Marco Colombo wrote:
>> I think you're missing that vendors define what a 'text file' is on their
>> platform, not Guido. Guido just says that a Python program is a text file,
>> which is a very sound decision, since it makes perfect sense to be able
>> to edit it with native tools (text editors which do not support alien
>> textfile formats).
>
> Sure, some text editors don't. Some text editors do. But the C compiler
> accepts programs in any of these formats. And consider multiple
> machines working off the same file server. There is no "standard" text
> format and everyone should just get along.

Exactly. Or, one could say: the "standard" text format is the one the
platform you are running on dictates. Which is what python does.
Multiple machines working off a file server had better agree on what a
text file is. Or do runtime conversions. Or let the server do that.
The issue affects _any_ text file (this email to name one) not only
python programs. [aside note: for e-mail there actually is a well
defined "on the wire" format, and applications are expected to make
the conversion when needed]

> The C standard explicitly defines \r and \n as whitespace, thus neatly
> avoiding the entire issue. Many other languages do the same. The fact
> is the python is the odd one out.

You're missing the point. The C source file is not a text file, it's
a binary sequence of bytes (which is quite unfortunate: you may
download a .c file and not be able to view or read it on your platform,
while the C compiler groks it happily). There's no _line_ separator in C.
If you've ever heard of obfuscated-C contests, you know that you
can write a complete C program that actually does something in one
line, since C uses a _statement_ separator (';') and not a line
separator. So C is precisely an example of what you should not do:
use a binary file as source, pretending it's a text file. This may
actually make sense, historically, but it's definitely against the
python attitude. Python source files are, like it or not, well-formed
text files, and the parser even requires correct indentation.

"Be very picky in what you accept"... after all, you're a _formal_
language. You already put a thousand requirements (a whole grammar)
on what you receive, so why not add a few more that force
improved readability? Think of how hard it is for newbies to spot
a missing ; in C. Compare that to how easy it is to spot a missing line
break (actually, I think any newbie gets line breaking naturally
right from the start). Having the source of your programs be
line-oriented (as opposed to statement-oriented) is a big win for a
language designer. And having it correctly indented from the start is
even better.
You may not agree with the last statements, but that's the python
way, a design (and general attitude) decision. There's no point
in sending a bug report about it.

> Be liberal in what you receive. After, what's the benefit of having
> python source that's not runnable on every computer. Without
> conversion.

Python source of course is runnable on every computer, provided that
the source file is a real text file for that platform.
If you downloaded any text file (not just python source files) by
the _wrong_ means (e.g. FTP binary mode from a Unix server) on Windows,
you'll have problems handling it. You cannot view it (notepad) and
you - very likely - cannot print it. (Yeah, your <insert favorite 3rd
party editor> may be able to perform both operations, but that's not the
point). Are you expecting your python interpreter on windows to be able
to handle it? Why? It's not a text file, it's binary garbage, the same
you see with notepad or on your printer when you try to print it.
See the point? (It's subtle: python somehow requires a program to be
human readable, and that means it has to be a text file, correctly
formatted for the platform).
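
As a rough illustration of what "a real text file for that platform"
means in practice, the sketch below counts the line terminators in a
file's raw bytes and checks them against a platform's own terminator
(`os.linesep` by default). The function names are made up for this
example; they are not part of Python or PostgreSQL.

```python
import os

def count_terminators(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,   # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,   # CR not followed by LF
    }

def is_native_text(data: bytes, linesep: str = os.linesep) -> bool:
    """True if every terminator found is the given platform's own."""
    native = {"\r\n": "crlf", "\n": "lf", "\r": "cr"}[linesep]
    counts = count_terminators(data)
    return all(n == 0 for kind, n in counts.items() if kind != native)
```

A Unix file fetched in FTP binary mode contains only bare LF, so
`is_native_text(data, "\r\n")` would be false for it, which is the
Windows case described above.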

I can see only two ways to address the issue:

- convert the string that represents the python program to a correct
   multi-line string (according to the rules of the platform we're
   running on) before we pass it to the python interpreter;

- explicitly set one format as the right one for our purpose
   ("embedded python in PostgreSQL"), and have the python interpreter
   we use comply, regardless of the platform we're running on.

Of course, setting the rule:

- python scripts should be correctly formatted multi-line strings
   according to _server_ platform,

will work as well, but places extra burden on the clients (and/or users).
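
The first option (convert the string before handing it to the
interpreter) could look roughly like the sketch below. It assumes the
interpreter wants "\n" as its line terminator, and the helper names
`normalize_newlines` and `run_embedded` are hypothetical, not taken from
plpython. Newer Python releases are more tolerant of foreign line
endings in `compile()`, so there the conversion is only a harmless
safety step.

```python
def normalize_newlines(source: str) -> str:
    """Turn CRLF and bare CR line endings into LF."""
    # Replace CRLF first so it does not become two line breaks.
    return source.replace("\r\n", "\n").replace("\r", "\n")

def run_embedded(source: str, namespace=None) -> dict:
    """Compile and execute a normalized program; return its namespace."""
    namespace = {} if namespace is None else namespace
    code = compile(normalize_newlines(source), "<embedded>", "exec")
    exec(code, namespace)
    return namespace
```

Note that a blanket replace also rewrites a literal CR inside a
multi-line string constant, so this trades exactness for simplicity.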

Note that an option or env. variable like:

$ python -T dos file.py

$ export PYTHONTEXTFORMAT=dos
$ python file.py

would be great to have, of course (and that can be suggested).
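
Neither the -T option nor PYTHONTEXTFORMAT exists in the python
interpreter; both are only the suggestion made above. Purely as a
sketch of how a loader might honor such a setting, and with the format
names "dos", "unix" and "mac" assumed for illustration:

```python
import os

# Hypothetical format names, mirroring the "dos" value used above.
TERMINATORS = {"dos": "\r\n", "unix": "\n", "mac": "\r"}

def load_source(raw: bytes, text_format=None) -> str:
    """Decode raw bytes and convert the declared terminator to LF."""
    if text_format is None:
        # PYTHONTEXTFORMAT is the proposed variable, not a real one.
        text_format = os.environ.get("PYTHONTEXTFORMAT", "unix")
    terminator = TERMINATORS[text_format]  # KeyError on unknown names
    return raw.decode("utf-8").replace(terminator, "\n")
```

The caller declares the format instead of the loader guessing it, which
matches the idea of explicitly setting one format as the right one.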

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it
