Re: multi line text data/query ?bug? - Mailing list pgsql-general

From Marco Colombo
Subject Re: multi line text data/query ?bug?
Msg-id Pine.LNX.4.61.0503231028120.26346@Megathlon.ESI
In response to Re: multi line text data/query ?bug?  ("Sim Zacks" <sim@compulab.co.il>)
List pgsql-general
On Wed, 23 Mar 2005, Sim Zacks wrote:

> While I would agree with you that, from a purely technical standpoint, the
> user inserted a CRLF into the database and a query with just an LF does not
> exactly match that, from a user's and more practical perspective, that does
> not make sense at all. That is why I surrounded the word bug in ??.
>
> I would say that from a user's perspective it qualifies as a bug because they
> did not put in specific binary characters. They want a newline. From a
> database standards perspective, I would argue that any database that allows
> connections from a client without qualifying a required operating system
> should be OS neutral.
>
> I would say it is a bug from a user's perspective because the exact same
> query works differently from different clients. Since the user does not
> choose what binary characters to put in, they are invisible to the user.
> Anything that is completely invisible to the user should not be considered
> valid qualifying data.
>
> As there is no postgresql database standard, such as "all newlines are unix
> newlines" it is impossible to write a client that will necessarily return
> the data that you want.
>
> This is the exact problem we are having with Python right now, as a Windows
> client cannot write a python function to be run on a linux server.

Unfortunately, it's not that simple. There are problems with Python
even when _both_ the client and the server are Windows. Python itself
_always_ uses \n, even on Windows. So the only solution is to
"pythonize" the input (convert line endings to \n), no matter what.
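
As a rough sketch of that "pythonizing" step (normalize_newlines is a
hypothetical helper name, not something PostgreSQL or PL/Python provides,
and where exactly it would be called is left open):

```python
def normalize_newlines(source: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a CRLF pair does not turn into two LFs.
    return source.replace("\r\n", "\n").replace("\r", "\n")


# A function body as a Windows client might send it (illustrative only).
windows_body = "if x > 0:\r\n    return 1\r\nreturn 0\r\n"
unix_body = normalize_newlines(windows_body)
```

The conversion is idempotent, so applying it to input that already uses
\n does no harm.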

For the more general problem of handling text, see my comments in
this thread:
http://archives.postgresql.org/pgsql-general/2005-01/msg00792.php

There are interesting problems with multiline text, as a datatype.
Think of digital signatures and checksums. Think of a simple function:
     len(text)
Should it count line separators as characters? In theory, the only
way to get consistent cross-platform behaviour is to _ignore_ line
separators when counting or checksumming. But the real-world solution
is to treat text files as binary and let the users or the application
handle the conversion.
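
To make the length and checksum point concrete, here is a small sketch.
The sample strings and the helper names (checksum, length_ignoring_separators)
are made up for illustration, and MD5 merely stands in for whatever
checksum is in use:

```python
import hashlib

# The "same" two-line text as stored by a Unix and a Windows client.
unix_text = "line one\nline two\n"
windows_text = "line one\r\nline two\r\n"


def checksum(text: str) -> str:
    """Checksum over the raw bytes, line separators included."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def length_ignoring_separators(text: str) -> int:
    """Count characters with the line separators left out."""
    return len("".join(text.splitlines()))


raw_lengths = (len(unix_text), len(windows_text))
raw_checksums_match = checksum(unix_text) == checksum(windows_text)
ignoring_lengths = (
    length_ignoring_separators(unix_text),
    length_ignoring_separators(windows_text),
)
```

The raw lengths and checksums differ between the two variants, while the
count that skips separators is the same for both.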

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it
