Thread: newline conversion in SQL command strings
I have received a number of bug reports about plsh choking on Windows-style line endings. The problem is that the user uses some Windows-based tool or other to execute an SQL command line this: CREATE FUNCTION foo() RETURNS something<CR><LF> LANGUAGE plsh<CR><LF> AS $$<CR><LF> #!/bin/sh<CR><LF> <CR><LF> do something<CR><LF> do something<CR><LF> $$;<CR><LF> which (apparently, I don't have Windows handy) creates a function with the prosrc body of '<CR><LF> #!/bin/sh<CR><LF> <CR><LF> do something<CR><LF> do something<CR><LF> ' But executing this fails because Unix shells reject <CR> characters in inappropriate places as syntax errors. I don't know how to handle that. It would be unfortunate to have the behavior of a function depend on the kind of client used to create or modify it. I suppose the other more mainstream PLs don't expose that problem so much because the interpreters can handle either kind of line ending. But there could still be functional differences if for example a line ending is embedded into a string constant. Ideas?
On 20.09.2012 05:56, Peter Eisentraut wrote: > I have received a number of bug reports about plsh choking on > Windows-style line endings. The problem is that the user uses some > Windows-based tool or other to execute an SQL command line this: > > CREATE FUNCTION foo() RETURNS something<CR><LF> > LANGUAGE plsh<CR><LF> > AS $$<CR><LF> > #!/bin/sh<CR><LF> > <CR><LF> > do something<CR><LF> > do something<CR><LF> > $$;<CR><LF> > > which (apparently, I don't have Windows handy) creates a function with > the prosrc body of > > '<CR><LF> > #!/bin/sh<CR><LF> > <CR><LF> > do something<CR><LF> > do something<CR><LF> > ' > > But executing this fails because Unix shells reject<CR> characters in > inappropriate places as syntax errors. > > I don't know how to handle that. It would be unfortunate to have the > behavior of a function depend on the kind of client used to create or > modify it. Could you strip the CRs? Either at CREATE FUNCTION time, or when the function is executed. - Heikki
On 9/20/12 2:01 AM, Heikki Linnakangas wrote: > Could you strip the CRs? Either at CREATE FUNCTION time, or when the > function is executed. It has been proposed that the plsh handler should strip the CRs before execution. But I don't think that is a correct solution, because that is user data which could be relevant. It could be the case, for example, that plsh is run on a Windows host with a particular shell that requires CRLF line endings.
On 09/20/2012 09:12 AM, Peter Eisentraut wrote: > On 9/20/12 2:01 AM, Heikki Linnakangas wrote: >> Could you strip the CRs? Either at CREATE FUNCTION time, or when the >> function is executed. > It has been proposed that the plsh handler should strip the CRs before > execution. But I don't think that is a correct solution, because that > is user data which could be relevant. It could be the case, for > example, that plsh is run on a Windows host with a particular shell that > requires CRLF line endings. > > I confess I find it hard to take plsh terribly seriously, it just seems like a bad solution for practically any problem, when compared with, say, plperlu, and it seems so Unixish that the very idea of using it at all on Windows strikes me as crazy. Having said that, and devoted all of 5 minutes to looking at the code, I suggest that you make two changes: if (sourcecode[0] == '\n') sourcecode++; would become: if (sourcecode[0] == '\n' || (sourcecode[0] == '\r' && sourcecode[1] == '\n')) sourcecode++; and len = strcspn(rest, "\n"); would become: len = strcspn(rest, "\r\n"); Also, there is probably a pretty good case for opening the temp file using PG_BINARY_W rather than "w" so that Windows doesn'tproduce extra CRs for you (see recent discussion of this w.r.t. pg_upgrade). cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 09/20/2012 09:12 AM, Peter Eisentraut wrote: >> It has been proposed that the plsh handler should strip the CRs before >> execution. But I don't think that is a correct solution, because that >> is user data which could be relevant. It could be the case, for >> example, that plsh is run on a Windows host with a particular shell that >> requires CRLF line endings. > I confess I find it hard to take plsh terribly seriously, Perhaps, but the general problem remains, for any language whose interpreter is unforgiving about line endings. However, I think the problem is insoluble as Peter has stated it. His original complaint was > ... It would be unfortunate to have the > behavior of a function depend on the kind of client used to create or > modify it. but I don't see a way to reconcile that with the notion that CRs might be user data. Either you think they're significant, in which case you should retain the function body as presented, or you think they aren't, in which case you might as well strip them. I lean to the position that you're best off stripping them, given that the lines are to be fed to a Unix shell. The argument about some other shell possibly requiring them seems pretty tenuous. (IOW, I think that the PL-specific code is exactly the place to be making such decisions. I don't want Postgres' generic function support making any decisions about whether CRs in function bodies are significant, because that decision could be language-specific.) regards, tom lane
On 09/20/2012 03:34 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 09/20/2012 09:12 AM, Peter Eisentraut wrote: >>> It has been proposed that the plsh handler should strip the CRs before >>> execution. But I don't think that is a correct solution, because that >>> is user data which could be relevant. It could be the case, for >>> example, that plsh is run on a Windows host with a particular shell that >>> requires CRLF line endings. >> I confess I find it hard to take plsh terribly seriously, > Perhaps, but the general problem remains, for any language whose > interpreter is unforgiving about line endings. plsh is the only one I know of that writes the script out to a temp file, and it opens that file on Windows in text mode (if I'm looking at the right source), which will convert LF into CRLF. Hence my suggestion that he change that. Of course, if the function body actually has embedded CRs then there is a different set of problems. cheers andrew