Re: EOL characters and multibyte encodings - Mailing list pgsql-hackers

From	Joe Conway
Subject	Re: EOL characters and multibyte encodings
Date	June 21, 2007 19:51:20
Msg-id	467B00E1.7070400@joeconway.com
In response to	Re: EOL characters and multibyte encodings (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> My first thought on fixing this issue was to simply replace all 
>> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the 
>> R parser. As far as I know, any instances of '\r' embedded in a 
>> syntactically valid R statement must be escaped (i.e. literally the 
>> characters "\" and "r"), so that should not be a problem. But I am 
>> concerned about how this potentially plays against multibyte characters. 
>> Is it safe to do this, or do I need to use a mb-aware replace algorithm?
> 
> It's safe, because you'll be dealing with prosrc inside the backend,
> therefore using a backend-legal encoding, and those don't have any ASCII
> aliasing problems (all bytes of an MB character must have high bit set).

Great -- I wasn't sure about that.

> However I dislike doing it exactly that way because line numbers in the
> R script will all get doubled.  Unless R never reports errors in terms
> of line numbers, you'd be better off to either delete the \r characters
> or replace them with spaces.

Good point. But I need to be able to deal with old-style Apple EOLs 
too -- IIRC those can be *only* '\r'. So I guess I need to do a 
look-ahead whenever I run into '\r': if it is followed by '\n', drop 
the '\r'; otherwise replace it with '\n'.

Joe
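
A minimal sketch of the look-ahead approach described above, not the 
actual PL/R code. The function name normalize_eol is hypothetical, and 
the sketch assumes the source is a NUL-terminated string in a 
backend-legal encoding, so that a byte-wise scan for '\r' cannot land 
inside a multibyte character (per Tom's point). A "\r\n" pair collapses 
to a single '\n' and a lone '\r' becomes '\n', so line numbers are 
preserved for all three EOL styles.

```c
#include <stddef.h>

/*
 * Hypothetical helper: normalize end-of-line sequences in place.
 *   "\r\n" -> "\n"   (the CR is dropped, the LF is kept)
 *   "\r"   -> "\n"   (a lone CR is treated as a line break)
 * The result is never longer than the input, so no allocation is needed.
 */
void
normalize_eol(char *src)
{
    char *in = src;
    char *out = src;

    while (*in != '\0')
    {
        if (*in == '\r')
        {
            if (in[1] == '\n')
            {
                /* CRLF: skip the CR; the LF is copied below */
                in++;
            }
            else
            {
                /* lone CR: emit a newline in its place */
                *out++ = '\n';
                in++;
                continue;
            }
        }
        *out++ = *in++;
    }
    *out = '\0';
}
```

Because the string only shrinks, the rewrite can be done in place on a 
writable copy of prosrc before it is handed to the R parser.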
