Thread: Multi-line text fields

Multi-line text fields

From
Mike Toews
Date:
Hi all,

I have some records that have some multiple lines within a single text 
field. (On top of that, I think I have mixed DOS/UNIX line endings too). 
I'm looking for two functions which can operate on a single field:
  1. number of lines  2. a way to select the first line or to trim to the first line only     (the normal trim function
doesn'tappear to do this)
 

Thanks in advance.
-Mike



Re: Multi-line text fields

From
Craig Ringer
Date:
Mike Toews wrote:
> Hi all,
> 
> I have some records that have some multiple lines within a single text 
> field. (On top of that, I think I have mixed DOS/UNIX line endings too). 
> I'm looking for two functions which can operate on a single field:
> 
>   1. number of lines

A few different ways, none of which are particularly pretty:

SELECT length(regexp_replace(inputstr, E'[^\\n]', '', 'g'));

SELECT count(1) from regexp_split_to_table(inputstr, E'\\n');

CREATE LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION nlines(s VARCHAR) RETURNS INTEGER AS $$
return len(s) - len(s.replace('\n',''))
$$ LANGUAGE 'plpythonu' IMMUTABLE;
SELECT nlines(inputstr);

Note that using \n should be OK so long as you don't have any files with 
Mac line endings. It'll be fine for both UNIX and DOS line endings.

If you really need it to be fast, consider writing a small C function to 
do the job. I'd consider an SQL-visible prototype like:

countchar(inputstr VARCHAR, findchar CHAR) RETURNS INTEGER

You might even want to send a patch in, just in case the dev team want 
to include it as a utility function.

>   2. a way to select the first line or to trim to the first line only
>      (the normal trim function doesn't appear to do this)

One way, again probably not the fastest:

SELECT (regexp_split_to_array(inputstr, E'\\n'))[1]

Note the extra set of parentheses. You might also want to trim() off any 
trailing \r in case of DOS line endings.

A little C function that copied the input only up to the first newline 
would instead probably be the fastest. It'd also let you easily strip 
the trailing \r if any was present.

--
Craig Ringer


Resp.: Multi-line text fields

From
"Osvaldo Kussama"
Date:
2008/9/23, Craig Ringer <craig@postnewspapers.com.au>:
> Mike Toews wrote:
>> Hi all,
>>
>> I have some records that have some multiple lines within a single text
>> field. (On top of that, I think I have mixed DOS/UNIX line endings too).
>> I'm looking for two functions which can operate on a single field:
>>
>>   1. number of lines
...
>>   2. a way to select the first line or to trim to the first line only
>>      (the normal trim function doesn't appear to do this)
>
> One way, again probably not the fastest:
>
> SELECT (regexp_split_to_array(inputstr, E'\\n'))[1]
>
> Note the extra set of parentheses. You might also want to trim() off any
> trailing \r in case of DOS line endings.
>
> A little C function that copied the input only up to the first newline
> would instead probably be the fastest. It'd also let you easily strip
> the trailing \r if any was present.
>


Use substring(string from pattern):
http://www.postgresql.org/docs/current/interactive/functions-string.html

bdteste=# SELECT substring(E'foo\nbar\nbaz' FROM E'^((.)+?)\n');substring
-----------foo
(1 registro)

Osvaldo