Mike Toews wrote:
> Hi all,
>
> I have some records that have some multiple lines within a single text
> field. (On top of that, I think I have mixed DOS/UNIX line endings too).
> I'm looking for two functions which can operate on a single field:
>
> 1. number of lines
A few different ways, none of which are particularly pretty:
SELECT length(regexp_replace(inputstr, E'[^\\n]', '', 'g'));
SELECT count(1) from regexp_split_to_table(inputstr, E'\\n');
CREATE LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION nlines(s VARCHAR) RETURNS INTEGER AS $$
return len(s) - len(s.replace('\n',''))
$$ LANGUAGE 'plpythonu' IMMUTABLE;
SELECT nlines(inputstr);
Note that using \n should be OK so long as you don't have any files with
Mac line endings. It'll be fine for both UNIX and DOS line endings.
If you really need it to be fast, consider writing a small C function to
do the job. I'd consider an SQL-visible prototype like:
countchar(inputstr VARCHAR, findchar CHAR) RETURNS INTEGER
You might even want to send a patch in, just in case the dev team want
to include it as a utility function.
> 2. a way to select the first line or to trim to the first line only
> (the normal trim function doesn't appear to do this)
One way, again probably not the fastest:
SELECT (regexp_split_to_array(inputstr, E'\\n'))[1]
Note the extra set of parentheses. You might also want to trim() off any
trailing \r in case of DOS line endings.
A little C function that copied the input only up to the first newline
would instead probably be the fastest. It'd also let you easily strip
the trailing \r if any was present.
--
Craig Ringer