Re: [GENERAL] string_to_array with empty input - Mailing list pgsql-hackers

From Sam Mason
Subject Re: [GENERAL] string_to_array with empty input
Date
Msg-id 20090401143850.GA11167@frubble.xen.chris-lamb.co.uk
Whole thread Raw
In response to Re: string_to_array with empty input  (Greg Stark <stark@enterprisedb.com>)
List pgsql-hackers
[ oops, forgot to send this to -hackers before ]

On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote:
> Both interpretations are clearly consistent but it depends on whether
> you think it's a bunch of text strings concatenated together or if
> it's a list of objects.
> 
> The example of string_to_array('',',')::int[] is relevant to this
> point. The whole "there's one empty element" only makes sense if
> you're thinking in terms of string processing. If it's a list of any
> other kind of object it probably doesn't make sense; you can't say
> there's one empty integer or one empty composite object or one empty
> anything else.

I think this is about the only sensible option, but my reasoning is
somewhat different.

My original intuition was that array_to_string and string_to_array
should be (perfect) inverses of each other.  Unfortunately I can't see
any way to get this to happen; zero length arrays or NULL elements
in the array mean than array_to_string isn't injective.  This means
that the composition of the two functions won't result in an injective
function and my original premise is false.  Note that as far as I can
tell string_to_array is injective.  I'm assuming that the delimiter
won't appear as part of an element of the array; e.g. an array of
integers and space as a delimiter is OK, but using the same delimiter
with unconstrained text is not OK, a blank delimiter is never OK as it's
always part of a string.

"Injective" means there exists more than one array that encodes to the
same string.  The examples being how do you sanely encode '{}' and
'{NULL}' in a unique way; '{""}' is a bad example because it's just
an artifact of how strings are represented.  The complications needed
to allow this to happen would make it a very similar function as the
array's normal output_function function and hence wouldn't serve a
useful purpose.  All of this implies that we have to make a compromise
somewhere.

The semantics that most closely match the existing behaviour would be;
for array_to_string:
 1) remove NULL values from input array 2) call output_function on remaining elements 3) intersperse[1] the delimiter
betweenthe remaining elements 4) concatenate the resulting array
 

for string_to_array:
 1) check if input is zero length; return empty array 2) split array based on delimiter and return

Note that both functions are STRICT; i.e. a NULL for either parameter
should cause the function to return NULL.  Arguably in string_to_array
it could be non-strict if the input string is empty, but it's probably
worth keeping it strict to simplify the semantics.

Here are some examples:
 array_to_string('{}'::TEXT[],',')          => '' array_to_string('{NULL}'::TEXT[],',')      => ''
array_to_string('{NULL,NULL}'::TEXT[],',')=> '' array_to_string('{a,NULL}'::TEXT[],',')    => 'a'
array_to_string('{NULL,a}'::TEXT[],',')   => 'a' array_to_string('{a,b}'::TEXT[],',')       => 'a,b'
array_to_string('{a,NULL,b}'::TEXT[],',') => 'a,b'
 
 string_to_array('',',')     => '{}' string_to_array(' ',',')    => '{" "}' string_to_array(',',',')    => '{"",""}'
string_to_array('a',',')   => '{a}' string_to_array('a,',',')   => '{a,""}' string_to_array(',a',',')   => '{"",a}'
string_to_array('a,b',',') => '{a,b}'
 

My thinking before was that it should be doing:
 string_to_array('',',')     => '{""}'

instead, but I now think that Greg has a point and these are nicer/more
generally useful semantics.

Hum, that all got a bit more verbose than I was expecting.  Ah well, I
hope it's somewhat useful.

--  Sam  http://samason.me.uk/[1] as in the intersperse function in Haskell
http://www.haskell.org/onlinereport/list.html#sect17.3     intersperse "#" ["a", "bar"] == ["a", "#", "bar"]    note
thathere we're working with arrays of string, rather than    arrays of characters as in the report.
 


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: WIP: transformation hook modules and JSON support
Next
From: "Vincze, Tamas"
Date:
Subject: Patch to speed up pg_dump