Re: string_to_array with empty input - Mailing list pgsql-general
From | Sam Mason |
---|---|
Subject | Re: string_to_array with empty input |
Date | |
Msg-id | 20090401140005.GD12225@frubble.xen.chris-lamb.co.uk Whole thread Raw |
In response to | Re: string_to_array with empty input (Greg Stark <stark@enterprisedb.com>) |
List | pgsql-general |
On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote: > Both interpretations are clearly consistent but it depends on whether > you think it's a bunch of text strings concatenated together or if > it's a list of objects. > > The example of string_to_array('',',')::int[] is relevant to this > point. The whole "there's one empty element" only makes sense if > you're thinking in terms of string processing. If it's a list of any > other kind of object it probably doesn't make sense; you can't say > there's one empty integer or one empty composite object or one empty > anything else. I think this is about the only sensible option, but my reasoning is somewhat different. My original intuition was that array_to_string and string_to_array should be (perfect) inverses of each other. Unfortunately I can't see any way to get this to happen; zero length arrays or NULL elements in the array mean than array_to_string isn't injective. This means that the composition of the two functions won't result in an injective function and my original premise is false. Note that as far as I can tell string_to_array is injective. I'm assuming that the delimiter won't appear as part of an element of the array; e.g. an array of integers and space as a delimiter is OK, but using the same delimiter with unconstrained text is not OK, a blank delimiter is never OK as it's always part of a string. "Injective" means there exists more than one array that encodes to the same string. The examples being how do you sanely encode '{}' and '{NULL}' in a unique way; '{""}' is a bad example because it's just an artifact of how strings are represented. The complications needed to allow this to happen would make it a very similar function as the array's normal output_function function and hence wouldn't serve a useful purpose. All of this implies that we have to make a compromise somewhere. The semantics that most closely match the existing behaviour would be; for array_to_string: 1) remove NULL values from input array 2) call output_function on remaining elements 3) intersperse[1] the delimiter between the remaining elements 4) concatenate the resulting array for string_to_array: 1) check if input is zero length; return empty array 2) split array based on delimiter and return Note that both functions are STRICT; i.e. a NULL for either parameter should cause the function to return NULL. Arguably in string_to_array it could be non-strict if the input string is empty, but it's probably worth keeping it strict to simplify the semantics. Here are some examples: array_to_string('{}'::TEXT[],',') => '' array_to_string('{NULL}'::TEXT[],',') => '' array_to_string('{NULL,NULL}'::TEXT[],',') => '' array_to_string('{a,NULL}'::TEXT[],',') => 'a' array_to_string('{NULL,a}'::TEXT[],',') => 'a' array_to_string('{a,b}'::TEXT[],',') => 'a,b' array_to_string('{a,NULL,b}'::TEXT[],',') => 'a,b' string_to_array('',',') => '{}' string_to_array(' ',',') => '{" "}' string_to_array(',',',') => '{"",""}' string_to_array('a',',') => '{a}' string_to_array('a,',',') => '{a,""}' string_to_array(',a',',') => '{"",a}' string_to_array('a,b',',') => '{a,b}' My thinking before was that it should be doing: string_to_array('',',') => '{""}' instead, but I now think that Greg has a point and these are nicer/more generally useful semantics. Hum, that all got a bit more verbose than I was expecting. Ah well, I hope it's somewhat useful. -- Sam http://samason.me.uk/ [1] as in the intersperse function in Haskell http://www.haskell.org/onlinereport/list.html#sect17.3 intersperse "#" ["a", "bar"] == ["a", "#", "bar"] note that here we're working with arrays of string, rather than arrays of characters as in the report.
pgsql-general by date: