Re: string_to_array with empty input - Mailing list pgsql-general

From Sam Mason
Subject Re: string_to_array with empty input
Date
Msg-id 20090401140005.GD12225@frubble.xen.chris-lamb.co.uk
Whole thread Raw
In response to Re: string_to_array with empty input  (Greg Stark <stark@enterprisedb.com>)
List pgsql-general
On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote:
> Both interpretations are clearly consistent but it depends on whether
> you think it's a bunch of text strings concatenated together or if
> it's a list of objects.
>
> The example of string_to_array('',',')::int[] is relevant to this
> point. The whole "there's one empty element" only makes sense if
> you're thinking in terms of string processing. If it's a list of any
> other kind of object it probably doesn't make sense; you can't say
> there's one empty integer or one empty composite object or one empty
> anything else.

I think this is about the only sensible option, but my reasoning is
somewhat different.

My original intuition was that array_to_string and string_to_array
should be (perfect) inverses of each other.  Unfortunately I can't see
any way to get this to happen; zero length arrays or NULL elements
in the array mean than array_to_string isn't injective.  This means
that the composition of the two functions won't result in an injective
function and my original premise is false.  Note that as far as I can
tell string_to_array is injective.  I'm assuming that the delimiter
won't appear as part of an element of the array; e.g. an array of
integers and space as a delimiter is OK, but using the same delimiter
with unconstrained text is not OK, a blank delimiter is never OK as it's
always part of a string.

"Injective" means there exists more than one array that encodes to the
same string.  The examples being how do you sanely encode '{}' and
'{NULL}' in a unique way; '{""}' is a bad example because it's just
an artifact of how strings are represented.  The complications needed
to allow this to happen would make it a very similar function as the
array's normal output_function function and hence wouldn't serve a
useful purpose.  All of this implies that we have to make a compromise
somewhere.

The semantics that most closely match the existing behaviour would be;
for array_to_string:

  1) remove NULL values from input array
  2) call output_function on remaining elements
  3) intersperse[1] the delimiter between the remaining elements
  4) concatenate the resulting array

for string_to_array:

  1) check if input is zero length; return empty array
  2) split array based on delimiter and return

Note that both functions are STRICT; i.e. a NULL for either parameter
should cause the function to return NULL.  Arguably in string_to_array
it could be non-strict if the input string is empty, but it's probably
worth keeping it strict to simplify the semantics.

Here are some examples:

  array_to_string('{}'::TEXT[],',')          => ''
  array_to_string('{NULL}'::TEXT[],',')      => ''
  array_to_string('{NULL,NULL}'::TEXT[],',') => ''
  array_to_string('{a,NULL}'::TEXT[],',')    => 'a'
  array_to_string('{NULL,a}'::TEXT[],',')    => 'a'
  array_to_string('{a,b}'::TEXT[],',')       => 'a,b'
  array_to_string('{a,NULL,b}'::TEXT[],',')  => 'a,b'

  string_to_array('',',')     => '{}'
  string_to_array(' ',',')    => '{" "}'
  string_to_array(',',',')    => '{"",""}'
  string_to_array('a',',')    => '{a}'
  string_to_array('a,',',')   => '{a,""}'
  string_to_array(',a',',')   => '{"",a}'
  string_to_array('a,b',',')  => '{a,b}'

My thinking before was that it should be doing:

  string_to_array('',',')     => '{""}'

instead, but I now think that Greg has a point and these are nicer/more
generally useful semantics.

Hum, that all got a bit more verbose than I was expecting.  Ah well, I
hope it's somewhat useful.

--
  Sam  http://samason.me.uk/

 [1] as in the intersperse function in Haskell
       http://www.haskell.org/onlinereport/list.html#sect17.3
       intersperse "#" ["a", "bar"] == ["a", "#", "bar"]
     note that here we're working with arrays of string, rather than
     arrays of characters as in the report.

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Indexing unknown words with Tsearch2
Next
From: CM J
Date:
Subject: Re: Postgres: Packaging & Server Startup