Thread: string_to_array with empty input

string_to_array with empty input

From
Steve Crawford
Date:
I have a query that converts a string to an array with the
string_to_array function. Sometimes the input is an empty string (not a
null, but a string of zero-length). I had expected the result to be a
one-element array with an empty string as the first and only element but
instead it returned null. I looked at the docs and didn't find the
observed behavior documented.

I ran some tests:
string_to_array(null,',') -- expected and got null
string_to_array('a',',') -- expected and got one-element array {a}
string_to_array('',',') -- expected consistency with above (one-element
array with empty string) but got null

So I looked to see how array_to_string deals with such arrays:
array_to_string('{""}'::text[],',')  -- expected and got an empty string

I realize that array_to_string is not a perfect inverse of
string_to_array - especially due to the fact that the array-index is not
preserved. But it seems that the array data should match.

Is this a bug or am I missing the logic behind this behavior? As it
stands, it appears I will have to wrap the function in a case statement
as a work-around unless there is a better solution.

Cheers,
Steve


Re: string_to_array with empty input

From
Tom Lane
Date:
Steve Crawford <scrawford@pinpointresearch.com> writes:
> I have a query that converts a string to an array with the
> string_to_array function. Sometimes the input is an empty string (not a
> null, but a string of zero-length). I had expected the result to be a
> one-element array with an empty string as the first and only element but
> instead it returned null. I looked at the docs and didn't find the
> observed behavior documented.

The behavior is pretty intentional according to the source code:

    /* return NULL for empty input string */
    if (inputstring_len < 1)
    {
        text_position_cleanup(&state);
        PG_RETURN_NULL();
    }

I agree this seems less than consistent though, especially seeing
that you *don't* get a null for a zero-length separator, which if
anything is a more poorly defined case.

I doubt it'd be a good idea to back-patch a change for this,
but I could see altering the definition for 8.4.

Does anyone want to argue for keeping it the same?  Or perhaps
argue that a zero-element array is a more sensible result than
a one-element array with one empty string?  (It doesn't seem
like it to me, but maybe somebody thinks so.)

            regards, tom lane

Re: string_to_array with empty input

From
Justin
Date:
Tom Lane wrote:
> I agree this seems less than consistent though, especially seeing
> that you *don't* get a null for a zero-length separator, which if
> anything is a more poorly defined case.
>
> I doubt it'd be a good idea to back-patch a change for this,
> but I could see altering the definition for 8.4.
>
> Does anyone want to argue for keeping it the same?  Or perhaps
> argue that a zero-element array is a more sensible result than
> a one-element array with one empty string?  (It doesn't seem
> like it to me, but maybe somebody thinks so.)
>
>             regards, tom lane
>

I  like the array to contain single zero length string.   A  string was
passed in  although empty,  its still a string not a NULL.

Returning an empty array implies nothing was passed to the function
although something was. That seems kinda odd to me also, give  back what
was sent in broken into an array.

I use this and split_part allot in our database to break apart  part numbers


Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Mar 30, 2009, at 8:26 PM, Tom Lane wrote:

> Does anyone want to argue for keeping it the same?  Or perhaps
> argue that a zero-element array is a more sensible result than
> a one-element array with one empty string?  (It doesn't seem
> like it to me, but maybe somebody thinks so.)

Hrm. There seems to be some disagreement about this among some
languages:

% perl -le '@r = split /-/, ""; print length @r; print qq{"$r[0]"}'
1
""

% irb
 >> puts ''.split('-')
=> nil

So Perl returns a single element as Steve had been expecting, while
Ruby returns nil. I'm used to the Perl way, but I guess there's room
for various interpretations, including the current implementation,
with which Ruby would seem to agree.

Best,

David

Re: string_to_array with empty input

From
Greg Stark
Date:
Sorry for top-posting--blame apple.

Hm my first instinct was indeed to make it a zero-length array. I was
thinking of the input as a "list" and surely there are no elements in
a list which empty. I had to think a while until a length-1 array made
sense.

I suppose the thinking was string_to_array is the inverse of an
array_to_string operation then there are multiple possible answers.
You might have joined a zero length or a singleton array of an empty
string....
and since it's unknown which was the original value null is the right
answer...

I agree that picking an arbitrary choice is going to be more useful in
practice though.

--
Greg


On 30 Mar 2009, at 23:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Steve Crawford <scrawford@pinpointresearch.com> writes:
>> I have a query that converts a string to an array with the
>> string_to_array function. Sometimes the input is an empty string
>> (not a
>> null, but a string of zero-length). I had expected the result to be a
>> one-element array with an empty string as the first and only
>> element but
>> instead it returned null. I looked at the docs and didn't find the
>> observed behavior documented.
>
> The behavior is pretty intentional according to the source code:
>
>    /* return NULL for empty input string */
>    if (inputstring_len < 1)
>    {
>        text_position_cleanup(&state);
>        PG_RETURN_NULL();
>    }
>
> I agree this seems less than consistent though, especially seeing
> that you *don't* get a null for a zero-length separator, which if
> anything is a more poorly defined case.
>
> I doubt it'd be a good idea to back-patch a change for this,
> but I could see altering the definition for 8.4.
>
> Does anyone want to argue for keeping it the same?  Or perhaps
> argue that a zero-element array is a more sensible result than
> a one-element array with one empty string?  (It doesn't seem
> like it to me, but maybe somebody thinks so.)
>
>            regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] string_to_array with empty input

From
Brendan Jurd
Date:
On Tue, Mar 31, 2009 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Does anyone want to argue for keeping it the same?  Or perhaps
> argue that a zero-element array is a more sensible result than
> a one-element array with one empty string?  (It doesn't seem
> like it to me, but maybe somebody thinks so.)
>

My first thought was that it should be a zero-element array, because
then the string_to_array() behaviour would conform to the notion that
it returns an array with 1 element per string fragment bounded by the
delimiter.

However, I note that if you provide an empty delimiter, or one which
doesn't occur anywhere in the source string, you get an array with one
element, being the entire source string.

# select string_to_array('1-2-3', '-');
 {1,2,3}

# select string_to_array('1-2-3', 'x');
 {1-2-3}

Given this behaviour, I would argue for consistent treatment for a
zero-length source string: it should return an array with one element,
being the entire source string, whenever there is no string splitting
to take place.  And if the source string happens to be zero-length,
then the return value would be as expected by the OP.

Cheers,
BJ

Re: [HACKERS] string_to_array with empty input

From
Sam Mason
Date:
On Tue, Mar 31, 2009 at 05:45:33PM +1100, Brendan Jurd wrote:
> On Tue, Mar 31, 2009 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Does anyone want to argue for keeping it the same?  Or perhaps
> > argue that a zero-element array is a more sensible result than
> > a one-element array with one empty string?  (It doesn't seem
> > like it to me, but maybe somebody thinks so.)
>
> Given this behaviour, I would argue for consistent treatment for a
> zero-length source string: it should return an array with one element,
> being the entire source string, whenever there is no string splitting
> to take place.  And if the source string happens to be zero-length,
> then the return value would be as expected by the OP.

I'd agree with this as well, just to be verbose:

  string_to_array(NULL,',')   =>  NULL
  string_to_array('',',')     =>  {""}
  string_to_array('a',',')    =>  {a}
  string_to_array('a,',',')   =>  {a,""}
  string_to_array('a,b',',')  =>  {a,b}

However, I can see (nasty and hacky) reasons why the current behaviour
is there.  You'd get the following error if this change was accepted:

  string_to_array('',',')::INT[]  => invalid input syntax for integer: ""

Which you don't get at the moment; although you do currently get it in
other common cases such as:

  string_to_array('1,',',')::INT[]

If you want backwards compatible behaviour you could always bung a
NULLIF in there:

  string_to_array(NULLIF('',''),',')::INT[]  => NULL

To aid porting of code and general utility, I'd be tempted to add a pair
of functions like:

  CREATE FUNCTION array_filter_blanks(TEXT[]) RETURNS TEXT[]
      LANGUAGE SQL IMMUTABLE STRICT AS $$
    ARRAY(SELECT s FROM unnest($1) AS s WHERE s <> '') $$;

  CREATE FUNCTION array_nullif(ANYARRAY,ANYELEMENT) RETURNS ANYARRAY
      LANGUAGE SQL IMMUTABLE AS $$
    ARRAY(SELECT NULLIF(s,$2) FROM unnest($1) AS s) $$;

Although, this is obviously going above and beyond what you originally
asked for.

--
  Sam  http://samason.me.uk/

Re: string_to_array with empty input

From
Greg Stark
Date:
On Tue, Mar 31, 2009 at 7:45 AM, Brendan Jurd <direvus@gmail.com> wrote:
> My first thought was that it should be a zero-element array, because
> then the string_to_array() behaviour would conform to the notion that
> it returns an array with 1 element per string fragment bounded by the
> delimiter.
>
> However, I note that if you provide an empty delimiter, or one which
> doesn't occur anywhere in the source string, you get an array with one
> element, being the entire source string.

Yeah, actually the more I think about it the more I think it would be
strange for most uses to get a singleton array for this case.

What do you really expect to be returned for things like

select count_elements(string_to_array('butter,tea,milk',','))
select count_elements(string_to_array('butter,tea',','))
select count_elements(string_to_array('butter',','))
select count_elements(string_to_array('',','))

...

I could see lists like this being stored when people gather data using
a web form or something and don't want to bother normalizing some
trivial bit of data collection which they'll never individually, but
have to unnest the list for some display purposes.

The cases where it makes more sense to return a singleton array are
going to be parsing things like /etc/password where there are specific
meanings for each element, but when some are optional. I can't think
of any examples offhand though.

--
greg

Re: string_to_array with empty input

From
Greg Stark
Date:
On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason <sam@samason.me.uk> wrote:
>
>  string_to_array('',',')::INT[]  => invalid input syntax for integer: ""

Oof. That's a good point.


--
greg

Re: string_to_array with empty input

From
Sam Mason
Date:
On Tue, Mar 31, 2009 at 03:43:37PM +0100, Greg Stark wrote:
> On Tue, Mar 31, 2009 at 7:45 AM, Brendan Jurd <direvus@gmail.com> wrote:
> > My first thought was that it should be a zero-element array, because
> > then the string_to_array() behaviour would conform to the notion that
> > it returns an array with 1 element per string fragment bounded by the
> > delimiter.
> >
> > However, I note that if you provide an empty delimiter, or one which
> > doesn't occur anywhere in the source string, you get an array with one
> > element, being the entire source string.
>
> Yeah, actually the more I think about it the more I think it would be
> strange for most uses to get a singleton array for this case.

Really? I think it's strange not to!

> What do you really expect to be returned for things like
>
> select count_elements(string_to_array('butter,tea,milk',','))
> select count_elements(string_to_array('butter,tea',','))
> select count_elements(string_to_array('butter',','))
> select count_elements(string_to_array('',','))

I'd expect 3,2,1 and 1.

That's also a disingenuous example; what would you expect back from:

  select count_elements(string_to_array('butter,,milk',','))

I think the semantics you want is what you'd get from:

  array_filter_blanks(string_to_array($1,$2))

where I defined "array_filter_blanks" in my previous post.

--
  Sam  http://samason.me.uk/

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
Greg Stark <stark@enterprisedb.com> writes:
> On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason <sam@samason.me.uk> wrote:
>> string_to_array('',',')::INT[] �=> invalid input syntax for integer: ""

> Oof. That's a good point.

Isn't that an argument in favor of the zero-size-array definition?

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Mar 31, 2009, at 8:34 AM, Sam Mason wrote:

>> What do you really expect to be returned for things like
>>
>> select count_elements(string_to_array('butter,tea,milk',','))

3 = {butter,tea,milk}

>> select count_elements(string_to_array('butter,tea',','))

2 = {butter,tea}

>> select count_elements(string_to_array('butter',','))

1 = {butter}

>> select count_elements(string_to_array('',','))

1 = ARRAY['']

> I'd expect 3,2,1 and 1.
>
> That's also a disingenuous example; what would you expect back from:
>
>  select count_elements(string_to_array('butter,,milk',','))

3 = ARRAY['butter', '', 'milk']

> I think the semantics you want is what you'd get from:
>
>  array_filter_blanks(string_to_array($1,$2))
>
> where I defined "array_filter_blanks" in my previous post.

Yeah, if I wanted something like that in Perl, I'd do:

   my @stuff = grep { $_ } split /,/, $string;

In no case would I ever expect a NULL, however, unless I was trying to
split on NULL.

NULL = string_to_array(NULL, ',');

Best,

David


Re: string_to_array with empty input

From
Greg Stark
Date:
On Tue, Mar 31, 2009 at 4:34 PM, Sam Mason <sam@samason.me.uk> wrote:
>
> That's also a disingenuous example; what would you expect back from:
>
>  select count_elements(string_to_array('butter,,milk',','))
>
> I think the semantics you want is what you'd get from:
>
>  array_filter_blanks(string_to_array($1,$2))
>
> where I defined "array_filter_blanks" in my previous post.

I think you're still thinking from a programmers point of view
thinking about string processing.

From a functional point of view, if I give you a comma separated list
of items which is an empty string and ask you how many items are in it
nobody would normally say there's one empty thing in the list.

Both interpretations are clearly consistent but it depends on whether
you think it's a bunch of text strings concatenated together or if
it's a list of objects.

The example of string_to_array('',',')::int[] is relevant to this
point. The whole "there's one empty element" only makes sense if
you're thinking in terms of string processing. If it's a list of any
other kind of object it probably doesn't make sense; you can't say
there's one empty integer or one empty composite object or one empty
anything else.

--
greg

Re: string_to_array with empty input

From
justin
Date:
Sam Mason wrote:
> I'd expect 3,2,1 and 1.
>
> That's also a disingenuous example; what would you expect back from:
>
>   select count_elements(string_to_array('butter,,milk',','))
>
> I think the semantics you want is what you'd get from:
>
>   array_filter_blanks(string_to_array($1,$2))
>
> where I defined "array_filter_blanks" in my previous post.
>

I agree the function should not be changing values passed.
Stripping/Dropping empty strings is changing what was passed into the
function instead breaking it into a array.



Re: [HACKERS] string_to_array with empty input

From
justin
Date:
Sam Mason wrote:
>
>   string_to_array('',',')::INT[]  => invalid input syntax for integer: ""
>
> Which you don't get at the moment; although you do currently get it in
> other common cases such as:
>
>   string_to_array('1,',',')::INT[]
>
> If you want backwards compatible behaviour you could always bung a
> NULLIF in there:
>
>   string_to_array(NULLIF('',''),',')::INT[]  => NULL
>
>
>

But consider  this fails also

select string_to_array('1, , 3', ',' )::int[] => ERROR:  invalid input
syntax for integer: " "

yet this works

select string_to_array('1, 2, 3',',')::int[]

Re: string_to_array with empty input

From
Greg Stark
Date:
On Tue, Mar 31, 2009 at 5:48 PM, justin <justin@emproshunts.com> wrote:
>
> But consider  this fails also
>
> select string_to_array('1, , 3', ',' )::int[] => ERROR:  invalid input
> syntax for integer: " "
>
> yet this works
>
> select string_to_array('1, 2, 3',',')::int[]


Sure, and the analogous pair string_to_array(' ',',')::int[] and
string_to_array('1 ',',')::int[] behave similarly.

The point is that '' could represent no items or one empty string. We
get to pick which one and in any use case where the string was a list
of objects it's almost certainly intended to be an empty list. And
databases are almost always processing lists of things.

I think the only use case where you want it to be a singleton list of
an empty string is when you're doing string parsing such as building a
lexer or something like that, which is isn't a typical use for sql
code.

--
greg

Re: string_to_array with empty input

From
justin
Date:
Greg Stark wrote: <blockquote cite="mid:4136ffa0903311006v15eb3a25xf3fa94a8ca089fd6@mail.gmail.com" type="cite"><pre
wrap="">OnTue, Mar 31, 2009 at 5:48 PM, justin <a class="moz-txt-link-rfc2396E"
href="mailto:justin@emproshunts.com"><justin@emproshunts.com></a>wrote: </pre><blockquote type="cite"><pre
wrap="">Butconsider  this fails also
 

select string_to_array('1, , 3', ',' )::int[] => ERROR:  invalid input
syntax for integer: " "

yet this works

select string_to_array('1, 2, 3',',')::int[]   </pre></blockquote><pre wrap="">

Sure, and the analogous pair string_to_array(' ',',')::int[] and
string_to_array('1 ',',')::int[] behave similarly.

The point is that '' could represent no items or one empty string. We
get to pick which one and in any use case where the string was a list
of objects it's almost certainly intended to be an empty list. And
databases are almost always processing lists of things.

I think the only use case where you want it to be a singleton list of
an empty string is when you're doing string parsing such as building a
lexer or something like that, which is isn't a typical use for sql
code. </pre></blockquote><br /> I disagree.   Casting a string to something else can be a very error prone to begin
with.<br /> Having  string_to_array() to deal with that possibility  is out of its scope IMHO.<br /><br /> Consider
this. I have intelligent part numbers  that need to be split apart to simplify searching  and do math with. <br /><br
/>string_to_array(' F-2500-50 ', '-' ) ::int[]<br /><br /> Still fails with an error as expected.  what is the
differencebetween ' '  and 'F'  <br /><br /> So before doing any thing a test needs to be done to verify the contents,
soit can be casted to something else.<br /> 

Re: string_to_array with empty input

From
Greg Stark
Date:
On Tue, Mar 31, 2009 at 6:44 PM, justin <justin@emproshunts.com> wrote:
>
> Consider this.  I have intelligent part numbers  that need to be split apart
> to simplify searching  and do math with.
>
> string_to_array(' F-2500-50 ', '-' ) ::int[]

Yeah, that's what I've said about three times. If you're building a
parser and think of it in terms of string parsing then yes, a
singletone array makes sense.

> Still fails with an error as expected.  what is the difference between ' '
> and 'F'

I don't know what you're talking about. The question is how many
elements are in ''. No space. And no separators. To repeat for the
last time. If you think in terms of string processing then the answer
1 is reasonable. But if you think it's a list of separate items then
anyone will say that's an empty list and contains no elements.

Nobody has ever suggested filtering out empty elements or dealing
specially with spaces or anything else like that. If you're talking
about that then you've missed the original question.

One last try.

If there's a column called "shopping list" which is a comma-separated
list of items to buy in the store and I store '' in it. How many items
do you think that means you should go shopping for? Do you think
that's one item that should be rejected because it's an empty string?
Or do you think that's an empty list with zero items listed?

What would it look like if it was a zero-length list? You can ask what
would it look like if it was a shopping list of one item called ''.
But I agree both are theoretically consistent, but one is actually
useful in 99% of use cases. The other is only useful in unusual cases.
--
greg

Re: string_to_array with empty input

From
justin
Date:
<font color="#ff0000">This thread being cross posted has made it a bit confusing </font><br /><br /> Greg Stark wrote:
<blockquotecite="mid:4136ffa0903311053x6f80e0bbl384167a71005aece@mail.gmail.com" type="cite"><pre wrap="">
 

Nobody has ever suggested filtering out empty elements or dealing
specially with spaces or anything else like that. If you're talking
about that then you've missed the original question. </pre></blockquote><br /><pre wrap="">"Does anyone want to argue
forkeeping it the same?  Or perhaps
 
argue that a <big><big><u><b>zero-element array</b></u></big></big> is a more sensible result than
a one-element array with one empty string?  (It doesn't seem
like it to me, but maybe somebody thinks so.)"


<small><font color="#ff0000"><big><big><font face="Arial">That means to me dropping empty strings or removing values
thatdon't make sense.  Then the argument begins what values make since to drop.  Just zero length strings or include
stringswith million white spaces???</font></big></big></font></small>
 

</pre><blockquote cite="mid:4136ffa0903311053x6f80e0bbl384167a71005aece@mail.gmail.com" type="cite"><pre wrap="">
One last try.

If there's a column called "shopping list" which is a comma-separated
list of items to buy in the store and I store '' in it. How many items
do you think that means you should go shopping for? Do you think
that's one item that should be rejected because it's an empty string?
Or do you think that's an empty list with zero items listed? </pre></blockquote><font color="#ff0000"><big>It can't be
rejected, Its an Empty shopping list  although a worthless shopping list .   </big></font><br /><blockquote
cite="mid:4136ffa0903311053x6f80e0bbl384167a71005aece@mail.gmail.com"type="cite"><pre wrap="">
 
What would it look like if it was a zero-length list? You can ask what
would it look like if it was a shopping list of one item called ''.
But I agree both are theoretically consistent, but one is actually
useful in 99% of use cases. The other is only useful in unusual cases. </pre></blockquote><br /><font
color="#ff0000">I'mstill confused which one you want here</font><br /> 

Re: string_to_array with empty input

From
Sam Mason
Date:
On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote:
> Both interpretations are clearly consistent but it depends on whether
> you think it's a bunch of text strings concatenated together or if
> it's a list of objects.
>
> The example of string_to_array('',',')::int[] is relevant to this
> point. The whole "there's one empty element" only makes sense if
> you're thinking in terms of string processing. If it's a list of any
> other kind of object it probably doesn't make sense; you can't say
> there's one empty integer or one empty composite object or one empty
> anything else.

I think this is about the only sensible option, but my reasoning is
somewhat different.

My original intuition was that array_to_string and string_to_array
should be (perfect) inverses of each other.  Unfortunately I can't see
any way to get this to happen; zero length arrays or NULL elements
in the array mean than array_to_string isn't injective.  This means
that the composition of the two functions won't result in an injective
function and my original premise is false.  Note that as far as I can
tell string_to_array is injective.  I'm assuming that the delimiter
won't appear as part of an element of the array; e.g. an array of
integers and space as a delimiter is OK, but using the same delimiter
with unconstrained text is not OK, a blank delimiter is never OK as it's
always part of a string.

"Injective" means there exists more than one array that encodes to the
same string.  The examples being how do you sanely encode '{}' and
'{NULL}' in a unique way; '{""}' is a bad example because it's just
an artifact of how strings are represented.  The complications needed
to allow this to happen would make it a very similar function as the
array's normal output_function function and hence wouldn't serve a
useful purpose.  All of this implies that we have to make a compromise
somewhere.

The semantics that most closely match the existing behaviour would be;
for array_to_string:

  1) remove NULL values from input array
  2) call output_function on remaining elements
  3) intersperse[1] the delimiter between the remaining elements
  4) concatenate the resulting array

for string_to_array:

  1) check if input is zero length; return empty array
  2) split array based on delimiter and return

Note that both functions are STRICT; i.e. a NULL for either parameter
should cause the function to return NULL.  Arguably in string_to_array
it could be non-strict if the input string is empty, but it's probably
worth keeping it strict to simplify the semantics.

Here are some examples:

  array_to_string('{}'::TEXT[],',')          => ''
  array_to_string('{NULL}'::TEXT[],',')      => ''
  array_to_string('{NULL,NULL}'::TEXT[],',') => ''
  array_to_string('{a,NULL}'::TEXT[],',')    => 'a'
  array_to_string('{NULL,a}'::TEXT[],',')    => 'a'
  array_to_string('{a,b}'::TEXT[],',')       => 'a,b'
  array_to_string('{a,NULL,b}'::TEXT[],',')  => 'a,b'

  string_to_array('',',')     => '{}'
  string_to_array(' ',',')    => '{" "}'
  string_to_array(',',',')    => '{"",""}'
  string_to_array('a',',')    => '{a}'
  string_to_array('a,',',')   => '{a,""}'
  string_to_array(',a',',')   => '{"",a}'
  string_to_array('a,b',',')  => '{a,b}'

My thinking before was that it should be doing:

  string_to_array('',',')     => '{""}'

instead, but I now think that Greg has a point and these are nicer/more
generally useful semantics.

Hum, that all got a bit more verbose than I was expecting.  Ah well, I
hope it's somewhat useful.

--
  Sam  http://samason.me.uk/

 [1] as in the intersperse function in Haskell
       http://www.haskell.org/onlinereport/list.html#sect17.3
       intersperse "#" ["a", "bar"] == ["a", "#", "bar"]
     note that here we're working with arrays of string, rather than
     arrays of characters as in the report.

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark <stark@enterprisedb.com> wrote:
> On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason <sam@samason.me.uk> wrote:
>>
>>  string_to_array('',',')::INT[]  => invalid input syntax for integer: ""
>
> Oof. That's a good point.

+1.  I find this argument much more compelling than anything else
that's been offered up so far.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark <stark@enterprisedb.com> wrote:
>> On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason <sam@samason.me.uk> wrote:
>>> string_to_array('',',')::INT[] �=> invalid input syntax for integer: ""
>>
>> Oof. That's a good point.

> +1.  I find this argument much more compelling than anything else
> that's been offered up so far.

Yeah.  It seems to me that if you consider only the case where the array
elements are text, there's a weak preference for considering '' to be a
single empty string; but as soon as you think about any other datatype,
there's a strong preference to consider it a zero-element list.  So I
too have come around to favor the latter interpretation.  Do we have
any remaining holdouts?

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 1, 2009, at 9:02 AM, Tom Lane wrote:

>> +1.  I find this argument much more compelling than anything else
>> that's been offered up so far.
>
> Yeah.  It seems to me that if you consider only the case where the
> array
> elements are text, there's a weak preference for considering '' to
> be a
> single empty string; but as soon as you think about any other
> datatype,
> there's a strong preference to consider it a zero-element list.  So I
> too have come around to favor the latter interpretation.  Do we have
> any remaining holdouts?

Well, I'd just point out that the return value of string_to_array() is
text[]. Thus, this is not a problem with string_to_array(), but a
casting problem from text[] to int[]. Making string_to_array() return
a NULL for this case to make casting simpler is addressing the problem
in the wrong place, IMHO. If I want to do this in Perl, for example,
I'd do something like this:

my @ints = grep { defined $_ && $_ ne '' } split ',', $string;

So I split the string into an array, and then remove unreasonable
values. This also allows me to set defaults:

my @ints = map { $_ || 0 } split ',', $string;

This ensures that I get the proper number of records in the example of
something like '1,2,,4'.

So I still think that string_to_array('', ',') should return '{""}',
and how casting is handled should be left to the user to flexibly
handle.

That said, I'm not seeing a simple function for modifying an array.
I'd have to write one for each specific case. :-(

Best,

David




Re: [HACKERS] string_to_array with empty input

From
justin
Date:
Tom Lane wrote: <blockquote cite="mid:6569.1238601738@sss.pgh.pa.us" type="cite"><pre wrap="">Robert Haas <a
class="moz-txt-link-rfc2396E"href="mailto:robertmhaas@gmail.com"><robertmhaas@gmail.com></a> writes:
</pre><blockquotetype="cite"><pre wrap="">On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark <a class="moz-txt-link-rfc2396E"
href="mailto:stark@enterprisedb.com"><stark@enterprisedb.com></a>wrote:   </pre><blockquote type="cite"><pre
wrap="">OnTue, Mar 31, 2009 at 3:42 PM, Sam Mason <a class="moz-txt-link-rfc2396E"
href="mailto:sam@samason.me.uk"><sam@samason.me.uk></a>wrote:     </pre><blockquote type="cite"><pre
wrap="">string_to_array('',',')::INT[] => invalid input syntax for integer: ""       </pre></blockquote><pre
wrap="">Oof.That's a good point.     </pre></blockquote></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre
wrap="">+1. I find this argument much more compelling than anything else
 
that's been offered up so far.   </pre></blockquote><pre wrap="">
Yeah.  It seems to me that if you consider only the case where the array
elements are text, there's a weak preference for considering '' to be a
single empty string; but as soon as you think about any other datatype,
there's a strong preference to consider it a zero-element list.  So I
too have come around to favor the latter interpretation.  Do we have
any remaining holdouts?
        regards, tom lane </pre></blockquote><br /> I'm still a hold out,  We are taking a string putting it into a
arraybased on a delimiter.  That is very simple and straight forward.  Yet many argue if we want to cast this into
anotherdata type the function should deal with in limited cases.  <br /><br /> string_to_array('',',')::INT[]  works as
proposed<br /><br /> But <br /> string_to_array(',,,', ',' )::INT[]   Fails <br /> or<br /> string_to_array('1,2,,4',
',')::INT[] Fails .<br /><br /><br /> I'm trying to understand the difference between a empty string to a string with 
manyblank entries between  the delimiter.    <br /> Consider   ',,,,,,'  = ''  once the delimiter is removed .  Yet
Sevenzero length entries were passed.  How is that going to be handled????  <br /><br /> In one case it works and yet
othercases it fails.  This is inconsistent behavior.  Unless all zero length strings are removed or are treated as
NULLs I can't see how casting to another type is going to work.<br /><br /> If zero length strings are treated as NULLs
thiscreates idea that zero length strings are = to NULLs. <br /><br /> The input is a string and the output is text[], 
castingto another data type is error prone and should be handled by the programmer.  <br /> 

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> Well, I'd just point out that the return value of string_to_array() is
> text[].

True...

> Thus, this is not a problem with string_to_array(), but a
> casting problem from text[] to int[].

Nonsense.  The question is whether string_to_array is meant to be useful
for lists of anything except text.  I agree you could argue that it
isn't.  But even in the domain of text it's not all that cut-and-dried
whether string_to_array should return array[] or array[''] for empty
input.  So ISTM we're giving up less than we gain by choosing the former.

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 1, 2009, at 10:09 AM, Tom Lane wrote:

>> Thus, this is not a problem with string_to_array(), but a
>> casting problem from text[] to int[].
>
> Nonsense.  The question is whether string_to_array is meant to be
> useful
> for lists of anything except text.  I agree you could argue that it
> isn't.  But even in the domain of text it's not all that cut-and-dried
> whether string_to_array should return array[] or array[''] for empty
> input.  So ISTM we're giving up less than we gain by choosing the
> former.

Yeah. I'm okay with either, as long as it's consistent. I have a mild
preference for '{""}', but I can live with ARRAY[] instead. As long as
it's not NULL that gets returned.

Best,

David

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 1, 2009, at 10:05 AM, justin wrote:

> string_to_array('',',')::INT[]  works as proposed
>
> But
> string_to_array(',,,', ',' )::INT[]   Fails
> or
> string_to_array('1,2,,4', ',' )::INT[] Fails .
>
>
> I'm trying to understand the difference between a empty string to a
> string with  many blank entries between  the delimiter.
> Consider   ',,,,,,'  = ''  once the delimiter is removed .  Yet
> Seven zero length entries were passed.  How is that going to be
> handled????

Right, it's making a special case of '', which does seem rather
inconsistent to me.

Best,

David

Re: [HACKERS] string_to_array with empty input

From
Sam Mason
Date:
On Wed, Apr 01, 2009 at 10:23:18AM -0700, David E. Wheeler wrote:
> On Apr 1, 2009, at 10:05 AM, justin wrote:
> >string_to_array('',',')::INT[]  works as proposed
> >
> >But
> >string_to_array(',,,', ',' )::INT[]   Fails
> >or
> >string_to_array('1,2,,4', ',' )::INT[] Fails .
> >
> >
> >I'm trying to understand the difference between a empty string to a
> >string with  many blank entries between  the delimiter.
> >Consider   ',,,,,,'  = ''  once the delimiter is removed .  Yet
> >Seven zero length entries were passed.  How is that going to be
> >handled????
>
> Right, it's making a special case of '', which does seem rather
> inconsistent to me.

Yes it is; but it's a useful special case because it allows:

  string_to_array(array_to_string(col,','),',')

to do the right thing whether it's got zero or more elements in.  With
the current implementation you get a NULL back in the case of zero
elements and the expected array back the rest of the time.

To me, it doesn't really matter whether:

  string_to_array(',', ',' )::INT[]

fails or not; because array_to_string will never generate a string that
looks like this.

--
  Sam  http://samason.me.uk/

Re: [HACKERS] string_to_array with empty input

From
Greg Stark
Date:
On Wed, Apr 1, 2009 at 6:23 PM, David E. Wheeler <david@kineticode.com> wrote:
> Right, it's making a special case of '', which does seem rather inconsistent
> to me.


"David E. Wheeler" <david@kineticode.com> writes:

> On Apr 1, 2009, at 10:05 AM, justin wrote:
>
>> string_to_array('',',')::INT[]  works as proposed
>>
>> But
>> string_to_array(',,,', ',' )::INT[]   Fails
>> or
>> string_to_array('1,2,,4', ',' )::INT[] Fails .
>>
>>
>> I'm trying to understand the difference between a empty string to a  string
>> with  many blank entries between  the delimiter.

Well, uh, in one case it's empty and in the other case it's not?

>> Consider   ',,,,,,'  = ''  once the delimiter is removed .  Yet  Seven zero
>> length entries were passed.  How is that going to be  handled????

Well it's pretty clear empty delimiters cannot be handled consistently. Some
languages handle them as a special case (splitting every character into a
separate string, for example -- which I'll point out will result in an empty
array as a result for an empty string input) or make it an error.

> Right, it's making a special case of '', which does seem rather  inconsistent
> to me.

It's not a special case -- or it's a special case whichever we choose,
depending on which way you look at it.

What we're talking about here is replacing the blank values in the following
tables. We can get either the first one right in both cases with {} as the
result, or we can get the second one right in the second table with {""}.
Either way there is an inconsistency in at least one case.

The existing behaviour of returning NULL is the only "consistent" choice since
the correct value is "unknown". And one could argue that it's easier to
replace NULL with the correct value if the programmer knows using coalesce
than it is to replace either "" or {""}. But I'm still leaning to thinking
that using an arbitrary choice that at least gets most users intentions is
better.

postgres=# select input,
string_to_array(array_to_string(input,','),',') as output from (values
(array[]::text[]),(array['foo']),(array['foo','bar']),(array['foo','bar','baz']))
as input(input);
     input     |    output
---------------+---------------
 {}            |
 {foo}         | {foo}
 {foo,bar}     | {foo,bar}
 {foo,bar,baz} | {foo,bar,baz}
(4 rows)

postgres=# select input,
string_to_array(array_to_string(input,','),',') as output from (values
(array[]::text[]),(array['']),(array['','']),(array['','',''])) as
input(input);
   input    |   output
------------+------------
 {}         |
 {""}       |
 {"",""}    | {"",""}
 {"","",""} | {"","",""}
(4 rows)

--
greg

Re: [HACKERS] string_to_array with empty input

From
Sam Mason
Date:
On Wed, Apr 01, 2009 at 07:40:16PM +0100, Greg Stark wrote:
> The existing behaviour of returning NULL is the only "consistent"
> choice since the correct value is "unknown". And one could argue that
> it's easier to replace NULL with the correct value if the programmer
> knows using coalesce than it is to replace either "" or {""}.

Couldn't a similar argument be applied for division by zero?  Since it's
not known whether the user wants to get a "divide by zero" exception or
"infinity" PG should return NULL and punt the choice to the user.

I think everybody would agree that this would be a bad thing to do!

> But I'm
> still leaning to thinking that using an arbitrary choice that at least
> gets most users intentions is better.

I'd agree; returning NULL and not forcing the user to make a choice
is a bad design decision---the user doesn't need to put a coalesce
in and hence their code will probably break in strange ways when
they're not expecting it.  Nobody suggest adding a third parameter to
string_to_array, please!

The general mantra that seems to apply here is "one good option is
better than two bad ones".

--
  Sam  http://samason.me.uk/

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Wed, Apr 1, 2009 at 12:52 PM, David E. Wheeler <david@kineticode.com> wrote:
> Well, I'd just point out that the return value of string_to_array() is
> text[]. Thus, this is not a problem with string_to_array(), but a casting
> problem from text[] to int[]. Making string_to_array() return a NULL for
> this case to make casting simpler is addressing the problem in the wrong
> place, IMHO. If I want to do this in Perl, for example, I'd do something
> like this:
>
> my @ints = grep { defined $_ && $_ ne '' } split ',', $string;

I've written code that looks a whole lot like this myself, but there's
no easy way to do that in SQL.  SQL, in particular, lacks closures, so
grep {} and map {} don't exist.  I really, really wish they did, but I
believe that our type system is too woefully pathetic to be up to the
job.  So it seems to me that arguing that SQL (which lacks those
primitives) should match Perl (which has them) isn't really getting us
anywhere.

> my @ints = map { $_ || 0 } split ',', $string;
>
> This ensures that I get the proper number of records in the example of something like '1,2,,4'.

I can't see that there's any way to do this in SQL regardless of how
we define this operation.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Wed, Apr 1, 2009 at 1:05 PM, justin <justin@emproshunts.com> wrote:
> I'm still a hold out,  We are taking a string putting it into a array based
> on a delimiter.  That is very simple and straight forward.  Yet many argue
> if we want to cast this into another data type the function should deal with
> in limited cases.
>
> string_to_array('',',')::INT[]  works as proposed
>
> But
> string_to_array(',,,', ',' )::INT[]   Fails
> or
> string_to_array('1,2,,4', ',' )::INT[] Fails .

But... but... those aren't comma-separated lists of integers.  If they
were, it would work.

string_to_array('cow,dog,horse')::INT[] will also fail.

If you take 0 items of any type whatsoever and join them together with
commas, you will get the empty string.  It is also true that if you
join 1 item together with commas, you will get that item back, and if
that item is the empty string, you will now have the empty string.  I
think it's better to worry more about the first case because it
applies to any type at all, whereas the latter case ONLY applies in
situations where the empty string is a potentially legal value.

...Robert

Re: [HACKERS] string_to_array with empty input

From
justin
Date:
If someone can show me a real world example  this logic simplifies the
code and has more uses I'll bite


I just presently can't see how this works better.

Re: [HACKERS] string_to_array with empty input

From
Martin Gainty
Date:
Split strings into array elements using provided delimiter

string_to_array('xx~^~yy~^~zz', '~^~')
output:
{xx,yy,zz}

http://www.postgresql.org/docs/8.3/interactive/functions-array.html

?
Martin
______________________________________________
Disclaimer and confidentiality note
This message is confidential and may be privileged. If you are not the intended recipient, we kindly ask you to  please inform the sender. Any unauthorised dissemination or copying hereof is prohibited. This message serves for information purposes only and shall not have any legally binding effect. Given that e-mails can easily be subject to manipulation, we can not accept any liability for the content provided.






> Date: Wed, 1 Apr 2009 15:49:42 -0400
> From: justin@emproshunts.com
> To: robertmhaas@gmail.com
> CC: tgl@sss.pgh.pa.us; stark@enterprisedb.com; sam@samason.me.uk; pgsql-general@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [GENERAL] string_to_array with empty input
>
> If someone can show me a real world example this logic simplifies the
> code and has more uses I'll bite
>
>
> I just presently can't see how this works better.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Rediscover Hotmail®: Get e-mail storage that grows with you. Check it out.

Re: [HACKERS] string_to_array with empty input

From
Justin
Date:
Martin Gainty wrote: <blockquote cite="mid:BLU142-W15179F226ED6BF8A1854D9AE8B0@phx.gbl" type="cite"><style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
} </style><tt class="LITERAL">Split strings into array elements using provided delimiter<br /><br />
string_to_array('xx~^~yy~^~zz','~^~') <br /> output: </tt><tt class="LITERAL">{xx,yy,zz}</tt><br /><br /><a
class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/8.3/interactive/functions-array.html">http://www.postgresql.org/docs/8.3/interactive/functions-array.html</a><br
/><br/></blockquote> Sorry thats not the question i'm asking.  <br /><br /> We are debating if it makes sense to change
theoutput in certain cases.  <br /><br /> I'm for not returning nulls or returning zero element array.  <br /><br />
I'masking how is the other better by giving a real world example???  I don't see the plus side at the moment. <br /> 

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If you take 0 items of any type whatsoever and join them together with
> commas, you will get the empty string.  It is also true that if you
> join 1 item together with commas, you will get that item back, and if
> that item is the empty string, you will now have the empty string.  I
> think it's better to worry more about the first case because it
> applies to any type at all, whereas the latter case ONLY applies in
> situations where the empty string is a potentially legal value.

I'm starting to vacillate again.  It's clear that for the purposes
of string_to_array, an empty input string is fundamentally ambiguous:
it could mean a list of no things, or a list of one empty thing.
So the two cases in which an application can safely make use of this
function are
    (1) if lists of no things never happen.
    (2) if lists never contain empty things.
Either rule allows us to resolve the ambiguity.  We've been discussing
the fact that (2) is an okay assumption for many non-text data types,
but none-the-less string_to_array is in itself a text function and (2)
is not very good for text.  Making this worse, the format *appears*
to work fine for empty strings, so long as you don't have exactly
one of them.  So it seems like applications might be much more likely
to violate (2) than (1).

Another way to state the point is that we can offer people a choice of
two limitations: string_to_array doesn't work for zero-length lists,
or string_to_array doesn't work for empty strings (except most of the
time, it does).  The former is sounding less likely to bite people
unexpectedly.

Or we could stick to the current behavior and say "use COALESCE() to
resolve the ambiguity, if you need to".

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Wed, Apr 1, 2009 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or we could stick to the current behavior and say "use COALESCE() to
> resolve the ambiguity, if you need to".

If there's no consensus on changing the behavior, it's probably better
to be backward compatible than not.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Steve Crawford
Date:
Tom Lane wrote:
>
> I'm starting to vacillate again.  It's clear that for the purposes
> of string_to_array, an empty input string is fundamentally ambiguous:
> it could mean a list of no things, or a list of one empty thing.
>
Agreed. Of the two, a list of one empty thing makes string_to_array
closer to an inverse of array_to_string.

> Or we could stick to the current behavior and say "use COALESCE() to
> resolve the ambiguity, if you need to".
>
>
Currently string_to_array(null, ',') yields a null result -
indistinguishable from string_to_array('',','). Wrapping in coalesce
does not help distinguish true null input from empty-string input. I'm
not sure at the moment what other cases exist where non-null input
generates null output.

If the decision is to leave the behavior unchanged, it at least cries
out for a documentation patch.

Cheers,
Steve


Re: [HACKERS] string_to_array with empty input

From
"Leif B. Kristensen"
Date:
On Thursday 2. April 2009, Steve Crawford wrote:
>Currently string_to_array(null, ',') yields a null result -
>indistinguishable from string_to_array('',','). Wrapping in coalesce
>does not help distinguish true null input from empty-string input. I'm
>not sure at the moment what other cases exist where non-null input
>generates null output.

Somehow this reminds me of the old "division by zero" problem.

IMO, the proper way to handle this kind of anomaly would be to test if
the length of the string is non-zero before submitting it to the
string_to_array() function.
--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 1, 2009, at 12:19 PM, Robert Haas wrote:

>> my @ints = map { $_ || 0 } split ',', $string;
>>
>> This ensures that I get the proper number of records in the example
>> of something like '1,2,,4'.
>
> I can't see that there's any way to do this in SQL regardless of how
> we define this operation.

It's easy enough to write a function to do it:

CREATE OR REPLACE FUNCTION trim_blanks (anyarray) RETURNS anyarray AS $$
     SELECT ARRAY(
         SELECT CASE WHEN $1[i] IS NULL OR $1[i] = '' THEN '0' ELSE
$1[i] END
           FROM generate_series(1, array_upper($1, 1)) s(i)
          ORDER BY i
     );
$$ LANGUAGE SQL IMMUTABLE;

Best,

David

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 1, 2009, at 2:22 PM, Tom Lane wrote:

> Another way to state the point is that we can offer people a choice of
> two limitations: string_to_array doesn't work for zero-length lists,
> or string_to_array doesn't work for empty strings (except most of the
> time, it does).  The former is sounding less likely to bite people
> unexpectedly.

Right, very well put.

> Or we could stick to the current behavior and say "use COALESCE() to
> resolve the ambiguity, if you need to".

Steve has a point that leaving it as-is leaves it as impossible to
tell the difference between string_to_array(NULL, ',') and
string_to_array('', ','). The former properly handles an unknown
value, while the latter, where '' is a known value, seems weird to be
returning NULL.

Best,

David

Re: [HACKERS] string_to_array with empty input

From
Steve Crawford
Date:
Leif B. Kristensen wrote:
> On Thursday 2. April 2009, Steve Crawford wrote:
>
>> Currently string_to_array(null, ',') yields a null result -
>> indistinguishable from string_to_array('',','). Wrapping in coalesce
>> does not help distinguish true null input from empty-string input. I'm
>> not sure at the moment what other cases exist where non-null input
>> generates null output.
>>
>
> Somehow this reminds me of the old "division by zero" problem.
>
> IMO, the proper way to handle this kind of anomaly would be to test if
> the length of the string is non-zero before submitting it to the
> string_to_array() function.
>
Quite the opposite. Where division by zero is simply illegal as is, say,
string_to_array(1234, ','), string_to_array('', ',') is legal.
Unfortunately it is legal, and legal and legal.... with numerous
reasonable interpretations of which legal is most appropriate/consistent.

I would argue against a change to have string_to_array('',',') throw an
error.

Cheers,
Steve


Re: [HACKERS] string_to_array with empty input

From
Sam Mason
Date:
On Thu, Apr 02, 2009 at 09:29:04AM -0700, Steve Crawford wrote:
> Leif B. Kristensen wrote:
> >Somehow this reminds me of the old "division by zero" problem.
> >
> >IMO, the proper way to handle this kind of anomaly would be to test if
> >the length of the string is non-zero before submitting it to the
> >string_to_array() function.
>
> Quite the opposite. Where division by zero is simply illegal

This is just a matter of definitions; divide by zero is fine in lots of
languages and no exception will be raised.  The fact that you're saying
it's "simply illegal" means that you've internalised the definition to
such an extent that any alternative appears "simply illegal".

It seems reasonable to assume that if, to pick an arbitrary choice,
string_to_array returned a zero element set people would say it was
"simply illegal" for it to do anything else.  There are choices for
either and a choice needs to be made or the situation should somehow be
made impossible.

> I would argue against a change to have string_to_array('',',') throw an
> error.

I'd agree, throwing an exception here doesn't seem useful.

--
  Sam  http://samason.me.uk/

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Thu, Apr 2, 2009 at 12:17 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Apr 1, 2009, at 2:22 PM, Tom Lane wrote:
>
>> Another way to state the point is that we can offer people a choice of
>> two limitations: string_to_array doesn't work for zero-length lists,
>> or string_to_array doesn't work for empty strings (except most of the
>> time, it does).  The former is sounding less likely to bite people
>> unexpectedly.
>
> Right, very well put.
>
>> Or we could stick to the current behavior and say "use COALESCE() to
>> resolve the ambiguity, if you need to".
>
> Steve has a point that leaving it as-is leaves it as impossible to tell the
> difference between string_to_array(NULL, ',') and string_to_array('', ',').
> The former properly handles an unknown value, while the latter, where '' is
> a known value, seems weird to be returning NULL.

*shrug* CASE WHEN blah IS NOT NULL THEN string_to_array(blah, ',') END

More and more I'm leaning toward leaving this alone.  No matter how
you define it, the behavior can be changed to whichever alternative
you prefer with a 1-line case statement.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Thu, Apr 2, 2009 at 12:10 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Apr 1, 2009, at 12:19 PM, Robert Haas wrote:
>
>>> my @ints = map { $_ || 0 } split ',', $string;
>>>
>>> This ensures that I get the proper number of records in the example of
>>> something like '1,2,,4'.
>>
>> I can't see that there's any way to do this in SQL regardless of how
>> we define this operation.
>
> It's easy enough to write a function to do it:
>
> CREATE OR REPLACE FUNCTION trim_blanks (anyarray) RETURNS anyarray AS $$
>    SELECT ARRAY(
>        SELECT CASE WHEN $1[i] IS NULL OR $1[i] = '' THEN '0' ELSE $1[i] END
>          FROM generate_series(1, array_upper($1, 1)) s(i)
>         ORDER BY i
>    );
> $$ LANGUAGE SQL IMMUTABLE;

Ah!  Thanks for the tip.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
>> Or we could stick to the current behavior and say "use COALESCE() to
>> resolve the ambiguity, if you need to".

> Steve has a point that leaving it as-is leaves it as impossible to
> tell the difference between string_to_array(NULL, ',') and
> string_to_array('', ','). The former properly handles an unknown
> value, while the latter, where '' is a known value, seems weird to be
> returning NULL.

Yeah, COALESCE is an abuse of a convenient notation, which will fall
over if you also want NULL to yield NULL.  A correct fix
outside-the-function would look more like

case when str = '' then '{}'::text[] else string_to_array(str, ',') end

which should correctly yield NULL for NULL input and an empty array
for empty input.  Similarly, if someone wanted to force the
single-empty-string result, they should do

case when str = '' then '{""}'::text[] else string_to_array(str, ',') end

which also still yields NULL if str is NULL.

Right at the moment, if we stick with the historical definition
of the function, *both* camps have to write out their choice of
the above.  Seems like this is the worst of all possible worlds.
We should probably pick one or the other.

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Thu, Apr 2, 2009 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right at the moment, if we stick with the historical definition
> of the function, *both* camps have to write out their choice of
> the above.  Seems like this is the worst of all possible worlds.
> We should probably pick one or the other.

ISTM there are three camps.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 2, 2009 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Right at the moment, if we stick with the historical definition
>> of the function, *both* camps have to write out their choice of
>> the above. �Seems like this is the worst of all possible worlds.
>> We should probably pick one or the other.

> ISTM there are three camps.

If there's a camp that actually *wants* a NULL result for this case,
I missed the reasoning.  AFAICS we can either say that every application
is going to have to put in a CASE wrapper around this function, or say
that we'll make it do the right thing for some of them and the rest have
to put the same wrapper around it.

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
Sam Mason
Date:
On Thu, Apr 02, 2009 at 02:04:41PM -0400, Tom Lane wrote:
> A correct fix
> outside-the-function would look more like
>
> case when str = '' then '{}'::text[] else string_to_array(str, ',') end
>
> which should correctly yield NULL for NULL input and an empty array
> for empty input.  Similarly, if someone wanted to force the
> single-empty-string result, they should do
>
> case when str = '' then '{""}'::text[] else string_to_array(str, ',') end
>
> which also still yields NULL if str is NULL.
>
> Right at the moment, if we stick with the historical definition
> of the function, *both* camps have to write out their choice of
> the above.  Seems like this is the worst of all possible worlds.
> We should probably pick one or the other.

Yes, I'd be tempted to pick one and go with it.  It's seems a completely
arbitrary choice one way or the other but the current behaviour is
certainly wrong.

I'd go with returning a zero element array because it would do
the "right thing" more often when paired with array_to_string.
I've also been through the first few pages of a Google search for
"array_to_string" and it seems to do the "right" thing for the majority
of the cases.

--
  Sam  http://samason.me.uk/

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Thu, Apr 2, 2009 at 2:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 2, 2009 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Right at the moment, if we stick with the historical definition
>>> of the function, *both* camps have to write out their choice of
>>> the above.  Seems like this is the worst of all possible worlds.
>>> We should probably pick one or the other.
>
>> ISTM there are three camps.
>
> If there's a camp that actually *wants* a NULL result for this case,
> I missed the reasoning.  AFAICS we can either say that every application
> is going to have to put in a CASE wrapper around this function, or say
> that we'll make it do the right thing for some of them and the rest have
> to put the same wrapper around it.

So that we don't break existing apps because of an issue that is
trivial to work around.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 2, 2009 at 2:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If there's a camp that actually *wants* a NULL result for this case,
>> I missed the reasoning.

> So that we don't break existing apps because of an issue that is
> trivial to work around.

We would only be breaking them if a NULL result were actually the
correct behavior for the application's requirements, which seems
a bit unlikely.

            regards, tom lane

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 2, 2009, at 11:24 AM, Sam Mason wrote:

> Yes, I'd be tempted to pick one and go with it.  It's seems a
> completely
> arbitrary choice one way or the other but the current behaviour is
> certainly wrong.
>
> I'd go with returning a zero element array because it would do
> the "right thing" more often when paired with array_to_string.
> I've also been through the first few pages of a Google search for
> "array_to_string" and it seems to do the "right" thing for the
> majority
> of the cases.

Forgive me if I'm missing something, but it seems to me that
array_to_string() works either way, no?

try=# select '"' || array_to_string('{}'::text[], ',') || '"'; ?column?
----------
  ""
(1 row)

Time: 72.129 ms
try=# select '"' || array_to_string('{""}'::text[], ',') || '"';
  ?column?
----------
  ""
(1 row)

Best,

David

Re: [HACKERS] string_to_array with empty input

From
Robert Haas
Date:
On Thu, Apr 2, 2009 at 2:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 2, 2009 at 2:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If there's a camp that actually *wants* a NULL result for this case,
>>> I missed the reasoning.
>
>> So that we don't break existing apps because of an issue that is
>> trivial to work around.
>
> We would only be breaking them if a NULL result were actually the
> correct behavior for the application's requirements, which seems
> a bit unlikely.

But that's completely untrue.  If the most useful behavior is either
ARRAY[''] or ARRAY[], then there are presumably lots and lots of
people out there who have apps that do COALESCE(string_to_array(...),
something).  Whichever way you change string_to_array() will break all
of the people doing this who wanted the opposite behavior for no good
reason.

...Robert

Re: [HACKERS] string_to_array with empty input

From
Sam Mason
Date:
On Thu, Apr 02, 2009 at 12:06:01PM -0700, David E. Wheeler wrote:
> On Apr 2, 2009, at 11:24 AM, Sam Mason wrote:
> >Yes, I'd be tempted to pick one and go with it.  It's seems a
> >completely arbitrary choice one way or the other but the current
> >behaviour is certainly wrong.
> >
> >I'd go with returning a zero element array because it would do
> >the "right thing" more often when paired with array_to_string.
> >I've also been through the first few pages of a Google search for
> >"array_to_string" and it seems to do the "right" thing for the
> >majority of the cases.
>
> Forgive me if I'm missing something, but it seems to me that
> array_to_string() works either way, no?

Sorry, I meant to type string_to_array but typed array_to_string
instead---after doing exactly the same thing when searching for stuff in
Google!  I think I should be using copy/paste more!

--
  Sam  http://samason.me.uk/

Re: [HACKERS] string_to_array with empty input

From
Steve Crawford
Date:
Did I miss the exciting conclusion or did this drift silently off radar?

I seem to recall three options:

1. Leave as is. Arguments: least effort, no backward compatibility
issues, since array_to_string evaluate both an array with single empty
string and an array with no elements to an empty string, string_to_array
on empty strings is ambiguous so we'll call it null. But: means that the
result of null input and non-null empty-string both result in null
output, requires everyone to explicitly handle empty strings (with the
side effect that they really know what the result will be) instead of
"helping" the majority of users. Requires: documentation change to
accurately describe function's behavior.

2. Change function to return an array. Arguments: Distinguishes null
from non-null input, easier coding for most cases, perhaps a less
surprising result. But: not backward compatible, requires somewhat
arbitrary decision on correct return value. Requires: code
change/testing, documentation updates.

In scenario 2, there were two options:
2a. Return zero-element array.
2b. Return array with single empty-string element.

My impression was that among the "change" options, 2b had the most
support (it is the most useful for the use-cases I've encountered so it
gets my vote). If the consensus is to change the function, it may be too
late for 8.4. But the documentation could be updated to reflect current
and planned behavior.

Cheers,
Steve


Re: [HACKERS] string_to_array with empty input

From
justin
Date:
Steve Crawford wrote:
> Did I miss the exciting conclusion or did this drift silently off radar?
>
>
it was pretty well split between the options.  tabled for another time.

Re: [HACKERS] string_to_array with empty input

From
"David E. Wheeler"
Date:
On Apr 7, 2009, at 8:07 AM, Steve Crawford wrote:

> In scenario 2, there were two options:
> 2a. Return zero-element array.
> 2b. Return array with single empty-string element.
>
> My impression was that among the "change" options, 2b had the most
> support (it is the most useful for the use-cases I've encountered so
> it gets my vote). If the consensus is to change the function, it may
> be too late for 8.4. But the documentation could be updated to
> reflect current and planned behavior.

+1

David