Re: [GENERAL] string_to_array with empty input - Mailing list pgsql-hackers

From Greg Stark
Subject Re: [GENERAL] string_to_array with empty input
Date
Msg-id 4136ffa0904011140x7e2473f4x714660d8a0cb8bd9@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] string_to_array with empty input  ("David E. Wheeler" <david@kineticode.com>)
Responses Re: [GENERAL] string_to_array with empty input  (Sam Mason <sam@samason.me.uk>)
List pgsql-hackers
On Wed, Apr 1, 2009 at 6:23 PM, David E. Wheeler <david@kineticode.com> wrote:
> Right, it's making a special case of '', which does seem rather inconsistent
> to me.


"David E. Wheeler" <david@kineticode.com> writes:

> On Apr 1, 2009, at 10:05 AM, justin wrote:
>
>> string_to_array('',',')::INT[]  works as proposed
>>
>> But
>> string_to_array(',,,', ',' )::INT[]   Fails
>> or
>> string_to_array('1,2,,4', ',' )::INT[] Fails .
>>
>>
>> I'm trying to understand the difference between a empty string to a  string
>> with  many blank entries between  the delimiter.

Well, uh, in one case it's empty and in the other case it's not?

>> Consider   ',,,,,,'  = ''  once the delimiter is removed .  Yet  Seven zero
>> length entries were passed.  How is that going to be  handled????

Well it's pretty clear empty delimiters cannot be handled consistently. Some
languages handle them as a special case (splitting every character into a
separate string, for example -- which I'll point out will result in an empty
array as a result for an empty string input) or make it an error.

> Right, it's making a special case of '', which does seem rather  inconsistent
> to me.

It's not a special case -- or it's a special case whichever we choose,
depending on which way you look at it.

What we're talking about here is replacing the blank values in the following
tables. We can get either the first one right in both cases with {} as the
result, or we can get the second one right in the second table with {""}.
Either way there is an inconsistency in at least one case.

The existing behaviour of returning NULL is the only "consistent" choice since
the correct value is "unknown". And one could argue that it's easier to
replace NULL with the correct value if the programmer knows using coalesce
than it is to replace either "" or {""}. But I'm still leaning to thinking
that using an arbitrary choice that at least gets most users intentions is
better.

postgres=# select input,
string_to_array(array_to_string(input,','),',') as output from (values
(array[]::text[]),(array['foo']),(array['foo','bar']),(array['foo','bar','baz']))
as input(input);
     input     |    output
---------------+---------------
 {}            |
 {foo}         | {foo}
 {foo,bar}     | {foo,bar}
 {foo,bar,baz} | {foo,bar,baz}
(4 rows)

postgres=# select input,
string_to_array(array_to_string(input,','),',') as output from (values
(array[]::text[]),(array['']),(array['','']),(array['','',''])) as
input(input);
   input    |   output
------------+------------
 {}         |
 {""}       |
 {"",""}    | {"",""}
 {"","",""} | {"","",""}
(4 rows)

--
greg

pgsql-hackers by date:

Previous
From: Sam Mason
Date:
Subject: Re: [GENERAL] string_to_array with empty input
Next
From: Tom Lane
Date:
Subject: Re: Patch to speed up pg_dump