Thread: proposal: support empty string as separator for string_to_array

proposal: support empty string as separator for string_to_array

From
Pavel Stehule
Date:
Hello

I have an idea that should simplify transforming a string into an array
of characters. The basic idea: there is an empty string between every
two characters, so the empty string is a regular separator for the
string_to_array function. This behavior is the inverse of
array_to_string's behavior:

postgres=# select array_to_string(array['a','b','c'],'');
 array_to_string
-----------------
 abc
(1 row)

postgres=# select string_to_array('abc','');
 string_to_array
-----------------
 {a,b,c}
(1 row)

Comments, ideas?

Regards
Pavel Stehule


Re: proposal: support empty string as separator for string_to_array

From
Merlin Moncure
Date:
On Fri, Jul 24, 2009 at 11:40 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
> Hello
>
> I have one idea, that should simplify string to char array
> transformation. The base is idea: between every char is empty string,
> so empty string is regular separator for string_to_array function.
> This behave is inversion of array_to_string function behave:
>
> postgres=# select array_to_string(array['a','b','c'],'');
>  array_to_string
> -----------------
>  abc
> (1 row)
>
> postgres=# select string_to_array('abc','');
>  string_to_array
> -----------------
>  {a,b,c}
> (1 row)

postgres=# select regexp_split_to_array('abc', '');
 regexp_split_to_array
-----------------------
 {a,b,c}
(1 row)

:-)

merlin


Re: proposal: support empty string as separator for string_to_array

From
Pavel Stehule
Date:
2009/7/25 Merlin Moncure <mmoncure@gmail.com>:
> On Fri, Jul 24, 2009 at 11:40 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
>> Hello
>>
>> I have one idea, that should simplify string to char array
>> transformation. The base is idea: between every char is empty string,
>> so empty string is regular separator for string_to_array function.
>> This behave is inversion of array_to_string function behave:
>>
>> postgres=# select array_to_string(array['a','b','c'],'');
>>  array_to_string
>> -----------------
>>  abc
>> (1 row)
>>
>> postgres=# select string_to_array('abc','');
>>  string_to_array
>> -----------------
>>  {a,b,c}
>> (1 row)
>
> postgres=# select regexp_split_to_array('abc', '');
>  regexp_split_to_array
> -----------------------
>  {a,b,c}
> (1 row)

I know, but a regexp is not necessary. A simple function for string
decomposition should be faster and a little more intuitive. Not
everybody understands regular expressions.

Pavel
>
> :-)
>
> merlin
>


Re: proposal: support empty string as separator for string_to_array

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have one idea, that should simplify string to char array
> transformation. The base is idea: between every char is empty string,
> so empty string is regular separator for string_to_array function.

There already is a definition for what string_to_array does with an
empty field separator, and that is not it.  So this change would possibly
break existing applications.  It does not seem either intuitively
correct or useful enough to justify that --- particularly seeing that
there's already another way to get the effect.
        regards, tom lane
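
For reference, a minimal sketch of the two behaviors being contrasted
here, assuming a stock PostgreSQL installation. The first query shows the
existing definition for an empty separator (the string is kept as a
single field); the second repeats the regexp example from earlier in the
thread:

```sql
-- Existing behavior: an empty separator leaves the string as one field.
SELECT string_to_array('abc', '');        -- {abc}

-- The already-available way to get one element per character.
SELECT regexp_split_to_array('abc', '');  -- {a,b,c}
```

An application that relies on the first result would see different output
if the empty separator were redefined.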


Re: proposal: support empty string as separator for string_to_array

From
Pavel Stehule
Date:
2009/7/25 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I have one idea, that should simplify string to char array
>> transformation. The base is idea: between every char is empty string,
>> so empty string is regular separator for string_to_array function.
>
> There already is a definition for what string_to_array does with an
> empty field separator, and that is not it.  So this change would possibly
> break existing applications.  It does not seem either intuitively
> correct or useful enough to justify that --- particularly seeing that
> there's already another way to get the effect.

I think nobody uses an empty separator in string_to_array, because it
does nothing useful. Or do you know of any case where an empty separator
should be used? I don't. My argument for "some" non-regexp-based
function is that it should be very light and fast.
Faster than regexp.

Another way is a one-parameter string_to_array function. That function
is not defined yet, so we could use it.

Regards
Pavel

>
>                        regards, tom lane
>


Re: proposal: support empty string as separator for string_to_array

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2009/7/25 Tom Lane <tgl@sss.pgh.pa.us>:
>> There already is a definition for what string_to_array does with an
>> empty field separator, and that is not it.

> I thing, so nobody use empty separator in string_to_array, because it
> does nothing useful.

According to you, maybe not.  But perhaps whoever coded the function
originally had a use-case in mind, or people may have come up with
one since then.  In any case we have a perfectly good answer available
for anyone who wants this behavior.  I see no reason to change here.
        regards, tom lane


Re: proposal: support empty string as separator for string_to_array

From
Pavel Stehule
Date:
2009/7/25 Merlin Moncure <mmoncure@gmail.com>:
> On Fri, Jul 24, 2009 at 11:40 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
>> Hello
>>
>> I have one idea, that should simplify string to char array
>> transformation. The base is idea: between every char is empty string,
>> so empty string is regular separator for string_to_array function.
>> This behave is inversion of array_to_string function behave:
>>
>> postgres=# select array_to_string(array['a','b','c'],'');
>>  array_to_string
>> -----------------
>>  abc
>> (1 row)
>>
>> postgres=# select string_to_array('abc','');
>>  string_to_array
>> -----------------
>>  {a,b,c}
>> (1 row)
>
> postgres=# select regexp_split_to_array('abc', '');
>  regexp_split_to_array
> -----------------------
>  {a,b,c}
> (1 row)
>
> :-)
>

I tested an implementation and it's about 30% faster than using regexp.

I think 30% is a significant reason for implementing it.

regards
Pavel Stehule


> merlin
>


Re: proposal: support empty string as separator for string_to_array

From
"Kevin Grittner"
Date:
Pavel Stehule <pavel.stehule@gmail.com> wrote: 
> I tested  implementation and it's about 30% faster than using
> regexp.
Rather than making a change which could break existing applications,
how about a new function string_to_array(text) which returns an array
of "char"?
-Kevin


Re: proposal: support empty string as separator for string_to_array

From
Pavel Stehule
Date:
2009/7/27 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> I tested  implementation and it's about 30% faster than using
>> regexp.
>
> Rather than making a change which could break existing applications,
> how about a new function string_to_array(text) which returns an array
> of "char"?

yes, that was my idea too - or a function "chars_to_array"

Pavel

>
> -Kevin
>
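
A rough sketch of what such a separate function could look like at the
SQL level, so that the existing two-argument string_to_array stays
untouched. The name chars_to_array is taken from the message above; the
body is only an assumption for illustration (it is not the C
implementation that was benchmarked) and returns text[] rather than an
array of "char":

```sql
-- Hypothetical helper, not part of PostgreSQL.
-- Builds the array by taking one character at a time with substr().
CREATE FUNCTION chars_to_array(text) RETURNS text[] AS $$
    SELECT ARRAY(
        SELECT substr($1, i, 1)
        FROM generate_series(1, length($1)) AS g(i)
        ORDER BY i
    );
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Usage:
SELECT chars_to_array('abc');
```

Because it is a new name, no existing call to string_to_array changes
meaning.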


Re: proposal: support empty string as separator for string_to_array

From
Merlin Moncure
Date:
On Mon, Jul 27, 2009 at 12:42 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
> 2009/7/25 Merlin Moncure <mmoncure@gmail.com>:
>> On Fri, Jul 24, 2009 at 11:40 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
>>> Hello
>>>
>>> I have one idea, that should simplify string to char array
>>> transformation. The base is idea: between every char is empty string,
>>> so empty string is regular separator for string_to_array function.
>>> This behave is inversion of array_to_string function behave:
>>>
>>> postgres=# select array_to_string(array['a','b','c'],'');
>>>  array_to_string
>>> -----------------
>>>  abc
>>> (1 row)
>>>
>>> postgres=# select string_to_array('abc','');
>>>  string_to_array
>>> -----------------
>>>  {a,b,c}
>>> (1 row)
>>
>> postgres=# select regexp_split_to_array('abc', '');
>>  regexp_split_to_array
>> -----------------------
>>  {a,b,c}
>> (1 row)
>>
>> :-)
>>
>
> I tested  implementation and it's about 30% faster than using regexp.
>
> I could to thing, 30% is significant reason for implementation.

yes, I noticed that too.  I was thinking though that if anything
should be done, it should be to go the other way: simple cases of
regexp_split_to_array should use the simpler algorithm in
'string_to_array'...just not the '' case, since they produce different
results.

I don't think the chars_to_array function is the way to go.  One thing
that might work though is to overload the string_to_array function (or
use a default parameter) to control the empty string case with a bool,
or an option or something.

merlin
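
One way to picture the flag idea, as a sketch only: the wrapper name
split_string and its third argument are hypothetical, and a real
implementation would presumably be an overload of string_to_array written
in C. The flag changes the result only for the empty-separator case:

```sql
-- Hypothetical wrapper, not part of PostgreSQL.
-- $1 = input string, $2 = separator, $3 = split into characters when
-- the separator is empty.
CREATE FUNCTION split_string(text, text, boolean) RETURNS text[] AS $$
    SELECT CASE
        WHEN $2 = '' AND $3 THEN regexp_split_to_array($1, '')
        ELSE string_to_array($1, $2)
    END;
$$ LANGUAGE sql IMMUTABLE;

-- Usage:
SELECT split_string('abc', '', true);    -- per-character split
SELECT split_string('abc', '', false);   -- existing string_to_array result
SELECT split_string('a,b,c', ',', true); -- flag ignored, normal split
```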


Re: proposal: support empty string as separator for string_to_array

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested  implementation and it's about 30% faster than using regexp.

In a real application, that's going to be negligible compared to all the
other costs involved in pushing the data around.  And we still haven't
seen any in-the-field requests for this functionality, so even if the
gap were wider, I don't see the point of putting effort into it.
        regards, tom lane


Re: proposal: support empty string as separator for string_to_array

From
Pavel Stehule
Date:
2009/7/27 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I tested  implementation and it's about 30% faster than using regexp.
>
> In a real application, that's going to be negligible compared to all the
> other costs involved in pushing the data around.  And we still haven't
> seen any in-the-field requests for this functionality, so even if the
> gap were wider, I don't see the point of putting effort into it.
>

This is just a possible optimization. Maybe Merlin's proposal is the
best: we could add this technique to regexp_split_to_array, so that when
the pattern is empty we use direct character separation, without
any new function or any change to the current function. And anybody who
uses regexp_split_to_array now would benefit too.

Pavel

>                        regards, tom lane
>