Thread: proposal: support empty string as separator for string_to_array
Hello

I have an idea that should simplify transforming a string into an array of characters. The basic idea: between every pair of adjacent characters there is an empty string, so the empty string is a regular separator for the string_to_array function. The proposed behaviour would be the inverse of array_to_string's behaviour:

postgres=# select array_to_string(array['a','b','c'],'');
 array_to_string
-----------------
 abc
(1 row)

postgres=# select string_to_array('abc','');
 string_to_array
-----------------
 {a,b,c}
(1 row)

Notes, ideas?

Regards
Pavel Stehule
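To make the proposed semantics concrete, here is a small Python model (not PostgreSQL code). `array_to_string` and `string_to_array_proposed` are hypothetical stand-ins that only cover one-dimensional text arrays without NULLs; the empty-separator branch implements the behaviour proposed above, not what the server actually does.

```python
def array_to_string(items: list[str], sep: str) -> str:
    """Model of array_to_string for a 1-D text array without NULLs."""
    return sep.join(items)


def string_to_array_proposed(s: str, sep: str) -> list[str]:
    """Hypothetical semantics from this proposal: an empty separator
    splits the input into one element per character."""
    if sep == "":
        # Python strings are sequences of code points, so list() splits
        # by character rather than by byte.
        return list(s)
    return s.split(sep)


chars = string_to_array_proposed("abc", "")
print(chars)                       # ['a', 'b', 'c']
print(array_to_string(chars, ""))  # abc
```

In this model, joining the per-character result with '' always gives back the original string, which is the inverse relationship the proposal relies on.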
On Fri, Jul 24, 2009 at 11:40 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> postgres=# select string_to_array('abc','');
>  string_to_array
> -----------------
>  {a,b,c}
> (1 row)

postgres=# select regexp_split_to_array('abc', '');
 regexp_split_to_array
-----------------------
 {a,b,c}
(1 row)

:-)

merlin
2009/7/25 Merlin Moncure <mmoncure@gmail.com>:
> postgres=# select regexp_split_to_array('abc', '');
>  regexp_split_to_array
> -----------------------
>  {a,b,c}
> (1 row)

I know, but a regexp is not necessary here. A simple function for string decomposition should be faster and a little more intuitive. Not everybody understands regular expressions.

Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have an idea that should simplify transforming a string into an
> array of characters. The basic idea: between every pair of adjacent
> characters there is an empty string, so the empty string is a regular
> separator for the string_to_array function.

There already is a definition for what string_to_array does with an empty field separator, and that is not it. So this change would possibly break existing applications. It does not seem either intuitively correct or useful enough to justify that --- particularly seeing that there's already another way to get the effect.

regards, tom lane
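The compatibility concern can be illustrated by modelling both behaviours side by side. This Python sketch assumes that the existing definition Tom refers to returns the whole input as a one-element array when the separator is empty; both function names are hypothetical, and the empty-input case is deliberately not modelled.

```python
def string_to_array_current(s: str, sep: str) -> list[str]:
    """Assumed existing behaviour: an empty separator treats the whole
    input as a single field."""
    if sep == "":
        return [s]
    return s.split(sep)


def string_to_array_proposed(s: str, sep: str) -> list[str]:
    """Proposed behaviour: an empty separator yields one element per character."""
    if sep == "":
        return list(s)
    return s.split(sep)


# A caller that relied on the current result would see a different array.
for impl in (string_to_array_current, string_to_array_proposed):
    print(impl.__name__, impl("abc", ""))
# string_to_array_current ['abc']
# string_to_array_proposed ['a', 'b', 'c']
```

For non-empty separators the two models agree, so under this assumption only callers that pass '' would notice the change.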
2009/7/25 Tom Lane <tgl@sss.pgh.pa.us>:
> There already is a definition for what string_to_array does with an
> empty field separator, and that is not it. So this change would possibly
> break existing applications. It does not seem either intuitively
> correct or useful enough to justify that --- particularly seeing that
> there's already another way to get the effect.

I think nobody uses an empty separator with string_to_array, because it does nothing useful. Or do you know of any case where an empty separator would be used? I don't.

My argument for "some" non-regexp-based function is that such a function could be very light and fast, faster than a regexp.

Another option is a one-parameter string_to_array function. That signature is not defined yet, so we could use it.

Regards
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2009/7/25 Tom Lane <tgl@sss.pgh.pa.us>:
>> There already is a definition for what string_to_array does with an
>> empty field separator, and that is not it.
> I think nobody uses an empty separator with string_to_array, because it
> does nothing useful.

According to you, maybe not. But perhaps whoever coded the function originally had a use-case in mind, or people may have come up with one since then. In any case we have a perfectly good answer available for anyone who wants this behavior. I see no reason to change here.

regards, tom lane
2009/7/25 Merlin Moncure <mmoncure@gmail.com>:
> postgres=# select regexp_split_to_array('abc', '');
>  regexp_split_to_array
> -----------------------
>  {a,b,c}
> (1 row)
>
> :-)

I tested an implementation, and it's about 30% faster than using a regexp.

I think 30% is a significant reason to implement it.

regards
Pavel Stehule
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> I tested an implementation, and it's about 30% faster than using a
> regexp.

Rather than making a change which could break existing applications, how about a new function string_to_array(text) which returns an array of "char"?

-Kevin
2009/7/27 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> I tested an implementation, and it's about 30% faster than using a
>> regexp.
>
> Rather than making a change which could break existing applications,
> how about a new function string_to_array(text) which returns an array
> of "char"?

Yes, that was my idea too, or a function named "chars_to_array".

Pavel
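Kevin's suggestion leaves the two-argument form alone and adds a separate entry point. Below is a hedged Python sketch of that design: an omitted argument stands in for a SQL overload, `_NO_SEP` and both function names are hypothetical, elements are one-character strings rather than a model of PostgreSQL's "char" type, and the two-argument branch uses the assumed existing empty-separator behaviour (whole input as one element).

```python
_NO_SEP = object()  # hypothetical sentinel meaning "called with one argument"


def string_to_array(s: str, sep=_NO_SEP) -> list[str]:
    """Sketch: the one-argument form splits into characters, while the
    two-argument form keeps its assumed existing semantics."""
    if sep is _NO_SEP:
        return list(s)   # new one-argument form
    if sep == "":
        return [s]       # unchanged: empty separator -> single field
    return s.split(sep)


def chars_to_array(s: str) -> list[str]:
    """Hypothetical alternative name for the same operation."""
    return string_to_array(s)


print(string_to_array("abc"))      # ['a', 'b', 'c']
print(string_to_array("abc", ""))  # ['abc']
print(chars_to_array("abc"))       # ['a', 'b', 'c']
```

Because existing two-argument calls take a different path, they keep returning what they returned before; only callers that opt into the new form see per-character output.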
On Mon, Jul 27, 2009 at 12:42 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I tested an implementation, and it's about 30% faster than using a regexp.
>
> I think 30% is a significant reason to implement it.

Yes, I noticed that too. I was thinking, though, that if anything should be done, it should go the other way: simple cases of regexp_split_to_array should use the simpler algorithm in string_to_array... just not the '' case, since they produce different results.

I don't think the chars_to_array function is the way to go. One thing that might work, though, is to overload the string_to_array function (or use a default parameter) to control the empty-string case with a bool, an option, or something similar.

merlin
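Merlin's alternative keeps a single function but makes the per-character behaviour opt-in. Sketched in Python with a defaulted keyword argument standing in for a SQL default parameter; the `split_chars` name is hypothetical, and the default preserves the assumed existing behaviour (empty separator returns the whole input as one element).

```python
def string_to_array(s: str, sep: str, split_chars: bool = False) -> list[str]:
    """Sketch of a flag-controlled empty-separator case.

    split_chars=False (the default) keeps the assumed existing behaviour,
    where an empty separator yields the whole input as one element;
    split_chars=True opts into one element per character.
    """
    if sep == "":
        return list(s) if split_chars else [s]
    # The flag only matters for the empty separator.
    return s.split(sep)


print(string_to_array("abc", ""))                     # ['abc']
print(string_to_array("abc", "", split_chars=True))   # ['a', 'b', 'c']
print(string_to_array("a,b", ",", split_chars=True))  # ['a', 'b']
```

Existing calls never pass the flag, so they keep the old result; the new behaviour is reachable without a second function name.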
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested an implementation, and it's about 30% faster than using a regexp.

In a real application, that's going to be negligible compared to all the other costs involved in pushing the data around. And we still haven't seen any in-the-field requests for this functionality, so even if the gap were wider, I don't see the point of putting effort into it.

regards, tom lane
2009/7/27 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I tested an implementation, and it's about 30% faster than using a regexp.
>
> In a real application, that's going to be negligible compared to all the
> other costs involved in pushing the data around. And we still haven't
> seen any in-the-field requests for this functionality, so even if the
> gap were wider, I don't see the point of putting effort into it.

This is just a possible optimisation. Maybe Merlin's proposal is the best: we could add this technique to regexp_split_to_array. When the pattern is empty, we could split directly into characters, without any new function and without changing an existing one. And anybody who uses regexp_split_to_array now would benefit too.

Pavel
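The dispatch described above, combined with Merlin's earlier point about simple literal patterns, can be sketched as follows. This is a Python model, not PostgreSQL's regexp code: Python's `re` module stands in for the general regex path (its handling of zero-length matches differs from PostgreSQL's, which is one reason the empty pattern never reaches it here), and the metacharacter set used to recognise a literal pattern is an assumption that ignores flags and PostgreSQL-specific regex syntax.

```python
import re

# Assumed set of characters with special meaning in a regex; a pattern
# containing none of them is treated as a literal string.
_REGEX_META = frozenset(".^$*+?()[]{}|\\")


def regexp_split_to_array(s: str, pattern: str) -> list[str]:
    """Sketch of fast paths placed in front of a general regex split."""
    if pattern == "":
        # Fast path 1: empty pattern -> one element per character,
        # matching the {a,b,c} result shown earlier in the thread.
        return list(s)
    if not _REGEX_META.intersection(pattern):
        # Fast path 2: literal pattern -> plain substring split, which
        # yields the same fields as matching that literal left to right.
        return s.split(pattern)
    # General path: Python's re module stands in for the regex engine.
    return re.split(pattern, s)


print(regexp_split_to_array("abc", ""))           # ['a', 'b', 'c']
print(regexp_split_to_array("a--b--c", "--"))     # ['a', 'b', 'c']
print(regexp_split_to_array("a1b22c", "[0-9]+"))  # ['a', 'b', 'c']
```

Callers see no API change; only the implementation picks a cheaper path when the pattern allows it.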