Thread: string_to_array, array_to_string function without separator
Hi
I propose variants of the mentioned functions without a specified separator. In the first case, the string is transformed into an array of chars; in the second case, the array of chars is transformed back into a string.
Comments, notes?
Regards
Pavel
On Fri, Mar 15, 2019 at 05:04:02AM +0100, Pavel Stehule wrote:
> Hi
>
> I propose mentioned functions without specified separator. In this case the
> string is transformed to array of chars, in second case, the array of chars
> is transformed back to string.
>
> Comments, notes?
Whatever optimizations you have in mind for this, could they also work
for string_to_array() and array_to_string() when they get an empty
string handed to them?
As to naming, some languages use explode/implode.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Mar 15, 2019 at 15:03, David Fetter <david@fetter.org> wrote:
> On Fri, Mar 15, 2019 at 05:04:02AM +0100, Pavel Stehule wrote:
>> Hi
>>
>> I propose mentioned functions without specified separator. In this case the
>> string is transformed to array of chars, in second case, the array of chars
>> is transformed back to string.
>>
>> Comments, notes?
> Whatever optimizations you have in mind for this, could they also work
> for string_to_array() and array_to_string() when they get an empty
> string handed to them?
My idea is to use string_to_array('AHOJ') --> {A,H,O,J}
Empty input means an empty result --> {}
> As to naming, some languages use explode/implode.
Could be, but since we already have string_to_array, I think it is a good name.
> Best,
> David.
On 3/15/19 11:46 AM, Pavel Stehule wrote:
> On Fri, Mar 15, 2019 at 15:03, David Fetter <david@fetter.org> wrote:
>> Whatever optimizations you have in mind for this, could they also work
>> for string_to_array() and array_to_string() when they get an empty
>> string handed to them?
>
> my idea is use string_to_array('AHOJ') --> {A,H,O,J}
>
> empty input means empty result --> {}
I thought the question was maybe about an empty /delimiter/ string.
It seems that string_to_array already has this behavior if NULL is
passed as the delimiter:
> select string_to_array('AHOJ', null);
 string_to_array
-----------------
 {A,H,O,J}
and array_to_string has the proposed behavior if passed an
empty string as the delimiter (as one would naturally expect)
... but not null for a delimiter (that just makes the result null).
So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.
Does that sound right?
Regards,
-Chap
On Fri, Mar 15, 2019 at 16:59, Chapman Flack <chap@anastigmatix.net> wrote:
> On 3/15/19 11:46 AM, Pavel Stehule wrote:
>> On Fri, Mar 15, 2019 at 15:03, David Fetter <david@fetter.org> wrote:
>>> Whatever optimizations you have in mind for this, could they also work
>>> for string_to_array() and array_to_string() when they get an empty
>>> string handed to them?
>>
>> my idea is use string_to_array('AHOJ') --> {A,H,O,J}
>>
>> empty input means empty result --> {}
> I thought the question was maybe about an empty /delimiter/ string.
> It seems that string_to_array already has this behavior if NULL is
> passed as the delimiter:
> > select string_to_array('AHOJ', null);
>  string_to_array
> -----------------
>  {A,H,O,J}
> and array_to_string has the proposed behavior if passed an
> empty string as the delimiter (as one would naturally expect)
> ... but not null for a delimiter (that just makes the result null).
> So the proposal seems roughly equivalent to making string_to_array's
> second parameter optional default null, and array_to_string's second
> parameter optional default ''.
> Does that sound right?
Yes.
Pavel
> Regards,
> -Chap
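The equivalence Chap describes can be tried with the existing two-argument forms. What follows is only a sketch: the wrapper names string_to_chars and chars_to_string are hypothetical stand-ins for the proposed one-argument variants and are not part of PostgreSQL or of any patch in this thread.

```sql
-- Existing behaviour with explicit delimiters, as described in the thread.
SELECT string_to_array('AHOJ', NULL);                  -- {A,H,O,J}
SELECT array_to_string(ARRAY['A','H','O','J'], '');    -- AHOJ
SELECT array_to_string(ARRAY['A','H','O','J'], NULL);  -- NULL

-- Hypothetical one-argument wrappers (names are assumptions, not a real API).
-- They only hard-code the delimiter that would become the default.
CREATE FUNCTION string_to_chars(s text) RETURNS text[]
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT string_to_array(s, NULL) $$;

CREATE FUNCTION chars_to_string(a text[]) RETURNS text
    LANGUAGE sql STABLE
    AS $$ SELECT array_to_string(a, '') $$;

SELECT string_to_chars('AHOJ');                   -- {A,H,O,J}
SELECT chars_to_string(string_to_chars('AHOJ'));  -- AHOJ
```

Seen this way, the proposal saves typing one argument per call, which is the trade-off the rest of the thread weighs against compatibility and readability.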
Chapman Flack <chap@anastigmatix.net> writes:
> So the proposal seems roughly equivalent to making string_to_array's
> second parameter optional default null, and array_to_string's second
> parameter optional default ''.
In that case why bother? It'll just create a cross-version compatibility
hazard for next-to-no keystroke savings. If the cases were so common
that they could be argued to be sane "default" behavior, I might feel
differently --- but if you were asked in a vacuum what the default
delimiters ought to be, I don't think you'd say "no delimiter".
regards, tom lane
On Fri, Mar 15, 2019 at 17:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> So the proposal seems roughly equivalent to making string_to_array's
>> second parameter optional default null, and array_to_string's second
>> parameter optional default ''.
> In that case why bother? It'll just create a cross-version compatibility
> hazard for next-to-no keystroke savings. If the cases were so common
> that they could be argued to be sane "default" behavior, I might feel
> differently --- but if you were asked in a vacuum what the default
> delimiters ought to be, I don't think you'd say "no delimiter".
My motivation is the following: sometimes I need to convert a string to an array of chars. Using NULL as the separator is possible, but it is not intuitive. When you use the string_to_array function without a separator, there is only one possible semantic: separation by chars.
I understand that there is a possible collision, and that a missing parameter could be read as a default value. But in this case that meaning is not practical.
Regards
Pavel
> regards, tom lane
On 3/15/19 12:15 PM, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> So the proposal seems roughly equivalent to making string_to_array's
>> second parameter optional default null, and array_to_string's second
>> parameter optional default ''.
>
> In that case why bother? It'll just create a cross-version compatibility
> hazard for next-to-no keystroke savings. If the cases were so common
> that they could be argued to be sane "default" behavior, I might feel
> differently --- but if you were asked in a vacuum what the default
> delimiters ought to be, I don't think you'd say "no delimiter".
One could go further and argue that the non-optional arguments improve
clarity: a reader seeing the explicit NULL or '' argument gets a strong
clue what's intended, who in the optional-argument case might end up
thinking "must go look up what this function's default delimiter is".
-Chap
On 3/15/19 12:26 PM, Pavel Stehule wrote:
> you use string_to_array function without separator, then only one possible
> semantic is there - separation by chars.
Other languages can and do specify other semantics for the
separator-omitted case: often (as in Python) it means to split
around "runs of one or more characters the platform considers white
space", as a convenience, given that it's a fairly commonly wanted
meaning but can be tedious to spell out as an explicit separator.
I admit I think a separator of '' would be more clear than null,
so if I were designing string_to_array in a green field, I think
I would swap the meanings of null and '' as the delimiter: null
would mean "don't really split anything", and '' would mean "split
everywhere you can find '' in the string", that is, everywhere.
But the current behavior is already established....
Regards,
-Chap
On Fri, Mar 15, 2019 at 17:54, Chapman Flack <chap@anastigmatix.net> wrote:
> On 3/15/19 12:26 PM, Pavel Stehule wrote:
>> you use string_to_array function without separator, then only one possible
>> semantic is there - separation by chars.
> Other languages can and do specify other semantics for the
> separator-omitted case: often (as in Python) it means to split
> around "runs of one or more characters the platform considers white
> space", as a convenience, given that it's a fairly commonly wanted
> meaning but can be tedious to spell out as an explicit separator.
For this proposal, "char" != byte:
result[n] = substring(str FROM n FOR 1)
> I admit I think a separator of '' would be more clear than null,
> so if I were designing string_to_array in a green field, I think
> I would swap the meanings of null and '' as the delimiter: null
> would mean "don't really split anything", and '' would mean "split
> everywhere you can find '' in the string", that is, everywhere.
> But the current behavior is already established....
Yes.
Pavel
> Regards,
> -Chap
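Both semantics discussed in this exchange can be written with existing functions. This is a sketch only, assuming a UTF8 database and the default standard_conforming_strings = on; the sample strings and the aliases t, s and n are chosen for illustration.

```sql
-- Pavel's definition: result[n] = substring(str FROM n FOR 1),
-- giving one element per character rather than per byte.
SELECT array_agg(substring(s FROM n FOR 1) ORDER BY n) AS chars
FROM (SELECT text 'slüzzelîn') AS t(s),
     generate_series(1, char_length(s)) AS n;
-- {s,l,ü,z,z,e,l,î,n}

-- The same result from the existing NULL-delimiter form.
SELECT string_to_array('slüzzelîn', NULL);
-- {s,l,ü,z,z,e,l,î,n}

-- The other plausible "no separator" meaning Chap mentions:
-- split around runs of white space.
SELECT regexp_split_to_array('verlorn  ist daz', '\s+');
-- {verlorn,ist,daz}
```

The regular-expression form is only an approximation of Python's separator-less split: leading or trailing white space in the input produces empty-string elements here, which Python drops.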
On 3/15/19 12:59 PM, Pavel Stehule wrote:
> for this proposal "char" != byte
>
> result[n] = substring(str FROM n FOR 1)
I think that's what string_to_array(..., null) already does:
SHOW server_encoding;
server_encoding
UTF8
WITH
 t0(s) AS (SELECT text 'verlorn ist daz slüzzelîn'),
 t1(a) AS (SELECT string_to_array(s, null) FROM t0)
SELECT
 char_length(s), octet_length(convert_to(s, 'UTF8')),
 array_length(a,1), a
FROM
 t0, t1;
char_length|octet_length|array_length|a
25|27|25|{v,e,r,l,o,r,n," ",i,s,t," ",d,a,z," ",s,l,ü,z,z,e,l,î,n}
Regards,
-Chap
On Fri, Mar 15, 2019 at 18:30, Chapman Flack <chap@anastigmatix.net> wrote:
> On 3/15/19 12:59 PM, Pavel Stehule wrote:
>> for this proposal "char" != byte
>>
>> result[n] = substring(str FROM n FOR 1)
> I think that's what string_to_array(..., null) already does:
Sure. My proposal more or less just removes the need to pass the null parameter.
> SHOW server_encoding;
> server_encoding
> UTF8
> WITH
>  t0(s) AS (SELECT text 'verlorn ist daz slüzzelîn'),
>  t1(a) AS (SELECT string_to_array(s, null) FROM t0)
> SELECT
>  char_length(s), octet_length(convert_to(s, 'UTF8')),
>  array_length(a,1), a
> FROM
>  t0, t1;
> char_length|octet_length|array_length|a
> 25|27|25|{v,e,r,l,o,r,n," ",i,s,t," ",d,a,z," ",s,l,ü,z,z,e,l,î,n}
> Regards,
> -Chap
On Fri, Mar 15, 2019 at 12:31:21PM -0400, Chapman Flack wrote:
> On 3/15/19 12:15 PM, Tom Lane wrote:
> > Chapman Flack <chap@anastigmatix.net> writes:
> >> So the proposal seems roughly equivalent to making string_to_array's
> >> second parameter optional default null, and array_to_string's second
> >> parameter optional default ''.
> >
> > In that case why bother? It'll just create a cross-version compatibility
> > hazard for next-to-no keystroke savings. If the cases were so common
> > that they could be argued to be sane "default" behavior, I might feel
> > differently --- but if you were asked in a vacuum what the default
> > delimiters ought to be, I don't think you'd say "no delimiter".
>
> One could go further and argue that the non-optional arguments improve
> clarity: a reader seeing the explicit NULL or '' argument gets a strong
> clue what's intended, who in the optional-argument case might end up
> thinking "must go look up what this function's default delimiter is".
Going to look up the function's behavior would be much more fun if
there were comments on these functions explaining things. I'll draft
up a patch for some of that.
In a similar vein, I haven't been able to come up with hazards of
naming function parameters in some document-ish way. What did I miss?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate