Thread: proposal: row_to_array function
On 1/16/15 3:45 AM, Pavel Stehule wrote:
I am returning back to processing records in plpgsql.
I think they could be processed simply by transforming them to arrays.
We already have similar functions - hstore(row), row_to_json, ... - but going through these functions can be an unnecessary intermediate step. Any row variable can be transformed to a 2D text array.
How is it useless? Why wouldn't you just use JSON and be done with it?
so ROW->ARRAY is shorter than ROW->JSON->ARRAY or ROW->HSTORE->ARRAY
Do you have some use cases you can share?
There are two possible transformations:
row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]
If we're going to go that route, I think it makes more sense to create an actual key/value type (ie: http://pgxn.org/dist/pair/doc/pair.html) and return an array of that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/16/15 11:22 AM, Pavel Stehule wrote:
2015-01-16 18:03 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 3:45 AM, Pavel Stehule wrote:
I am returning back to processing records in plpgsql.
I am thinking so it can be simply processed with transformations to array.
Now we have similar functions - hstore(row), row_to_json, ... but using of these functions can be a useless step. Any row variable can be transformed to 2D text array.
How is it useless? Why wouldn't you just use JSON and be done with it?
We can use FOREACH IN ARRAY iteration in plpgsql (a second variant is an implementation of FOREACH for jsonb)
so ROW->ARRAY is shorter than ROW->JSON->ARRAY or ROW->HSTORE->ARRAY
I think the real problem here is that we're inventing a bunch of different ways to do the same thing: iterate over a set. Instead of doing that, should we add the idea of an iterator to the type system? That would make sense for arrays, hstore, json and XML.
Do you have some use cases you can share?
processing of NEW, OLD variables in triggers
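The trigger use case can today only be approximated through hstore, since plpgsql has no syntax for accessing NEW's fields by a variable name. A sketch of that workaround (assumes the hstore extension is installed; the function name is illustrative, not from the original message):

```sql
-- Illustrative only: iterate over NEW's columns as text key/value
-- pairs via the hstore extension (no core row_to_array exists).
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE OR REPLACE FUNCTION log_new_fields() RETURNS trigger AS $$
DECLARE
  kv record;
BEGIN
  FOR kv IN SELECT key, value FROM each(hstore(NEW)) LOOP
    RAISE NOTICE 'column %: %', kv.key, kv.value;
  END LOOP;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```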
Note that last time I checked you couldn't do something like NEW.variable, and I don't think you could use EXEC to do it either. So there's more needed here than just converting a record to an array.
There are two possible transformations:
row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]
If we're going to go that route, I think it makes more sense to create an actual key/value type (ie: http://pgxn.org/dist/pair/doc/__pair.html <http://pgxn.org/dist/pair/doc/pair.html>) and return an array of that.
ok
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 01/16/2015 12:22 PM, Pavel Stehule wrote:
There are two possible transformations:
row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]
If we're going to go that route, I think it makes more sense to
create an actual key/value type (ie:
http://pgxn.org/dist/pair/doc/pair.html) and return an array of that.
ok
<http://BlueTreble.com>
I think we'd possibly be better off with simply returning a flat array, [key1, value1, ...]
That's what the hstore(text[]) and json_object(text[]) functions accept, along with the 2D variant, if we want a precedent.
cheers
andrew
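The precedent Andrew cites can be checked directly - both existing functions already accept the flat and the 2-D form (json_object(text[]) was new in 9.4; hstore(text[]) is much older):

```sql
-- Both functions accept a flat [k1, v1, k2, v2, ...] array...
SELECT json_object('{a,1,b,2}');      -- {"a" : "1", "b" : "2"}

-- ...and the 2-D [[k1, v1], [k2, v2]] variant gives the same result.
SELECT json_object('{{a,1},{b,2}}');
```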
2015-01-16 22:35 GMT+01:00 Andrew Dunstan <andrew@dunslane.net>:
On 01/16/2015 12:22 PM, Pavel Stehule wrote:
There are two possible transformations:
row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]
If we're going to go that route, I think it makes more sense to
create an actual key/value type (ie:
http://pgxn.org/dist/pair/doc/pair.html) and return an array of that.
ok
<http://BlueTreble.com>
I think we'd possibly be better off with simply returning a flat array, [key1, value1, ...]
That's what the hstore(text[]) and json_object(text[]) functions accept, along with the 2D variant, if we want a precedent.

It can be one of the supported variants. It should not be the only one, because we cannot simply iterate over it.

Another possibility is to teach FOREACH to take a key and a value in one step.
postgres=# do $$
declare a int;
b int;
begin
foreach a,b in array ARRAY[(1,2),(3,4)]
loop
raise notice 'a = %, b = %', a,b;
end loop;
end;
$$ language plpgsql;
NOTICE: a = 1, b = 2
NOTICE: a = 3, b = 4
DO
Regards
Pavel
cheers
andrew
On 1/25/15 4:23 AM, Pavel Stehule wrote:
I tested the concept of iterating over an array in the format [key1, value1, key2, value2, ...] - what is nice, it works for [[key1, value1], [key2, value2], ...] too
It is only a few lines more to current code, and this change doesn't break a compatibility.
Do you think this patch is acceptable?
Ideas, comments?
Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do
foreach a, b, c in array(array(1,2),array(3,4)) ?
array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively ARRAY[1,2,3,4]
Or the opposite case of
foreach a,b in array(array(1,2,3))
Also, what about:
foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?
postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
array
-------------------
{1,2,3,4,5,6,7,8}
(1 row)
Pavel Stehule
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-26 21:44 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/25/15 4:23 AM, Pavel Stehule wrote:
I tested a concept iteration over array in format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too
It is only a few lines more to current code, and this change doesn't break a compatibility.
Do you think, so this patch is acceptable?
Ideas, comments?
Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do
foreach a, b, c in array(array(1,2),array(3,4)) ?

It is relatively simple behavior -- empty values are NULL.
array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively ARRAY[1,2,3,4]
Or the opposite case of
foreach a,b in array(array(1,2,3))
Also, what about:
foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?
postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
array
-------------------
{1,2,3,4,5,6,7,8}
(1 row)

So it generates pairs {1,2},{3,4},{5,6},{7,8}.
Regards
Pavel Stehule
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi

2015-01-27 11:41 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-01-26 21:44 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/25/15 4:23 AM, Pavel Stehule wrote:
I tested a concept iteration over array in format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too
It is only a few lines more to current code, and this change doesn't break a compatibility.
Do you think, so this patch is acceptable?
Ideas, comments?
Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do
foreach a, b, c in array(array(1,2),array(3,4)) ?

It is relatively simple behavior -- empty values are NULL.
array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively ARRAY[1,2,3,4]
Or the opposite case of
foreach a,b in array(array(1,2,3))
Also, what about:
foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?
postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
array
-------------------
{1,2,3,4,5,6,7,8}
(1 row)

So it generates pairs {1,2},{3,4},{5,6},{7,8}.

I fixed the situation when the array does not have enough elements.

More tests, simple doc.

Regards
Pavel

Regards
Pavel Stehule
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
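For reference, the demo that follows can be reproduced with a table along these lines (a hypothetical definition inferred from the NOTICE output; the original message does not include it):

```sql
CREATE TABLE foo(a int, b text, c date);
INSERT INTO foo VALUES
  (2, 'NAZDAR', '2015-01-27'),
  (2, 'AHOJ',   '2015-01-27');
```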
postgres=# do $$
declare r record;
declare k text; v text;
begin
for r in select * from foo loop
foreach k,v in array row_to_array(r) loop
raise notice 'k: %, v: %', k, v;
end loop;
end loop;
end;
$$;
NOTICE: k: a, v: 2
NOTICE: k: b, v: NAZDAR
NOTICE: k: c, v: 2015-01-27
NOTICE: k: a, v: 2
NOTICE: k: b, v: AHOJ
NOTICE: k: c, v: 2015-01-27
DO
Pavel
Regards
Pavel

Hello

here is an initial version of the row_to_array function - it transforms any row to an array in the format proposed by Andrew.

2015-01-27 19:58 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
2015-01-27 11:41 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-01-26 21:44 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/25/15 4:23 AM, Pavel Stehule wrote:
I tested a concept iteration over array in format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too
It is only a few lines more to current code, and this change doesn't break a compatibility.
Do you think, so this patch is acceptable?
Ideas, comments?
Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do
foreach a, b, c in array(array(1,2),array(3,4)) ?

It is relatively simple behavior -- empty values are NULL.
array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively ARRAY[1,2,3,4]
Or the opposite case of
foreach a,b in array(array(1,2,3))
Also, what about:
foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?
postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
array
-------------------
{1,2,3,4,5,6,7,8}
(1 row)

So it generates pairs {1,2},{3,4},{5,6},{7,8}.

I fixed the situation when the array does not have enough elements.

More tests, simple doc.

Regards
Pavel

Regards
Pavel Stehule
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/27/15 2:26 PM, Pavel Stehule wrote:
here is an initial version of the row_to_array function - transform any row to an array in the format proposed by Andrew.
Please start a new thread for this... does it depend on the key-value patch?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 28.1.2015 0:25, "Jim Nasby" <Jim.Nasby@bluetreble.com> wrote:
>
> On 1/27/15 12:58 PM, Pavel Stehule wrote:
>>
>> postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
>> array
>> -------------------
>> {1,2,3,4,5,6,7,8}
>> (1 row)
>>
>> so it generate pairs {1,2}{3,4},{5,6},{7,8}
>>
>>
>> I fixed situation when array has not enough elements.
>>
>> More tests, simple doc
>
>
> Hrm, this wasn't what I was expecting:
>
> + select foreach_test_ab(array[1,2,3,4]);
> + NOTICE: a: 1, b: 2
> + NOTICE: a: 3, b: 4
>
> I was expecting that foreach a,b array would be expecting something in the array to have a dimension of 2. :(

Your expectation is inconsistent with the current implementation of FOREACH. It doesn't produce an array when SLICE is missing, and it doesn't take dimensions into account.

I would not change this rule. It is not ambiguous, and it allows working with 1D, 2D, and 3D arrays. You can process Andrew's format well, and my proposed format (a 2D array) well too.
CREATE OR REPLACE FUNCTION iterate_over_pairs(text[])
RETURNS void AS $$
DECLARE v1 text; v2 text; e text; i int := 0;
BEGIN
FOREACH e IN ARRAY $1 LOOP
IF i % 2 = 0 THEN v1 := e;
ELSE v2 := e; RAISE NOTICE 'v1: %, v2: %', v1, v2; END IF;
i := i + 1;
END LOOP;
END;
$$ LANGUAGE plpgsql;
postgres=# SELECT iterate_over_pairs(ARRAY[1,2,3,4]::text[]);
NOTICE: v1: 1, v2: 2
NOTICE: v1: 3, v2: 4
iterate_over_pairs
--------------------
(1 row)
postgres=# SELECT iterate_over_pairs(ARRAY[[1,2],[3,4]]::text[]);
NOTICE: v1: 1, v2: 2
NOTICE: v1: 3, v2: 4
iterate_over_pairs
--------------------
(1 row)
There can be different behavior when SLICE is used; there we can iterate exactly by dimensions. Should we design the behavior for that case?
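For comparison, the existing SLICE clause is the dimension-aware form of FOREACH that Pavel is referring to (available in stock PostgreSQL since 9.1):

```sql
DO $$
DECLARE
  s int[];
BEGIN
  -- SLICE 1 yields one-dimensional sub-arrays instead of scalars
  FOREACH s SLICE 1 IN ARRAY '{{1,2},{3,4}}'::int[] LOOP
    RAISE NOTICE 'slice: %', s;  -- {1,2}, then {3,4}
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```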
>
> I think this is bad, because this:
>
> foreach_test_ab('{{1,2,3},{4,5,6}}'::int[]);
>
> will give you 1,2; 3,4; 5,6. I don't see any way that that makes sense. Even if it did make sense, I'm more concerned that adding this will seriously paint us into a corner when it comes to the (to me) more rational case of returning {1,2,3},{4,5,6}.
>
> I think we need to think some more about this, at least to make sure we're not painting ourselves into a corner for more appropriate array iteration.
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Jan 27, 2015 at 10:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi
2015-01-27 11:41 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-01-26 21:44 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/25/15 4:23 AM, Pavel Stehule wrote:
I tested a concept iteration over array in format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too
It is only a few lines more to current code, and this change doesn't break a compatibility.
Do you think, so this patch is acceptable?
Ideas, comments?
Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do
foreach a, b, c in array(array(1,2),array(3,4)) ?

It is relatively simple behavior -- empty values are NULL.
array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively ARRAY[1,2,3,4]
Or the opposite case of
foreach a,b in array(array(1,2,3))
Also, what about:
foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?
postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
array
-------------------
{1,2,3,4,5,6,7,8}
(1 row)

So it generates pairs {1,2},{3,4},{5,6},{7,8}.

I fixed the situation when the array does not have enough elements.

This no longer applies due to conflicts in src/pl/plpgsql/src/pl_exec.c caused by e524cbdc45ec6d677b1dd49

Also, what is the relationship of this patch to the row_to_array patch? Are they independent, or does one depend on the other? row_to_array by itself applies but doesn't compile.

Cheers,
Jeff
Pavel Stehule <pavel.stehule@gmail.com> writes:
> here is rebased patch.
> It contains both patches - row_to_array function and foreach array support.
While I don't have a problem with hstore_to_array, I don't think that
row_to_array is a very good idea; it's basically encouraging people to
throw away SQL datatypes altogether and imagine that everything is text.
They've already bought into that concept if they are using hstore or
json, so smashing elements of those containers to text is not a problem.
But that doesn't make this version a good thing.
(In any case, those who insist can get there through row_to_json, no?)
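Tom's parenthetical refers to what is already possible with the existing JSON machinery - smashing a row to text key/value pairs without any new function:

```sql
-- Existing route: record -> json -> set of text key/value pairs
SELECT key, value
FROM (SELECT 42 AS a, 'hello' AS b) t,
     json_each_text(row_to_json(t));
-- key | value
-- a   | 42
-- b   | hello
```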
Also, could we please *not* mix up these two very independent features?
"foreach array" as implemented here may or may not be a good thing, but
it should get its own discussion.
regards, tom lane
2015-03-29 20:27 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> here is rebased patch.
> It contains both patches - row_to_array function and foreach array support.
While I don't have a problem with hstore_to_array, I don't think that
row_to_array is a very good idea; it's basically encouraging people to
throw away SQL datatypes altogether and imagine that everything is text.

This is a complement to the ARRAY API - we have row_to_json, will probably have row_to_jsonb and row_to_hstore, and "row_to_array" is relatively logical. Casting to text is not fast, but on the other hand, working with text arrays is fast.

I know that casting to text is a problem, but if you iterate over a record's fields, then you have to find a common shared type due to plan sharing - and text arrays can be a simple solution.

Now, with the current possibilities, I would use a full SQL expression SELECT key, value FROM each(hstore(ROW)) or FOREACH ARRAY hstore_to_matrix(hstore(ROW)).

row_to_array(ROW) can reduce the hstore overhead.

Any other solution based on PL/Perl or PL/Python is slower due to PL engine startup and the same transformation to some form of structured text.

They've already bought into that concept if they are using hstore or
json, so smashing elements of those containers to text is not a problem.
But that doesn't make this version a good thing.
(In any case, those who insist can get there through row_to_json, no?)
Also, could we please *not* mix up these two very independent features?
"foreach array" as implemented here may or may not be a good thing, but
it should get its own discussion.

ok, I'll send two patches.
regards, tom lane
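The SQL-level workaround Pavel describes above - going through hstore - looks like this in practice (a sketch; assumes the hstore extension is installed):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Set-returning form: one (key, value) text pair per column
SELECT key, value
FROM (SELECT 1 AS a, 'x' AS b) t, each(hstore(t));

-- 2-D text array form, usable with FOREACH ... SLICE 1
SELECT hstore_to_matrix(hstore(t))
FROM (SELECT 1 AS a, 'x' AS b) t;   -- e.g. {{a,1},{b,x}}
```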
On Sun, Mar 29, 2015 at 1:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While I don't have a problem with hstore_to_array, I don't think that
> row_to_array is a very good idea; it's basically encouraging people to
> throw away SQL datatypes altogether and imagine that everything is text.
>
> (In any case, those who insist can get there through row_to_json, no?)
You have a point. What does attached do that to_json does not do
besides completely discard type information? Our json api is pretty
rich and getting richer. For better or ill, we dumped all json
support into the already stupendously bloated public namespace and so
it's always available.

merlin
I can see plenty of utility for a function like Pavel speaks of, butOn 2 April 2015 at 01:59, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Sun, Mar 29, 2015 at 1:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> here is rebased patch.
>>> It contains both patches - row_to_array function and foreach array support.
>>
>> While I don't have a problem with hstore_to_array, I don't think that
>> row_to_array is a very good idea; it's basically encouraging people to
>> throw away SQL datatypes altogether and imagine that everything is text.
>> They've already bought into that concept if they are using hstore or
>> json, so smashing elements of those containers to text is not a problem.
>> But that doesn't make this version a good thing.
>>
>> (In any case, those who insist can get there through row_to_json, no?)
>
> You have a point. What does attached do that to_json does not do
> besides completely discard type information? Our json api is pretty
> rich and getting richer. For better or ill, we dumped all json
> support into the already stupendously bloated public namespace and so
> it's always available.
I can see plenty of utility for a function like Pavel speaks of, but
I'd personally rather see it as a function that returns table (colname
name, coltype regtype, coltypmod integer, coltextvalue text,
colordinal integer) so it can carry more complete information and
there's no need to worry about foreach(array). The main use of a
function that includes text representations of the values would IMO be
using it from plain SQL, rather than PL/PgSQL, when faced with
anonymous records.
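No such function exists today; the nearest plain-SQL approximation flattens the row through json, keeping only field names, text values, and ordinal positions - exactly the type information loss the proposed function would avoid. A sketch:

```sql
-- Hypothetical goal: a record-info function returning
-- (colname, coltype, coltypmod, coltextvalue, colordinal).
-- Today's closest plain-SQL equivalent recovers only name, text value
-- and position; the regtype/typmod columns have no counterpart here.
SELECT key                  AS colname,
       value                AS coltextvalue,
       row_number() OVER () AS colordinal
FROM json_each_text(row_to_json(ROW(1, 'abc', now())));
```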
I'd find it more useful to have lvalue-expressions for dynamic access
to record fields and a function to get record metadata - field names,
types and typmods. Some kind of "pg_get_record_info(record) returns
table(fieldname text, fieldtype regtype, fieldtypmod integer)" and a
PL/PgSQL lvalue-expression for record field access like
"RECORD_FIELD(therecord, fieldname)". I would _certainly_ want to be
able to get the type metadata without the values.
That way you could interact natively with the fields in their true
types, without forcing conversion into and out of 'text', which is a
known performance pain-point with PL/PgSQL. (PL/PgSQL doesn't have a
VARIANT type or support for using 'anyelement', which would be the
other way to solve the type flattening problem IMO).
Think:
DECLARE
myrow record;
fi record;
BEGIN
EXECUTE user_supplied_dynamic_query INTO myrow;
FOR fi IN
SELECT fieldname, fieldtype, fieldtypmod
FROM pg_get_record_info(myrow)
LOOP
IF fi.fieldtype = 'int4'::regtype THEN
RECORD_FIELD(myrow, fi.fieldname) := RECORD_FIELD(myrow,
fi.fieldname) + 1;
END IF;
END LOOP;
END;
OK, so it's a stupid example - increment all int4 fields by one. It
conveys the rough idea though - native use of the field types.
Note that RECORD_FIELD is distinct from the existing support for
EXECUTE format('SELECT $1.%I', fieldname) USING therecord;
in that that approach doesn't work for all ways that a record can be
produced, it's slow, it doesn't have a good way to enumerate field
names, and there's no equivalent to write to the field. Current
approaches for that are ghastly:
http://stackoverflow.com/q/7711432/398670 .
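For reference, that EXECUTE idiom written out as a self-contained DO block (the record and field name are illustrative; note the parentheses around $1, which the row dereference requires in SQL):

```sql
DO $$
DECLARE
    therecord record;
    fieldname text := 'f1';   -- illustrative; there is no good way to enumerate these
    fieldvalue text;
BEGIN
    SELECT 1 AS f1, 'abc' AS f2 INTO therecord;
    -- Dynamic *read* access only - there is no corresponding way to
    -- write the field back, which is part of the complaint above.
    EXECUTE format('SELECT ($1).%I::text', fieldname)
        INTO fieldvalue
        USING therecord;
    RAISE NOTICE '% = %', fieldname, fieldvalue;
END
$$;
```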
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 6/22/15 2:46 AM, Pavel Stehule wrote:
FOREACH key, val IN RECORD myrow
LOOP
IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
val := val + 1; -- these variables can be mutable
-- or maybe in future
myrow[key] := val + 1;
END IF;
END LOOP;
What is important - "val" is an automatic variable, and it can have a
different type in each step.
It is a little bit strange, but impossible to solve, so we cannot
support row[var] as a right value (without immutable casting). But we can
do it with a left value.
Actually, you can (theoretically) solve it for the right value as well if val is an actual type and you have operators on that type that know to search for a specific operator given the actual types that are involved. So if val is int4, val + 1 becomes int4 + int4.
The problem I've run into with this is by the time you've added enough casts to make this workable you've probably created a situation where val + something is going to recurse back to itself. I've partially solved this in [1], and intend to finish it by calling back in via SPI to do the final resolution, the same way the RI triggers do.
What would be a lot better is if we had better control over function and operator resolution.
[1] https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
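Pavel's FOREACH ... IN RECORD above is proposed syntax. The closest approximation that runs today goes through json and loses the type information, which is the crux of the discussion - a sketch:

```sql
DO $$
DECLARE
    myrow record;
    key text;
    val text;  -- everything arrives as text; type dispatch is manual
BEGIN
    SELECT 1 AS a, 2.5 AS b, 'x' AS c INTO myrow;
    FOR key, val IN SELECT * FROM json_each_text(row_to_json(myrow))
    LOOP
        RAISE NOTICE '% = %', key, val;
    END LOOP;
END
$$;
```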
On Tue, Jun 23, 2015 at 2:57 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 6/23/15 9:45 AM, Pavel Stehule wrote:
>> 1. parametrized record reference syntax - something like SELECT $1[$]
>> 2. possibility to throw away the plan cache if the result has a
>> different type than is expected in the cache.
>
> Well, the other option is we allow for cases where we don't know in advance
> what the type will be. That would handle this, JSON, variant, and possibly
> some other scenarios.
>
> BTW, I think this relates to the desire to be able to do more OO-ish things
> in the database. Like "do X to all elements in this array". And to have
> actual classes, private members, real arrays of arrays. It seems like
> there's a bigger need here that's only being addressed piecemeal. :/

I would rephrase that to: "do X to all fields of an object". Array handling
is pretty good now (minus arrays of arrays, but arrays of objects containing
arrays is 'good enough' for most real-world cases). We've suffered for a
while now with hstore/json as a temporary container to handle operations
that are not well supported by postgres's particularly strongly typed flavor
of SQL. The "OO" of postgres has been gradually diluting away; it's not an
'object relational' database anymore, and the OO features, very much a
product of the silly 90's OO hysteria, have been recast into more useful
features like inheritance and/or pruned back.

I don't mind having to push everything to jsonb and back for tuple
manipulation, and I expect that's how these types of things are going to be
done moving forward. jsonb has clearly caught a bid judging by what I'm
reading in the blogosphere and will continue to accrete features like this.

merlin
On 6/23/15 9:45 AM, Pavel Stehule wrote:
2015-06-23 1:56 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 6/22/15 2:46 AM, Pavel Stehule wrote:
FOREACH key, val IN RECORD myrow
LOOP
IF pg_typeof(val) IN ('int4', 'double precision', 'numeric')
THEN
val := val + 1; -- these variables can be mutable
-- or maybe in future
myrow[key] := val + 1;
END IF;
END LOOP;
What is important - "val" is an automatic variable, and it can have a
different type in each step.
It is a little bit strange, but impossible to solve, so we cannot
support row[var] as a right value (without immutable casting). But
we can
do it with a left value.
Actually, you can (theoretically) solve it for the right value as
well with if val is an actual type and you have operators on that
type that know to search for a specific operator given the actual
types that are involved. So if val is int4, val + 1 becomes int4 + int4.
The problem I've run into with this is by the time you've added
enough casts to make this workable you've probably created a
situation where val + something is going to recurse back to itself.
I've partially solved this in [1], and intend to finish it by
calling back in via SPI to do the final resolution, the same way the
RI triggers do.
What would be a lot better is if we had better control over function
and operator resolution.
[1]
https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846
The solution of dynamic operators changes the philosophy by about 180° - and I
am afraid about the performance.
Now if I am thinking about possibilities - probably it is solvable on
the right side too. It needs two steps:
1. parametrized record reference syntax - something like SELECT $1[$]
2. possibility to throw away the plan cache if the result has a different
type than is expected in the cache.
Well, the other option is we allow for cases where we don't know in advance what the type will be. That would handle this, JSON, variant, and possibly some other scenarios.
BTW, I think this relates to the desire to be able to do more OO-ish things in the database. Like "do X to all elements in this array". And to have actual classes, private members, real arrays of arrays. It seems like there's a bigger need here that's only being addressed piecemeal. :/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 6/23/15 3:22 PM, Merlin Moncure wrote:
> I would rephrase that to: "do X to all fields of an object".
> Array handling is pretty good now (minus arrays of arrays, but arrays

Except that still won't make it easy to do something to each element of an
array in SQL, which I think would be nice to have.

> of objects containing arrays is 'good enough' for most real world
> cases). We've suffered for a while now with hstore/json as a
> temporary container to handle operations that are not well supported
> by postgres's particularly strongly typed flavor of SQL. The "OO" of
> postgres has been gradually diluting away; it's not an 'object
> relational' database anymore and the OO features, very much a product
> of the silly 90's OO hysteria, have been recast into more useful
> features like inheritance and/or pruned back.

Admittedly I've never played with an OO database, but I think our data
features are pretty good [1]. Where I do think we can improve is in
developing/coding things in the database. For example, I'd love to have the
equivalent of a class. Perhaps that could be accomplished by allowing
multiple instances of an extension. I'd also like stronger support for
private objects (permissions don't really fit that bill).

> I don't mind having to push everything to jsonb and back for tuple
> manipulation and I expect that's how these types of things are going
> to be done moving forwards. jsonb has clearly caught a bid judging by
> what I'm reading in the blogosphere and will continue to accrete
> features like this.

I think it's unfortunate to lose the strong typing that we have. That can be
especially important for something like numbers (was it originally a float
or a numeric?). But maybe JSON is good enough.

[1] The one OO-ish data feature I'd like is the ability to de-reference a
foreign key "pointer". So if

CREATE TABLE b( a_id int REFERENCES a );

then have

SELECT a_id.some_field FROM b;

transform to

SELECT a.some_field FROM b JOIN a ...;

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
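Spelled out, that dereference is hypothetical syntax; today the join has to be written by hand (table and column names are illustrative):

```sql
CREATE TABLE a (id int PRIMARY KEY, some_field text);
CREATE TABLE b (a_id int REFERENCES a);

-- Wished-for sugar:  SELECT a_id.some_field FROM b;
-- What must be written today:
SELECT a.some_field
FROM b
JOIN a ON a.id = b.a_id;
```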
On 6/23/15 3:40 PM, Pavel Stehule wrote:
>> BTW, I think this relates to the desire to be able to do more OO-ish
>> things in the database. Like "do X to all elements in this array".
>> And to have actual classes, private members, real arrays of arrays.
>> It seems like there's a bigger need here that's only being addressed
>> piecemeal. :/
>
> I would not open this box - and I would not throw away or redesign
> almost all of the PostgreSQL type handling system. I am sure it is not
> necessary. A PL can be relatively static if the dynamism is covered by
> the query language. The few features can be implemented without the
> necessity of redesigning everything. Still, there are other PLs - and we
> are not forced to design a new Perl, JavaScript, ...

By that argument, why are we putting it into plpgsql either? You can easily
do the stuff we've been talking about in plperl (and presumably most other
PLs). So why mess around with adding it to plpgsql?

More importantly, these are things that would be extremely useful at the SQL
level. When it comes to records, for example, we frequently know exactly
what's in them, so why do we force users to statically specify that at the
SQL level? This is why we don't support pivot tables (which in the BI world
is a Big Deal).

I think it's a mistake to try and solve this strictly through plpgsql
without recognizing the larger desire and trying to move the ball in that
direction. I'm not saying a first effort should boil the ocean, but if we
keep piecemealing this without more thought we're going to keep getting
more warts (like a lot of the gotchas we have with arrays).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Jun 23, 2015 at 3:45 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 6/23/15 3:22 PM, Merlin Moncure wrote:
>> I would rephrase that to: "do X to all fields of an object".
>> Array handling is pretty good now (minus arrays of arrays, but arrays
>
> Except that still won't make it easy to do something to each element of an
> array in SQL, which I think would be nice to have.

Maybe, or maybe we're framing the problem incorrectly. To me, it's not
really all that difficult to do:

select foo(x) from unnest(bar) x;

Unless you have to maintain state inside of foo(), in which case I'd
probably use the highly underutilized 'window over custom aggregate'
technique.

merlin
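Merlin's pattern, written out concretely; WITH ORDINALITY (9.4+) pins down the element order, and a custom aggregate under a window would be the stateful variant he mentions:

```sql
-- "Do X to each element of an array": unnest, apply, re-aggregate.
SELECT array_agg(upper(x) ORDER BY i) AS result
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(x, i);
-- result: {A,B,C}
```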