Re: MAP syntax for arrays - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: MAP syntax for arrays |
Date | |
Msg-id | CAFjFpRek0WWatOLUJ4jbt0JWhF4_AnevwNoBza21wR7JJLka4g@mail.gmail.com Whole thread Raw |
In response to | MAP syntax for arrays (Ildar Musin <i.musin@postgrespro.ru>) |
Responses |
Re: MAP syntax for arrays
|
List | pgsql-hackers |
On Fri, May 4, 2018 at 6:38 PM, Ildar Musin <i.musin@postgrespro.ru> wrote: > Hello hackers, > > Recently I was working with sql arrays in postgres and it turned out > that postgres doesn't have such very convinient functional constructions > as map, reduce and filter. Currently to map function over array user has > to make a subquery like: > > select u.* from > my_table, > lateral ( > select array_agg(lower(elem)) > from unnest(arr) as elem > ) as u; > > Which is not only inconvenient but not very efficient as well (see > 'Demo' section below). Is there a way we can improve unnest() and array_agg() to match the performance you have specified by let's say optimizing the cases specially when those two are used together. Identifying that may be some work, but will not require introducing new syntax. > > When I dug into the code I found that postgres already has the needed > infrastructure for implementing map for arrays; actually array coercing > already works that way (it basically maps cast function). > > In the attached patch there is a simple map implementation which > introduces new expression type and syntax: > > MAP(<func_name> OVER <array_expression>) > > For example: > > SELECT MAP(upper OVER array['one', 'two', 'three']::text[]); > ?column? > ----------------- > {ONE,TWO,THREE} > (1 row) > > This is probably not the most useful notation and it would be better to > have syntax for mapping arbitrary expressions over array, not just > function. I'm struggling to come up with a good idea of how it should > look like. It could look something like following: > > MAP(<expr> FOR <placeholder> IN <array_expressin>) > > For instance: > > SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]); > > Looking forward for community's suggestions! What if the expression has more than one variable, each mapping to a different array? What if the arrays have different lengths or worse different dimensions? This looks like the way SRFs used to work. Instead of introducing a new syntax, is it possible to detect that argument to a function is an array of the same type as the argument and apply MAP automatically? In your example, upper(arr) would detect that the input is an array of the same type as the scalar argument type and do array_agg(upper(arr[id1], arr[id2], ...). > > elements per array | map (tps) | unnest/aggregate (tps) > --------------------+------------+------------------------ > 5 | 139.105359 | 74.434010 > 10 | 74.089743 | 43.622554 > 100 | 7.693000 | 5.325805 > > Apparently map is more efficient for small arrays. And as the size of > array increases the difference decreases. I am afraid that the way difference is diminishing with increase in the number of elements, unnest, array_agg combination might win for large number of elements. Have you tried that? If we try to improve unnest, array_agg combination for small array, we will get consistent performance without any additional syntax. Although, I admit that query involving unnest and array_agg is not very readable. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: