Re: MAP syntax for arrays - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: MAP syntax for arrays
Date
Msg-id CAFjFpRek0WWatOLUJ4jbt0JWhF4_AnevwNoBza21wR7JJLka4g@mail.gmail.com
Whole thread Raw
In response to MAP syntax for arrays  (Ildar Musin <i.musin@postgrespro.ru>)
Responses Re: MAP syntax for arrays
List pgsql-hackers
On Fri, May 4, 2018 at 6:38 PM, Ildar Musin <i.musin@postgrespro.ru> wrote:
> Hello hackers,
>
> Recently I was working with sql arrays in postgres and it turned out
> that postgres doesn't have such very convinient functional constructions
> as map, reduce and filter. Currently to map function over array user has
> to make a subquery like:
>
> select u.* from
>         my_table,
>         lateral (
>                 select array_agg(lower(elem))
>                 from unnest(arr) as elem
>         ) as u;
>
> Which is not only inconvenient but not very efficient as well (see
> 'Demo' section below).

Is there a way we can improve unnest() and array_agg() to match the
performance you have specified by let's say optimizing the cases
specially when those two are used together. Identifying that may be
some work, but will not require introducing new syntax.

>
> When I dug into the code I found that postgres already has the needed
> infrastructure for implementing map for arrays; actually array coercing
> already works that way (it basically maps cast function).
>
> In the attached patch there is a simple map implementation which
> introduces new expression type and syntax:
>
>         MAP(<func_name> OVER <array_expression>)
>
> For example:
>
>         SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
>             ?column?
>         -----------------
>          {ONE,TWO,THREE}
>         (1 row)
>
> This is probably not the most useful notation and it would be better to
> have syntax for mapping arbitrary expressions over array, not just
> function. I'm struggling to come up with a good idea of how it should
> look like. It could look something like following:
>
>         MAP(<expr> FOR <placeholder> IN <array_expressin>)
>
> For instance:
>
>         SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);
>
> Looking forward for community's suggestions!

What if the expression has more than one variable, each mapping to a
different array? What if the arrays have different lengths or worse
different dimensions? This looks like the way SRFs used to work.

Instead of introducing a new syntax, is it possible to detect that
argument to a function is an array of the same type as the argument
and apply MAP automatically? In your example, upper(arr) would detect
that the input is an array of the same type as the scalar argument
type and do array_agg(upper(arr[id1], arr[id2], ...).

>
>  elements per array |  map (tps) | unnest/aggregate (tps)
> --------------------+------------+------------------------
>                   5 | 139.105359 |       74.434010
>                  10 |  74.089743 |       43.622554
>                 100 |   7.693000 |        5.325805
>
> Apparently map is more efficient for small arrays. And as the size of
> array increases the difference decreases.

I am afraid that the way difference is diminishing with increase in
the number of elements, unnest, array_agg combination might win for
large number of elements. Have you tried that?
If we try to improve unnest, array_agg combination for small array, we
will get consistent performance without any additional syntax.
Although, I admit that query involving unnest and array_agg is not
very readable.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Sergey Mirvoda
Date:
Subject: Re: citext function overloads for text parameters
Next
From: Alexander Kuzmenkov
Date:
Subject: Re: Reopen logfile on SIGHUP