Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From: Jelte Fennema-Nio
Subject: Re: Partial aggregates pushdown
Msg-id: CAGECzQSXXKVCQ13jHXtmk=PMsJHhFGb78PoT3JiT4eAcoLXxoA@mail.gmail.com
In response to: RE: Partial aggregates pushdown ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Responses: RE: Partial aggregates pushdown
List: pgsql-hackers
On Wed, 12 Jun 2024 at 07:27, Fujii.Yuki@df.MitsubishiElectric.co.jp
<Fujii.Yuki@df.mitsubishielectric.co.jp> wrote:
> Could you please clarify what you mean?
> Are you referring to:
>   Option 1: Modifying existing aggregate functions to minimize the use of internal state values.
>   Option 2: Not supporting the push down of partial aggregates for functions with internal state values.

Basically I mean both Option 1 and Option 2 together, i.e. once we do
Option 1 and support partial aggregate pushdown for all important
aggregates that currently have internal state values, supporting
pushdown of internal state values themselves becomes unnecessary.

> There are many aggregate functions with internal state values, so if we go with Option 1,
> we might need to change a lot of existing code, like transition functions and finalize functions.
> Also, I'm not sure how many existing aggregate functions can be modified this way.

There are indeed 57 aggregate functions with internal state values:

> SELECT count(*) from pg_aggregate where aggtranstype = 'internal'::regtype;
 count
───────
    57
(1 row)

But there are only 26 different aggtransfns. And the 8 most-used of
those cover 39 of those 57 aggregates.

> SELECT
    sum (count) OVER(ORDER BY count desc, aggtransfn
                     ROWS BETWEEN unbounded preceding and current row) AS cumulative_count
    , *
FROM (
    SELECT
        count(*),
        aggtransfn
    from pg_aggregate
    where aggtranstype = 'internal'::regtype
    group by aggtransfn
    order by count(*) desc, aggtransfn
);
 cumulative_count │ count │               aggtransfn
──────────────────┼───────┼────────────────────────────────────────
                7 │     7 │ ordered_set_transition
               13 │     6 │ numeric_accum
               19 │     6 │ int2_accum
               25 │     6 │ int4_accum
               31 │     6 │ int8_accum
               35 │     4 │ ordered_set_transition_multi
               37 │     2 │ int8_avg_accum
               39 │     2 │ numeric_avg_accum
               40 │     1 │ array_agg_transfn
               41 │     1 │ json_agg_transfn
               42 │     1 │ json_object_agg_transfn
               43 │     1 │ jsonb_agg_transfn
               44 │     1 │ jsonb_object_agg_transfn
               45 │     1 │ string_agg_transfn
               46 │     1 │ bytea_string_agg_transfn
               47 │     1 │ array_agg_array_transfn
               48 │     1 │ range_agg_transfn
               49 │     1 │ multirange_agg_transfn
               50 │     1 │ json_agg_strict_transfn
               51 │     1 │ json_object_agg_strict_transfn
               52 │     1 │ json_object_agg_unique_transfn
               53 │     1 │ json_object_agg_unique_strict_transfn
               54 │     1 │ jsonb_agg_strict_transfn
               55 │     1 │ jsonb_object_agg_strict_transfn
               56 │     1 │ jsonb_object_agg_unique_transfn
               57 │     1 │ jsonb_object_agg_unique_strict_transfn
(26 rows)


And actually most of those don't have a serialfn, so they wouldn't be
supported by your suggested approach either. Looking at the
distribution of aggserialfns instead we see the following:

> SELECT
    sum (count) OVER(ORDER BY count desc, aggserialfn
                     ROWS BETWEEN unbounded preceding and current row) AS cumulative_count
    , *
FROM (
    SELECT
        count(*),
        aggserialfn
    from pg_aggregate
    where
        aggtranstype = 'internal'::regtype
        AND aggserialfn != 0
    group by aggserialfn
    order by count(*) desc, aggserialfn
);
 cumulative_count │ count │        aggserialfn
──────────────────┼───────┼───────────────────────────
               12 │    12 │ numeric_serialize
               24 │    12 │ numeric_poly_serialize
               26 │     2 │ numeric_avg_serialize
               28 │     2 │ int8_avg_serialize
               30 │     2 │ string_agg_serialize
               31 │     1 │ array_agg_serialize
               32 │     1 │ array_agg_array_serialize
(7 rows)

So there are only 7 aggserialfns, and thus at most 7 new postgres
types that you would need to create to support the same aggregates as
in your current proposal. But looking at the implementations of these
serialize functions, even that is an over-estimation: numeric_serialize
and numeric_avg_serialize both serialize a NumericAggState, and
numeric_poly_serialize and int8_avg_serialize both serialize a
PolyNumAggState. So we could probably even do with only 5 types. And
to be clear: only converting PolyNumAggState and NumericAggState to
actual postgres types would already cover 28 out of the 32 aggregates.
That seems quite feasible to do.
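
As a quick sanity check, something like this should list exactly which
aggregates those two state types cover (untested sketch, going by the
serialize functions shown in the query above):

-- list the aggregates whose state is NumericAggState or PolyNumAggState
SELECT aggfnoid::oid::regprocedure AS aggregate
FROM pg_aggregate
WHERE aggserialfn IN ('numeric_serialize'::regproc,
                      'numeric_avg_serialize'::regproc,
                      'numeric_poly_serialize'::regproc,
                      'int8_avg_serialize'::regproc);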

So I agree it's probably more code than your current approach, at the
very least because you would need to implement in/out text
serialization functions for these internal types that currently don't
have them. But I do think it would be a quite feasible amount of work.
And to clarify, I see a few benefits of using the approach that I'm
proposing:

1. So far aggserialfn and aggdeserialfn haven't been required to be
network safe at all. In theory an extension might include shared memory
pointers in its serialized state, which are valid when deserialized
within the same postgres process tree but not outside of it. Or it
might serialize to bytes in a way that does not work across
big-endian/little-endian systems, thus causing wrong aggregation
results. Never sending the result of a serialfn over the network solves
that issue.
2. Partial aggregate pushdown across different postgres versions could
be made to work by using the in/out functions instead of the
send/receive functions, i.e. by using the text-based serialization
format (which should be stable across versions).
3. It seems nice to be able to get the text representation of all
PARTIAL_AGGREGATE output for debugging purposes. With your approach, I
think what currently happens is that psql will show a bytea when using
PARTIAL_AGGREGATE for avg(bigint) directly.
4. In my experience it's easier to get patches merged if they don't
change a lot at once and are useful by themselves. This way you could
split your current patch up into multiple smaller patches, each of
which could be merged separately (apart from b relying on a).
a. Introduce PARTIAL_AGGREGATE syntax for non-internal & non-pseudo types
b. Start using PARTIAL_AGGREGATE for FDW pushdown
c. Convert NumericAggState to non-internal (roughly sketched below)
d. Convert PolyNumAggState to non-internal
e. Use non-internal for string_agg_serialize aggregates
f. Use non-internal for array_agg_serialize
g. Use non-internal for array_agg_array_serialize
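
To make step c a bit more concrete, here is a very rough sketch of the
shape it could take. All the numeric_agg_state_in/_out names are made
up, and in core this would of course go through pg_type.dat/pg_proc.dat
rather than SQL, but conceptually:

-- hypothetical names; shell type first, then the text I/O functions
CREATE TYPE numeric_agg_state;

CREATE FUNCTION numeric_agg_state_in(cstring)
    RETURNS numeric_agg_state
    AS 'MODULE_PATHNAME' LANGUAGE C STRICT IMMUTABLE;

CREATE FUNCTION numeric_agg_state_out(numeric_agg_state)
    RETURNS cstring
    AS 'MODULE_PATHNAME' LANGUAGE C STRICT IMMUTABLE;

CREATE TYPE numeric_agg_state (
    INPUT  = numeric_agg_state_in,
    OUTPUT = numeric_agg_state_out,
    INTERNALLENGTH = VARIABLE
);

-- the affected aggregates then switch their aggtranstype from
-- internal to numeric_agg_state and no longer need a serialfn/deserialfn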


P.S. The problem described in benefit 1 could also be solved in your
approach by adding a boolean opt-in flag to CREATE AGGREGATE, e.g.
CREATE AGGREGATE ( ..., NETWORKSAFESERIALFUNCS = true)
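
Spelled out a bit more (everything here is hypothetical: the my_*
functions are placeholders and NETWORKSAFESERIALFUNCS is not an
existing option), that could look like:

-- opts in to having its serialized state sent to other servers
CREATE AGGREGATE my_avg (numeric) (
    SFUNC        = my_avg_trans,
    STYPE        = internal,
    COMBINEFUNC  = my_avg_combine,
    SERIALFUNC   = my_avg_serialize,
    DESERIALFUNC = my_avg_deserialize,
    FINALFUNC    = my_avg_final,
    NETWORKSAFESERIALFUNCS = true
);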


