Re: Partial Mode in Aggregate Functions - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Partial Mode in Aggregate Functions
Date
Msg-id CAKFQuwYqPUYAC+9ruL-xcmoP0LEbPKp8y_g-sRnHJBvtDhRCzA@mail.gmail.com
In response to Re: Partial Mode in Aggregate Functions  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Partial Mode in Aggregate Functions
List pgsql-hackers
On Wed, Feb 25, 2026 at 9:57 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 26 Feb 2026 at 02:00, Marcos Pegoraro <marcos@f10.com.br> wrote:
> Perhaps this way we will have a better understanding of this case.

I'm not sure how you thought partial mode was related to FILTER. Did
you read something in the documentation that led you to believe
they're somehow related? I imagine you just wrongly assumed that and
that's why you're confused.  FILTER is implemented by filtering rows
according to the FILTER's WHERE clause before the aggregate's
transition function is called. That does not require any functionality
provided by the partial aggregate code.

As for the proposed patch, I'm strongly against it. If people are
confused about what partial aggregation is, then let's modify the
documentation to explain what it means.

Currently, [1] says:

"Aggregate functions that support Partial Mode are eligible to
participate in various optimizations, such as parallel aggregation."

We could replace that with something like:

"Partial Mode allows input values which belong to the same logical
group to be aggregated separately and later combined to form a single
aggregate state per group.  These aggregate states must then be
finalized, which will produce a result equivalent to if all input
values had been aggregated together.  Parallel aggregation uses this
so that each parallel worker can aggregate a subset of input values
and form an aggregate state per group. We say these aggregate states
are "partial" as other parallel workers may have aggregated input
values which logically belong to the same group. In the leader
process, the partial aggregate states generated by the parallel
workers are combined to form a single aggregate state per logical
group.  The leader finalizes these aggregate states to produce the
final result."


I commented earlier that this phrasing is not ideal for the target audience of the functions reference page.  Here is some alternative wording to consider:

--This first paragraph covers the same material, and then some, just a bit differently:

"Partial Mode communicates that the computation of an aggregate value
can be done piecemeal: multiple intermediate computations are each
performed on a subset of the data, and their results are then combined into a
final aggregate value.  Parallel aggregation uses this feature
to assign each parallel worker (and optionally the leader) a subset
of the input values to aggregate.  The leader then accepts all these partial aggregations
and computes the final aggregate value for the row.  For partitioned tables,
each partition's data is aggregated individually and then finalized into the
combined value for the entire partitioned table."
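
To make the piecemeal idea concrete, here is a rough Python sketch of an average computed from partial states.  It is only an illustration, not PostgreSQL code: the function names (transition, combine, finalize, partial_aggregate) and the (sum, count) state layout are assumptions chosen for the example, and each "chunk" stands in for the rows seen by one hypothetical worker or partition.

```python
# Illustrative sketch only: these names are hypothetical and do not
# correspond to PostgreSQL's internal functions.

def transition(state, value):
    """Fold one input value into a (sum, count) state."""
    total, count = state
    return (total + value, count + 1)

def combine(a, b):
    """Merge two partial states that belong to the same logical group."""
    return (a[0] + b[0], a[1] + b[1])

def finalize(state):
    """Turn the state into the scalar result; None for an empty input."""
    total, count = state
    return total / count if count else None

def partial_aggregate(values):
    """Build a partial state from one subset of the input."""
    state = (0, 0)
    for v in values:
        state = transition(state, v)
    return state

values = [1, 2, 3, 4, 5, 6]

# One chunk per hypothetical worker (or partition).
chunks = [values[:2], values[2:5], values[5:]]
partials = [partial_aggregate(c) for c in chunks]

# The "leader" combines the partial states, then finalizes once.
combined = (0, 0)
for p in partials:
    combined = combine(combined, p)

piecemeal_avg = finalize(combined)
whole_avg = finalize(partial_aggregate(values))
```

Note that what gets combined is the (sum, count) state rather than each chunk's average; averaging the per-chunk averages would give a different answer when the chunks differ in size.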

--This paragraph explains why the reader should care:

"This optimization mode can only be used if applying the aggregate function
to the output (including backend state, not just the scalar result) of the
partial aggregates is guaranteed to produce the same outcome as applying the
function to all the original inputs.  This is not the case if the order
of those original inputs is important.  The benefit of enabling parallelism
is obvious, but even in the non-parallel partitioned table scenario, not having
to push the entire input set into memory at once to compute the aggregate
usually compensates for the extra handful of processing steps needed during
finalizing."
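
The order-sensitivity point can be sketched the same way.  The Python below is a hypothetical illustration (the helper names aggregate and piecemeal are made up for the example, and it says nothing about which PostgreSQL aggregates actually support Partial Mode): addition gives the same result regardless of the order in which partial states arrive, while string concatenation does not.

```python
from functools import reduce

def aggregate(values, step, initial):
    """Apply a two-argument step function across all values."""
    return reduce(step, values, initial)

def piecemeal(values, step, initial, chunk_order):
    """Aggregate two halves separately, then combine the partial
    results in the order given by chunk_order (hypothetical helper)."""
    chunks = [values[:2], values[2:]]
    partials = [aggregate(chunks[i], step, initial) for i in chunk_order]
    return aggregate(partials, step, initial)

def add(state, value):
    return state + value      # order-insensitive for numbers

def concat(state, value):
    return state + value      # order-sensitive for strings

nums = [1, 2, 3, 4]
words = ["a", "b", "c", "d"]

sum_whole = aggregate(nums, add, 0)
sum_in_order = piecemeal(nums, add, 0, [0, 1])
sum_reversed = piecemeal(nums, add, 0, [1, 0])

concat_whole = aggregate(words, concat, "")
concat_in_order = piecemeal(words, concat, "", [0, 1])
concat_reversed = piecemeal(words, concat, "", [1, 0])
```

Here the three sums agree, whereas concat_reversed differs from concat_whole because the second half's partial result was combined first.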


I am not fully confident about the second paragraph, both in terms of its correctness and whether all of this belongs here rather than elsewhere.  Especially since it isn't all that actionable.

Are there any special considerations for executing these functions in a window context, as opposed to a normal GROUP BY, that should be mentioned here (or possibly elsewhere)?

David J.
