Home > mailing lists

Re: [PATCH] DISTINCT in plain aggregate window functions - Mailing list pgsql-hackers

From	Haibo Yan
Subject	Re: [PATCH] DISTINCT in plain aggregate window functions
Date	April 30 04:57:07
Msg-id	CABXr29EW_+XRD4dTcEN4+=K78OvetMWMqF0qgSv_JwG6D4Gu3g@mail.gmail.com Whole thread
In response to	[PATCH] DISTINCT in plain aggregate window functions (Haibo Yan <tristan.yim@gmail.com>)
List	pgsql-hackers

Tree view

On Tue, Apr 7, 2026 at 10:31 PM Haibo Yan <tristan.yim@gmail.com> wrote:

Hi Hackers
I’d like to start a patch series to add support for DISTINCT in plain aggregate window functions.
PostgreSQL currently rejects cases such as:
---------------------------------------------------------------------------------------------------------
count(DISTINCT x) OVER (PARTITION BY p)
sum(DISTINCT x) OVER ()
---------------------------------------------------------------------------------------------------------
My plan is to implement this incrementally, by frame class and by feature dimension, rather than trying to solve every case in a single patch.
For the first step, I’m posting patches 1-2 only and would appreciate your review on those.
Patch 1 is intentionally very small:
add parse/deparse plumbing for DISTINCT in plain aggregate window functions
carry the information through WindowFunc
preserve it in ruleutils / deparse
but still reject execution
Patch 1 by itself does not add user-visible execution support, so I think it is best reviewed together with patch 2.
Patch 2 adds the first real executor support:
plain aggregate window functions only
single-argument DISTINCT only
whole-partition frames only
That means support for cases where the frame is effectively the entire partition, for example:
---------------------------------------------------------------------------------------------------------
count(DISTINCT x) OVER (PARTITION BY p)
sum(DISTINCT x) OVER ()
avg(DISTINCT x) OVER (
PARTITION BY p
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
---------------------------------------------------------------------------------------------------------
The executor approach in patch 2 is deliberately conservative:
collect the partition’s aggregate inputs
sort and deduplicate them
feed the distinct values into the aggregate transition function
finalize once
reuse the cached result for all rows in the partition
This avoids the much harder moving-frame cases for now.
My proposed overall roadmap is below:
Patch 1
parse/deparse plumbing only
allow DISTINCT to be represented on plain aggregate window functions
preserve it through deparse / view definition
still reject execution
Patch 2
executor support for whole-partition frames
plain aggregate window functions only
single-argument DISTINCT only
sort-and-dedup implementation
Patch 3
executor support for non-shrinking frames
frames starting at UNBOUNDED PRECEDING with no EXCLUDE
incremental hash-based seen-set
covers default ORDER BY frame and supported ... CURRENT ROW / ... FOLLOWING cases
Patch 4
executor support for sliding ROWS frames
refcounted DISTINCT state
add/remove distinct contributions as rows enter and leave the frame
fallback to restart/recompute for aggregates without inverse transition support
Patch 5
extend the sliding DISTINCT machinery to sliding RANGE and GROUPS
keep the same refcounted model
no EXCLUDE yet
Patch 6
support EXCLUDE clauses
likely correctness-first, with restart/recompute where incremental maintenance is too awkward
Patch 7
support multi-argument DISTINCT
upgrade DISTINCT keys from single datum to tuple/composite key representation
Patch 8
support aggregate ORDER BY inside window aggregates
left until last because it is orthogonal to frame-shape support and substantially complicates both parse representation and executor behavior
In short, the roadmap is:
plumbing
whole-partition
non-shrinking
sliding ROWS
sliding RANGE / GROUPS
EXCLUDE
multi-arg DISTINCT
aggregate ORDER BY
For this posting, I’d especially appreciate feedback on:
whether patch 1 + patch 2 is a reasonable first split
whether whole-partition-only executor support is a good first executable step
whether the proposed long-term breakdown seems sensible
Thanks in advance for any review or comments.
Best regards,
Haibo Yan

I’ve managed to finish the first sub-series adding initial support for
DISTINCT in plain aggregate window functions.

Patch 1 teaches the parser and deparser to accept DISTINCT in plain
window aggregates. This is representation-only and does not change
execution yet.

Patch 2 adds executor support for the simplest case, whole-partition
frames, using a sort-and-deduplicate path.

Patch 3 extends that support to non-shrinking frames, where rows only
enter the frame, by using an incremental hash-based seen-set instead of
restarting the aggregate for each row.

Please review.

Regards,

Haibo

Attachment

pgsql-hackers by date:

From: Thomas Munro
Date: 30 April, 03:40:52
Subject: Re: Experimenting with wider Unicode storage

From: Michael Paquier
Date: 30 April, 05:33:35
Subject: Re: Fix the error hint message and test for reset_shared with iso-8859-1 stats type

Re: [PATCH] DISTINCT in plain aggregate window functions - Mailing list pgsql-hackers

Attachment

Previous

Next