Re: [PATCH] DISTINCT in plain aggregate window functions - Mailing list pgsql-hackers

From Haibo Yan
Subject Re: [PATCH] DISTINCT in plain aggregate window functions
Date
Msg-id CABXr29EW_+XRD4dTcEN4+=K78OvetMWMqF0qgSv_JwG6D4Gu3g@mail.gmail.com
Whole thread
In response to [PATCH] DISTINCT in plain aggregate window functions  (Haibo Yan <tristan.yim@gmail.com>)
List pgsql-hackers
On Tue, Apr 7, 2026 at 10:31 PM Haibo Yan <tristan.yim@gmail.com> wrote:

Hi Hackers

I’d like to start a patch series to add support for DISTINCT in plain aggregate window functions.

PostgreSQL currently rejects cases such as:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)

sum(DISTINCT x)   OVER ()

---------------------------------------------------------------------------------------------------------

My plan is to implement this incrementally, by frame class and by feature dimension, rather than trying to solve every case in a single patch.

For the first step, I’m posting patches 1-2 only and would appreciate your review on those.

Patch 1 is intentionally very small:

  • add parse/deparse plumbing for DISTINCT in plain aggregate window functions
  • carry the information through WindowFunc
  • preserve it in ruleutils / deparse
  • but still reject execution

Patch 1 by itself does not add user-visible execution support, so I think it is best reviewed together with patch 2.

Patch 2 adds the first real executor support:

  • plain aggregate window functions only
  • single-argument DISTINCT only
  • whole-partition frames only

That means support for cases where the frame is effectively the entire partition, for example:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)
sum(DISTINCT x)   OVER ()
avg(DISTINCT x)   OVER (
    PARTITION BY p
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

---------------------------------------------------------------------------------------------------------

The executor approach in patch 2 is deliberately conservative:

  • collect the partition’s aggregate inputs
  • sort and deduplicate them
  • feed the distinct values into the aggregate transition function
  • finalize once
  • reuse the cached result for all rows in the partition

This avoids the much harder moving-frame cases for now.

My proposed overall roadmap is below:

Patch 1

  • parse/deparse plumbing only
  • allow DISTINCT to be represented on plain aggregate window functions
  • preserve it through deparse / view definition
  • still reject execution

Patch 2

  • executor support for whole-partition frames
  • plain aggregate window functions only
  • single-argument DISTINCT only
  • sort-and-dedup implementation

Patch 3

  • executor support for non-shrinking frames
  • frames starting at UNBOUNDED PRECEDING with no EXCLUDE
  • incremental hash-based seen-set
  • covers default ORDER BY frame and supported ... CURRENT ROW / ... FOLLOWING cases

Patch 4

  • executor support for sliding ROWS frames
  • refcounted DISTINCT state
  • add/remove distinct contributions as rows enter and leave the frame
  • fallback to restart/recompute for aggregates without inverse transition support

Patch 5

  • extend the sliding DISTINCT machinery to sliding RANGE and GROUPS
  • keep the same refcounted model
  • no EXCLUDE yet

Patch 6

  • support EXCLUDE clauses
  • likely correctness-first, with restart/recompute where incremental maintenance is too awkward

Patch 7

  • support multi-argument DISTINCT
  • upgrade DISTINCT keys from single datum to tuple/composite key representation

Patch 8

  • support aggregate ORDER BY inside window aggregates
  • left until last because it is orthogonal to frame-shape support and substantially complicates both parse representation and executor behavior

In short, the roadmap is:

  1. plumbing
  2. whole-partition
  3. non-shrinking
  4. sliding ROWS
  5. sliding RANGE / GROUPS
  6. EXCLUDE
  7. multi-arg DISTINCT
  8. aggregate ORDER BY

For this posting, I’d especially appreciate feedback on:

  • whether patch 1 + patch 2 is a reasonable first split
  • whether whole-partition-only executor support is a good first executable step
  • whether the proposed long-term breakdown seems sensible

Thanks in advance for any review or comments.

Best regards,

Haibo Yan


I’ve managed to finish the first sub-series adding initial support for
DISTINCT in plain aggregate window functions.

Patch 1 teaches the parser and deparser to accept DISTINCT in plain
window aggregates. This is representation-only and does not change
execution yet.

Patch 2 adds executor support for the simplest case, whole-partition
frames, using a sort-and-deduplicate path.

Patch 3 extends that support to non-shrinking frames, where rows only
enter the frame, by using an incremental hash-based seen-set instead of
restarting the aggregate for each row.

Please review.

Regards,

Haibo 
Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Experimenting with wider Unicode storage
Next
From: Michael Paquier
Date:
Subject: Re: Fix the error hint message and test for reset_shared with iso-8859-1 stats type