Home > mailing lists

[PATCH] DISTINCT in plain aggregate window functions - Mailing list pgsql-hackers

From	Haibo Yan
Subject	[PATCH] DISTINCT in plain aggregate window functions
Date	April 8 08:31:39
Msg-id	CABXr29H2X+HtaPw-R3EheZUgv9fM7nSBjQCCaWCRv62mDYdM3w@mail.gmail.com Whole thread Raw
List	pgsql-hackers

Tree view

Hi Hackers

I’d like to start a patch series to add support for DISTINCT in plain aggregate window functions.

PostgreSQL currently rejects cases such as:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)

sum(DISTINCT x) OVER ()

---------------------------------------------------------------------------------------------------------

My plan is to implement this incrementally, by frame class and by feature dimension, rather than trying to solve every case in a single patch.

For the first step, I’m posting patches 1-2 only and would appreciate your review on those.

Patch 1 is intentionally very small:

add parse/deparse plumbing for DISTINCT in plain aggregate window functions
carry the information through WindowFunc
preserve it in ruleutils / deparse
but still reject execution

Patch 1 by itself does not add user-visible execution support, so I think it is best reviewed together with patch 2.

Patch 2 adds the first real executor support:

plain aggregate window functions only
single-argument DISTINCT only
whole-partition frames only

That means support for cases where the frame is effectively the entire partition, for example:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)
sum(DISTINCT x) OVER ()
avg(DISTINCT x) OVER (
PARTITION BY p
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

---------------------------------------------------------------------------------------------------------

The executor approach in patch 2 is deliberately conservative:

collect the partition’s aggregate inputs
sort and deduplicate them
feed the distinct values into the aggregate transition function
finalize once
reuse the cached result for all rows in the partition

This avoids the much harder moving-frame cases for now.

My proposed overall roadmap is below:

Patch 1

parse/deparse plumbing only
allow DISTINCT to be represented on plain aggregate window functions
preserve it through deparse / view definition
still reject execution

Patch 2

executor support for whole-partition frames
plain aggregate window functions only
single-argument DISTINCT only
sort-and-dedup implementation

Patch 3

executor support for non-shrinking frames
frames starting at UNBOUNDED PRECEDING with no EXCLUDE
incremental hash-based seen-set
covers default ORDER BY frame and supported ... CURRENT ROW / ... FOLLOWING cases

Patch 4

executor support for sliding ROWS frames
refcounted DISTINCT state
add/remove distinct contributions as rows enter and leave the frame
fallback to restart/recompute for aggregates without inverse transition support

Patch 5

extend the sliding DISTINCT machinery to sliding RANGE and GROUPS
keep the same refcounted model
no EXCLUDE yet

Patch 6

support EXCLUDE clauses
likely correctness-first, with restart/recompute where incremental maintenance is too awkward

Patch 7

support multi-argument DISTINCT
upgrade DISTINCT keys from single datum to tuple/composite key representation

Patch 8

support aggregate ORDER BY inside window aggregates
left until last because it is orthogonal to frame-shape support and substantially complicates both parse representation and executor behavior

In short, the roadmap is:

plumbing
whole-partition
non-shrinking
sliding ROWS
sliding RANGE / GROUPS
EXCLUDE
multi-arg DISTINCT
aggregate ORDER BY

For this posting, I’d especially appreciate feedback on:

whether patch 1 + patch 2 is a reasonable first split
whether whole-partition-only executor support is a good first executable step
whether the proposed long-term breakdown seems sensible

Thanks in advance for any review or comments.

Best regards,

Haibo Yan

Attachment

pgsql-hackers by date:

From: Amit Kapila
Date: 08 April, 08:30:58
Subject: Re: [Proposal] Adding Log File Capability to pg_createsubscriber

From: Tom Lane
Date: 08 April, 08:32:15
Subject: Re: Do we still need MULE_INTERNAL?

[PATCH] DISTINCT in plain aggregate window functions - Mailing list pgsql-hackers

Attachment

Previous

Next