Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> A rough summary of the patch follows:
> - a GUC variable enables or disables this capability
> - in nodeAgg.c, eliding duplicate tuples is enabled if the number of
> distinct columns is equal to the number of sort columns (and both are
> greater than zero).
> - in createplan.c, eliding duplicate tuples is enabled if we are
> creating a unique plan which involves sorting first
> - ditto planner.c
> - all of the remaining changes are in tuplesort.c, and consist of:
> + a new macro, DISCARDTUP, and a new structure member, discardtup, are
> both defined and operate similarly to COMPARETUP, COPYTUP, etc.
> + in puttuple_common, when state is TSS_BUILDRUNS, we *may* simply
> throw out the new tuple if it compares as identical to the tuple at
> the top of the heap. Since we're already performing this comparison,
> this is essentially free.
> + in mergeonerun, we may discard a tuple if it compares as identical
> to the *last written tuple*. This is a comparison that did not take
> place before, so it's not free, but it saves a write I/O.
> + we apply the same logic in dumptuples
[ raised eyebrow ... ] And what happens if the planner drops the
unique step and then the sort doesn't actually go to disk?
regards, tom lane
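
To make the question above concrete, here is a toy sketch in C. It is not code from the patch and does not use tuplesort.c's real interfaces: toy_sort, its mem_limit parameter, and the plain int "tuples" are all hypothetical stand-ins. The only thing it takes from the summary is that every discard point listed (TSS_BUILDRUNS, mergeonerun, dumptuples) sits on the path where the sort spills to disk. Under that assumption, a sort that fits in memory never reaches a discard point and returns its duplicates unchanged.

```c
#include <stddef.h>
#include <stdlib.h>

/* Ordinary three-way comparison for qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical model of a sort that only elides duplicates when it spills.
 * "mem_limit" stands in for work_mem, counted in tuples for simplicity.
 * Sorts vals in place and returns the number of tuples the caller will see.
 */
size_t toy_sort(int *vals, size_t n, size_t mem_limit)
{
    size_t out = 0;
    size_t i;

    qsort(vals, n, sizeof(int), cmp_int);

    /* In-memory case: no run is ever written, so nothing is discarded. */
    if (n <= mem_limit)
        return n;

    /*
     * "Spilled" case: compare each tuple with the last one written and
     * skip it if equal, mimicking the discard-on-write idea.
     */
    for (i = 0; i < n; i++)
    {
        if (out == 0 || vals[i] != vals[out - 1])
            vals[out++] = vals[i];
    }
    return out;
}
```

With the same five input values, the result depends only on whether the toy sort "spills": a large mem_limit returns all five tuples, duplicates included, while a small one returns three. If a plan has dropped its unique step on the expectation that the sort removes duplicates, the first case gives a wrong answer.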