On Wed, Jan 22, 2014 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeremy Harris <jgh@wizmail.org> writes:
>> On 22/01/14 03:53, Tom Lane wrote:
>>> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>>>> - in createplan.c, eliding duplicate tuples is enabled if we are
>>>> creating a unique plan which involves sorting first
>
>>> [ raised eyebrow ... ] And what happens if the planner drops the
>>> unique step and then the sort doesn't actually go to disk?
>
>> I don't think Jon was suggesting that the planner drop the unique step.
>
> Hm, OK, maybe I misread what he said there. Still, if we've told
> tuplesort to remove duplicates, why shouldn't we expect it to have
> done the job? Passing the data through a useless Unique step is
> not especially cheap.

Jeremy is right - I do not propose to drop the unique step. Duplicates
are only dropped when it's convenient to do so. In one case, the drop
is free (no extra comparison is made). In most other cases, one extra
comparison is made, typically right before writing a tuple to tape: if
the tuple compares as identical to the previously-written tuple, it's
thrown out instead of being written.

The output of the modified code is still sorted and still *might* (in
most cases, probably will) contain duplicates, just (probably) fewer of
them. That is why the unique step has to stay.
--
Jon