Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT - Mailing list pgsql-hackers

From Jon Nelson
Subject Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
Msg-id CAKuK5J2R44SwGPyKJtrDZfGbWZ44CM1HHvhfzJP_ngz3MGdNWg@mail.gmail.com
In response to Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jan 21, 2014 at 9:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> A rough summary of the patch follows:
>
>> - a GUC variable enables or disables this capability
>> - in nodeAgg.c, eliding duplicate tuples is enabled if the number of
>>   distinct columns is equal to the number of sort columns (and both are
>>   greater than zero).
>> - in createplan.c, eliding duplicate tuples is enabled if we are
>>   creating a unique plan which involves sorting first
>> - ditto planner.c
>> - all of the remaining changes are in tuplesort.c, and consist of:
>>   + a new macro, DISCARDTUP, and a new structure member, discardtup, are
>>     both defined and operate similarly to COMPARETUP, COPYTUP, etc...
>>   + in puttuple_common, when state is TSS_BUILDRUNS, we *may* simply
>>     throw out the new tuple if it compares as identical to the tuple at
>>     the top of the heap. Since we're already performing this comparison,
>>     this is essentially free.
>>   + in mergeonerun, we may discard a tuple if it compares as identical
>>     to the *last written tuple*. This is a comparison that did not take
>>     place before, so it's not free, but it saves a write I/O.
>>   + We perform the same logic in dumptuples
>
> [ raised eyebrow ... ]  And what happens if the planner drops the
> unique step and then the sort doesn't actually go to disk?

I'm not familiar enough with the code to answer your question with
any authority, but I believe that if the sort never enters the
TSS_BUILDRUNS state, none of the new discard logic runs and the sort
behaves exactly as it did before.
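
To make the mergeonerun item in the summary concrete, here is a minimal,
standalone sketch of the idea. It is not code from the patch: the function
name merge_runs_elide is hypothetical, plain ints stand in for tuples, and
== stands in for a COMPARETUP result of zero. It merges two sorted runs and
skips any value that equals the last value written.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Merge two sorted runs into out, eliding any value equal to the last
 * value written.  Returns the number of values written.
 *
 * Hypothetical stand-in for the merge step: ints play the role of
 * tuples, and == plays the role of a comparator returning 0.
 * out must have room for na + nb values.
 */
size_t
merge_runs_elide(const int *a, size_t na,
                 const int *b, size_t nb,
                 int *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);

    while (i < na || j < nb)
    {
        int next;

        /* Take the smaller head; if one run is exhausted, use the other. */
        if (j >= nb || (i < na && a[i] <= b[j]))
            next = a[i++];
        else
            next = b[j++];

        /* The extra comparison: new value vs. the last written value. */
        if (n > 0 && out[n - 1] == next)
            continue;           /* duplicate, so skip the write */

        out[n++] = next;
    }

    return n;
}
```

Merging the runs {1, 2, 2, 5} and {2, 3, 5, 5} this way writes four values
instead of eight. Nothing in this function is reached unless runs exist to
be merged, which is the point above: a sort that stays in memory never gets
here.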

-- 
Jon


