Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT - Mailing list pgsql-hackers

From Jeremy Harris
Subject Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
Msg-id 52E03607.2020501@wizmail.org
In response to PoC: Duplicate Tuple Elidation during External Sort for DISTINCT  (Jon Nelson <jnelson+pgsql@jamponi.net>)
List pgsql-hackers
On 22/01/14 03:16, Jon Nelson wrote:
> Greetings -hackers:
>
> I have worked up a patch to PostgreSQL which elides tuples during an
> external sort. The primary use case is when sorted input is being used
> to feed a DISTINCT operation. The idea is to throw out tuples that
> compare as identical whenever it's convenient, predicated on the
> assumption that even a single I/O is more expensive than some number
> of (potentially extra) comparisons.  Obviously, this is where a cost
> model comes in, which has not been implemented. This patch is a
> work-in-progress.


Dedup-in-sort is also done by my WIP internal merge sort, and
extended (in much the same ways as Jon's) to the external merge.
https://github.com/j47996/pgsql_sorb
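
To illustrate the general idea (this is not code from the repository
above, nor from Jon's patch), here is a minimal Python sketch of
dropping duplicates inside the merge step rather than in a later pass.
The function names (merge_dedup, sort_dedup, external_sort_dedup) and
the use of in-memory lists as stand-ins for on-disk runs are
assumptions made for the example only.

```python
import heapq


def merge_dedup(left, right):
    """Merge two sorted, duplicate-free lists, keeping one copy of equal keys."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            # The comparison found the keys equal: emit one, skip the other.
            out.append(left[i])
            i += 1
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def sort_dedup(items):
    """Internal merge sort whose output is sorted and duplicate-free."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge_dedup(sort_dedup(items[:mid]), sort_dedup(items[mid:]))


def external_sort_dedup(items, run_size):
    """Hypothetical external variant: build deduplicated runs, then k-way merge.

    Each run is a plain list here; a real external sort would write runs out
    and read them back, so every tuple dropped early saves I/O later.
    run_size must be at least 1.
    """
    runs = [sort_dedup(items[k:k + run_size])
            for k in range(0, len(items), run_size)]
    out = []
    for value in heapq.merge(*runs):
        # Duplicates can still occur across runs; drop them in the final merge.
        if not out or out[-1] != value:
            out.append(value)
    return out
```

Because every sublist handed to merge_dedup is already duplicate-free,
an equal comparison can only mean one copy on each side, so skipping
both cursors at once is enough. The dedup costs no comparisons beyond
those the merge performs anyway.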


I've not done a cost model either, but the dedup capability is
exposed from tuplesort.c to the executor, and downstream uniq
nodes are removed.

I've not yet worked out how to eliminate upstream hashagg nodes,
which testing results suggest would be worthwhile.

-- 
Cheers,   Jeremy


