On 22/01/14 03:16, Jon Nelson wrote:
> Greetings -hackers:
>
> I have worked up a patch to PostgreSQL which elides tuples during an
> external sort. The primary use case is when sorted input is being used
> to feed a DISTINCT operation. The idea is to throw out tuples that
> compare as identical whenever it's convenient, predicated on the
> assumption that even a single I/O is more expensive than some number
> of (potentially extra) comparisons. Obviously, this is where a cost
> model comes in, which has not been implemented. This patch is a
> work-in-progress.
My WIP internal merge sort also does dedup-in-sort, and extends it
(in much the same way as Jon's patch) to the external merge.
https://github.com/j47996/pgsql_sorb
I've not done a cost model either, but the dedup capability is
exposed from tuplesort.c to the executor, and downstream Unique
nodes are removed.
I've not yet worked out how to eliminate upstream hashagg nodes,
which my test results suggest would be worthwhile.
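
To illustrate what dedup during the merge means, here is a minimal
standalone sketch. It is not code from pgsql_sorb or from Jon's patch:
plain ints stand in for tuples, `!=` stands in for the sort comparator,
and merge_dedup() is a made-up name for this example only.

```c
#include <stddef.h>

/*
 * Merge two sorted runs into out, emitting each distinct value once.
 * Returns the number of values written; out needs room for na + nb.
 * Hypothetical helper: ints stand in for tuples.
 */
size_t
merge_dedup(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na || j < nb)
    {
        int v;

        /* Take the smaller head; fall back to a when b is exhausted. */
        if (j >= nb || (i < na && a[i] <= b[j]))
            v = a[i++];
        else
            v = b[j++];

        /* Elide v if it compares equal to the last value emitted. */
        if (n == 0 || out[n - 1] != v)
            out[n++] = v;
    }
    return n;
}
```

Since the output is sorted, comparing against the last emitted value is
enough to catch duplicates both within a run and across runs, so a
consumer of the merged stream would not need a separate uniq step.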
--
Cheers, Jeremy