Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> Anyone got anything more elegant?
>
> Seems to me that no document should have an empty dup_set. If it's not
> a match to any existing document, then immediately assign a new dup_set
> number to it.

That was my initial thought too, but it means that when I actually find a
duplicate I have to decide which "direction" to renumber them in. It
probably also means keeping a summary table with counts to show which
documents are duplicates, since the duplicates table would then be the
same size as the documents table.
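
To make the trade-off concrete, here is a minimal sketch in Python rather
than SQL. Everything in it is an assumption for illustration: the names
assign_new, merge and duplicate_counts are hypothetical, a new document
reuses its own id as its initial dup_set number, and the "direction" rule
is simply that the lower dup_set number wins.

```python
from collections import Counter


def assign_new(dup_set, doc_id):
    # Every document gets a dup_set as soon as it is added.
    # Assumption: the document id doubles as the initial set number.
    dup_set[doc_id] = doc_id


def merge(dup_set, doc_a, doc_b):
    # Assumed "direction" rule: renumber the higher set into the lower one.
    keep, drop = sorted((dup_set[doc_a], dup_set[doc_b]))
    if keep == drop:
        return keep  # already in the same set, nothing to renumber
    for doc, set_no in dup_set.items():
        if set_no == drop:
            dup_set[doc] = keep
    return keep


def duplicate_counts(dup_set):
    # Summary of counts: only sets with more than one member are duplicates.
    counts = Counter(dup_set.values())
    return {set_no: n for set_no, n in counts.items() if n > 1}


dup_set = {}
for doc_id in (1, 2, 3, 4):
    assign_new(dup_set, doc_id)

merge(dup_set, 3, 1)  # document 3 turns out to duplicate document 1
merge(dup_set, 4, 3)  # document 4 duplicates document 3
summary = duplicate_counts(dup_set)
```

Documents 1, 3 and 4 end up sharing one dup_set while document 2 keeps a
set of its own, so the mapping alone no longer says which documents are
duplicates; the counts summary is what picks them out.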

--
  Richard Huxton
  Archonet Ltd