I have a fairly large table (21M records). One field, of type varchar(16),
has some duplicate values, which I'm trying to identify.
Executing

select dup_field from dup_table group by dup_field having count(*) > 1;

fails with an out-of-memory error. The server has 4GB of memory, and the
backend process dies after consuming 3.7GB. Is there any work-around I can
use to get these duplicates?
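
One work-around I'm considering (untested here; enable_hashagg is a
standard PostgreSQL planner setting, and dup_table/dup_field are the
placeholder names from the query above) is to force a sort-based
GroupAggregate, since the sort can spill to disk instead of holding every
group in memory:

set enable_hashagg to off;  -- disable HashAggregate for this session only

select dup_field
from dup_table
group by dup_field
having count(*) > 1;

Would that be a reasonable approach here?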
Explain output:
"HashAggregate (cost=881509.02..881510.02 rows=200 width=20)"
" Filter: (count(*) > 1)"
" -> Seq Scan on lssi_base (cost=0.00..872950.68 rows=1711668
width=20)"
Why is the hash eating so much memory? A quick calculation puts the raw
size of this data at well under 512MB.
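
The rows=200 in the plan looks suspicious: as far as I know, 200 is the
planner's default guess for the number of distinct groups when it lacks
column statistics, so it may have chosen HashAggregate expecting a tiny
hash table, while the real number of distinct values is in the millions
(and each hash entry carries per-tuple overhead well beyond the raw
varchar bytes). If that's the cause, refreshing statistics might steer the
planner to a sort-based plan instead; a sketch, again using the
placeholder names from above:

analyze dup_table (dup_field);  -- rebuild per-column statistics so ndistinct comes from real data

-- Optionally raise the sample size first if the default target is too coarse:
alter table dup_table alter column dup_field set statistics 1000;
analyze dup_table;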
TIA,
--
Werner Bohl <WernerBohl@infutor.com>
IDS de Costa Rica S.A.