>The problem is that the HashAgg will have to maintain a counter for
>every distinct value of trid, not just those that occur more than
>once. So if there are a huge number of one-time-only values you could
>still blow out memory (and HashAgg doesn't currently know how to spill
>to disk).
>
>
In my case one-time-only values are the more common, so it would need a
counter for nearly every row.
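
A rough back-of-the-envelope check (the ~50 bytes per hash entry is only a
guess at key + counter + hash overhead): the table has roughly 36 million
rows (see the plan below), and if most trid values are unique that means

    36,000,000 entries * ~50 bytes = roughly 1.8GB

all of which would have to stay in memory, since HashAgg cannot spill.
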
>That "rows=200" estimate looks suspiciously like a default. Has this
>table been ANALYZEd recently? I'd expect the planner not to choose
>HashAgg if it has a more realistic estimate of the number of groups.
>
> regards, tom lane
>
>
I did a vacuum analyze, and now the explain gives a different plan:
pointspp=# vacuum analyze;
VACUUM
pointspp=# explain select trid, count(*) from pptran group by trid
having count(*) > 1;
QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=9842885.29..10840821.57 rows=36288592 width=18)
   Filter: (count(*) > 1)
   ->  Sort  (cost=9842885.29..9933606.77 rows=36288592 width=18)
         Sort Key: trid
         ->  Seq Scan on pptran  (cost=0.00..1039725.92 rows=36288592 width=18)
(5 rows)
pointspp=# select trid, count(*) from pptran group by trid having
count(*) > 1;
ERROR:  could not write block 661572 of temporary file: No space left on device
HINT: Perhaps out of disk space?
I have 5.1GB of free disk space. If that is the cause, I have a
problem... or is there some other way to extract (and remove) the duplicate rows?
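
One workaround I am considering (just a sketch: the dup_trid table name and
the range boundaries are made up, and it assumes trid is something I can
split into ranges) is to run the same query one trid range at a time, so
each sort only has to handle a fraction of the table. All rows with the
same trid fall into the same range, so no duplicates are missed:

create table dup_trid as
  select trid, count(*) as cnt
    from pptran
   where trid < 'M'            -- first range, boundary made up
   group by trid
  having count(*) > 1;

insert into dup_trid
  select trid, count(*)
    from pptran
   where trid >= 'M'           -- next range, and so on
   group by trid
  having count(*) > 1;

Each chunk should need proportionally less temporary file space, so it
ought to fit in the 5.1GB I have free.
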