"Tim Perdue" <archiver@db.geocrawler.com> writes:
> serial_half is a 1-column list of 10-digit
> numbers. I'm doing a select distinct because I
> believe there may be duplicates in that column.
> The misunderstanding on my end came because
> serial_half was a 60MB text file, but when it was
> inserted into postgres, it became 345MB (6.8
> million rows has a lot of bloat apparently).
The overhead per tuple is forty-something bytes, IIRC. So when the only
useful data in a tuple is an int, the expansion factor is unpleasantly
large. There's little to be done about it, though: all of the overhead
fields appear to be necessary if you want proper transaction semantics.
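
Just to put numbers on that, here is a rough Python back-of-the-envelope
using the sizes reported above (purely illustrative, not measured from
the backend):

    # Back-of-the-envelope check using the sizes reported above;
    # nothing here is measured from the backend itself.
    table_bytes = 345 * 1000 * 1000    # on-disk table size, ~345MB
    flatfile_bytes = 60 * 1000 * 1000  # the original 60MB text file
    rows = 6_800_000                   # reported row count

    print(round(table_bytes / rows))     # ~51 bytes per row on disk
    print(round(flatfile_bytes / rows))  # ~9 bytes per row of raw data
    # The ~40-byte gap is per-tuple (plus some per-page) overhead.
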
> So the temp-sort space for 345MB could easily surpass the 1GB I had on
> my hard disk.
Yes, the merge algorithm used up through 6.5.* seems to have typical
space usage of about 4X the actual data volume. I'm trying to reduce
this to just 1X for 7.0, although some folks are complaining that the
result is slower than before :-(.
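
For anyone wondering where the temp space goes: an external merge sort
has to spill the input to disk as sorted runs before it can merge them,
so even an ideal implementation needs about 1X the data volume in temp
files, and any merge pass that rewrites runs to disk multiplies that.
Here is a deliberately simplified Python sketch of the general
technique; it is not the backend's tuplesort code, and the run size and
temp-file handling are made up for illustration:

    import heapq, os, tempfile

    def external_sort(lines, run_size=100_000):
        # Sort an iterable of newline-terminated lines using disk runs.
        # Each input line lands in exactly one sorted run file, so the
        # best-case temp usage is about 1X the data volume; any merge
        # pass that rewrites runs to disk pushes that multiple higher.
        runs, buf = [], []
        for line in lines:
            buf.append(line)
            if len(buf) >= run_size:
                runs.append(_write_run(buf))
                buf = []
        if buf:
            runs.append(_write_run(buf))
        files = [open(path) for path in runs]
        try:
            # Single k-way merge pass over all the sorted runs.
            for line in heapq.merge(*files):
                yield line
        finally:
            for f in files:
                f.close()
            for path in runs:
                os.unlink(path)

    def _write_run(buf):
        # Sort one chunk in memory and spill it to a temp file.
        buf.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(buf)
        return path

    # e.g.  with open("serial_half.txt") as f:
    #           for line in external_sort(f): ...
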
> Actually what was stated is that it is retarded to fill up a hard disk
> and then hang instead of bowing out gracefully,
Yup, that was a bug --- failure to check for write errors on the sort
temp files. I believe it's fixed in current sources too.
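
The shape of that fix, sketched here in Python purely for illustration
(the real change is in the backend's C sort code), is just to check
every temp-file write and error out cleanly, removing the temp file,
instead of pressing on against a full disk:

    import errno, os, sys, tempfile

    def write_run_checked(buf):
        # Spill one sorted run, bailing out cleanly if the disk fills.
        fd, path = tempfile.mkstemp(text=True)
        try:
            with os.fdopen(fd, "w") as f:
                f.writelines(buf)
                f.flush()
                os.fsync(f.fileno())  # surface ENOSPC here, not later
        except OSError as e:
            os.unlink(path)           # give the space back before quitting
            if e.errno == errno.ENOSPC:
                sys.exit("sort: out of disk space for temp files")
            raise
        return path
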
regards, tom lane