"C. Bensend" <benny@bennyvision.com> writes:
>> The above transformation is a guaranteed loser in Postgres.
> By transformation, do you mean the varchar() -> char() change?
Right. The only thing that will do for you is waste space with padding
blanks. The performance improvement you saw came from something else:
my theory is that the freshly loaded table has no dead space to waste
time reading.
Since you obviously doubt this assertion, try the same experiment except
load the data into a new table with the *same* structure as the original
(ie, all varchar). You'll see the same or probably better performance.
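Something along these lines would do it (the name emails3 and the use
of psql's \timing are just for illustration, not from your setup):

    -- copy the rows into a fresh table with the same all-varchar layout
    CREATE TABLE emails3 AS SELECT * FROM emails;
    VACUUM ANALYZE emails3;
    -- then time a full read of each table
    \timing
    SELECT count(*) FROM emails;     -- original, full of dead space
    SELECT count(*) FROM emails3;    -- freshly written copy

The copy should read back at least as fast as your char() version did,
since it's the absence of dead space, not the column type, that matters.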
> Actually, all databases on this server are vacuumed nightly, right
> before backups.
Not often enough evidently...
> INFO: --Relation public.emails--
> INFO: Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed 71757.
> INFO: --Relation public.emails2--
> INFO: Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0.
Note the difference in "pages": emails is more than double the physical
size, even though the amount of live data in it is essentially the same.
(You could
use the contrib/pgstattuple functions to see exactly how much live data
there is.)
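For instance, assuming the pgstattuple contrib module is installed
(column names as in recent versions of the module):

    SELECT table_len, tuple_len, dead_tuple_len, free_space
      FROM pgstattuple('public.emails');
    SELECT table_len, tuple_len, dead_tuple_len, free_space
      FROM pgstattuple('public.emails2');

tuple_len should come out about the same for both, with dead_tuple_len
and free_space accounting for the extra pages in emails.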
I was actually expecting the difference in file size to be even bigger
than this, seeing that you reported more than a 2:1 difference in read
time. It
could be that there is also some question of physical layout of the file
on disk. The original table probably accreted over time and was given
space that's not very contiguous on disk. The copied table was written
in one swoop, more or less, and is very likely stored in a more nearly
contiguous run of physical disk blocks. I'm not sure how you'd
investigate this theory though --- AFAIK there isn't any real easy way
to find out how badly fragmented a file is in most Unix filesystems.
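About the only thing I can suggest is to locate the table's underlying
file and poke at it with whatever tools your OS provides. Roughly (the
exact path layout depends on your server version, so treat this as a
sketch):

    -- the data directory has one subdirectory per database, named by OID
    SELECT oid FROM pg_database WHERE datname = current_database();
    -- and within it one file per relation, named by relfilenode
    SELECT relfilenode FROM pg_class WHERE relname = 'emails';
    -- ie, the table lives in $PGDATA/base/<database oid>/<relfilenode>
    -- (plus .1, .2, ... segment files once it passes 1GB)

In any case, rewriting the table in one pass (which is what your copy
did) will tend to leave it both smaller and laid out more contiguously.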
regards, tom lane