I'm developing a database for scientific recordings. These recordings
are traditionally saved as binary flat files for simplicity and
compact storage. Although I think ultimately having a database is
better than 1,000s of flat files in terms of data access, I've found
that the database (or at least my design) is pretty wasteful on
storage space compared with the binary flat files.
In particular, I tried importing all of the data from a 1.35 MB binary
flat file into a PostgreSQL database (a very small test file; an
average production file is more like 100 MB). The database directory
ballooned from 4.1 MB to 92 MB, a growth of roughly 65X the size of
the original binary flat file.
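In case it helps with diagnosing this, here is a minimal sketch of the
kind of size query that should show which relations account for the
growth (it assumes the tables live in the default "public" schema):

    -- overall size of the database after the import
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- per-table breakdown: heap pages plus indexes and TOAST
    SELECT c.relname,
           pg_size_pretty(pg_total_relation_size(c.oid)) AS total,
           pg_size_pretty(pg_relation_size(c.oid))       AS heap_only
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND n.nspname = 'public'
    ORDER BY pg_total_relation_size(c.oid) DESC;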
Now I know that table design and normalization are important. As far as
my partner and I can tell, we've made good use of normalization (no
redundancy), we've set field sizes to their theoretical minimum, and
we've made use of foreign keys and views. I'm also aware that
indices/keys and other database internals will necessarily make the
DBMS solution larger in terms of storage space. However, a 65X bloat
in space seems excessive.
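To make that concrete, here is a stripped-down sketch of the style of
layout I mean; the table and column names are invented for this example
and are not our actual schema:

    -- one row per recording (metadata only)
    CREATE TABLE recording (
        recording_id serial PRIMARY KEY,
        started_at   timestamp NOT NULL
    );

    -- one row per sample; the 4-byte value column is the bulk of the data,
    -- and every column type is as narrow as the data allows
    CREATE TABLE sample (
        recording_id integer  NOT NULL REFERENCES recording (recording_id),
        channel      smallint NOT NULL,
        sample_index integer  NOT NULL,
        value        real     NOT NULL,
        PRIMARY KEY (recording_id, channel, sample_index)
    );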
Has anyone run across similar storage concerns? I'd be interested in
knowing whether I just have really poorly designed tables, or whether
something else is going on here. I figure a bloat of 3-4X would be
permissible (and possibly expected), but 65X just seems like too much.
Thanks.
-Tony