Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs. However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either. If your large objects are
> actually large, there's not going to be an enormous number of
> them. We've heard of people with many tens of thousands of
> tables, and pg_dump speed didn't seem to be a huge bottleneck for
> them (at least not in recent versions). So I'm feeling we should
> not dismiss the idea of one TOC entry per blob.
>
> Thoughts?
We've got a "DocImage" table with about 7 million rows storing PDF
documents in a bytea column, approaching 1 TB of data. (We don't
want to give up ACID guarantees, replication, etc. by storing them
on the file system with filenames in the database.) This works
pretty well, except that client software occasionally runs out of
RAM. The interface could arguably be cleaner if we used BLOBs, but
the security issues have precluded that in PostgreSQL.
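For context, a minimal sketch of the shape such a table might take
(column names here are illustrative, not our actual schema):

```sql
-- Hypothetical minimal version of a bytea-based document table.
-- Values in the bytea column are TOASTed (compressed and stored
-- out of line) automatically once they exceed a couple of kB.
CREATE TABLE "DocImage" (
    "docImageId"  bigint PRIMARY KEY,
    "docImage"    bytea NOT NULL    -- the PDF contents
);

-- Total on-disk footprint, including TOAST and indexes:
SELECT pg_size_pretty(pg_total_relation_size('"DocImage"'));
```

With bytea the whole value travels through the regular query
protocol, which is where the client-side RAM pressure comes from;
BLOBs would allow streaming reads via the lo_* interface.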
I suspect that 7 million BLOBs (and growing fast) would be a problem
for this approach. Of course, if we're atypical, we could simply
stay with bytea should this change. Just a data point.
-Kevin
cir=> select count(*) from "DocImage";
  count
---------
 6891626
(1 row)

cir=> select pg_size_pretty(pg_total_relation_size('"DocImage"'));
 pg_size_pretty
----------------
 956 GB
(1 row)