Thread: BLOB updates -> database size explodes
I'm having an issue with BLOB updates (via ECPG). The total size of the
blobs should be ~280MB, but after partially updating all of them 150 times
the size on disk grows from 184MB to 18GB.

In more detail: there are 608 blobs of 460800 bytes each. All blobs are
updated piecewise in 150 repetitions, so first all blobs are updated in
bytes 0 - 3071, then in bytes 3072 - 6143, etc. In the end the database
takes 18GB on disk.

Computing the size of pg_largeobject gives me 267MB:

SELECT pg_size_pretty(count(loid) * 2048) FROM pg_largeobject;

 pg_size_pretty
----------------
 267 MB

On the other hand, this gives me 18GB, and du -sh on the data directory
also reports 18.4GB:

SELECT tablename,
       pg_size_pretty(size) AS size_pretty,
       pg_size_pretty(total_size) AS total_size_pretty
FROM (SELECT *,
             pg_relation_size(schemaname||'.'||tablename) AS size,
             pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS tables
WHERE tables.tablename = 'pg_largeobject'
ORDER BY total_size DESC;

   tablename    | size_pretty | total_size_pretty
----------------+-------------+-------------------
 pg_largeobject | 18 GB       | 18 GB

Doing these updates takes 85 minutes on a quad-core i7 with 6GB RAM and an
SSD. This is PostgreSQL 8.4.12 on 64-bit Debian.

Does anyone know what's going on here?
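For reference, each slice update amounts to something like the following
server-side sequence (a sketch only: 16385 is a placeholder loid, the
written bytes are dummy data, and the real code goes through ECPG's
client-side lo_* calls rather than SQL):

BEGIN;
SELECT lo_open(16385, 131072);             -- 131072 = INV_WRITE
SELECT lo_lseek(0, 3072, 0);               -- fd assumed 0, i.e. the first
                                           -- lo_open in this txn; SEEK_SET
SELECT lowrite(0, decode('00ff', 'hex'));  -- overwrite part of the slice
SELECT lo_close(0);
COMMIT;

Each 3072-byte slice spans two of the 2048-byte chunk rows that
pg_largeobject stores per blob.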
This is just a guess (I haven't dug into Postgres' low-level page/disk
access code for large objects yet; if I'm right, the LO-based project I'm
working on will likely face the same issue you're seeing). As far as I can
tell from my testing, LOs enjoy transactional behavior just like anything
else, and so are subject to MVCC effects. Since LOs are opaque to Postgres
and it can't infer anything about their structure, even flipping a single
bit in a LO causes whatever page that bit maps to to be marked invalid (as
if the page corresponded exactly to one row in a normal table), and the
page is copied to a new one along with your change(s).
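One way to check this guess (a sketch; n_dead_tup is only a statistics
collector estimate, but if I'm right it should dwarf the live count):

SELECT n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS on_disk
FROM pg_stat_all_tables
WHERE relname = 'pg_largeobject';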
If this hypothesis is correct, doing a vacuum should free up the dead pages
and bring the size closer to what you expect. And if that's the case,
putting more intelligence into the application could mitigate some of the
update growth (for instance, predicting which page temporally close updates
will land on and grouping them into a single transaction).
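Something like this, run as superuser (a sketch; note that on 8.4 a plain
VACUUM only marks the dead space as reusable, while the old-style VACUUM
FULL actually shrinks the file and takes an exclusive lock while doing so):

VACUUM VERBOSE pg_largeobject;   -- make dead chunk versions reusable
VACUUM FULL pg_largeobject;      -- return the space to the OS (slow, locks)
REINDEX TABLE pg_largeobject;    -- 8.4's VACUUM FULL tends to bloat indexes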
On Tue, May 28, 2013 at 2:53 PM, Dimitar Misev <dimitarmisev@gmail.com> wrote:
> I'm having an issue with BLOB updates (via ECPG). The total size of the
> blobs should be ~280MB, but after partially updating all of them 150
> times the size on disk grows from 184MB to 18GB.
> [...]
On Thu, May 30, 2013 at 12:49 AM, Stephen Scheck <singularsyntax@gmail.com> wrote:
> If this hypothesis is correct, doing a vacuum should free up the dead
> pages and bring the size closer to what you expect. And if that's the
> case, putting more intelligence into the application could mitigate some
> of the update growth (for instance, predicting which page temporally
> close updates will land on and grouping them into a single transaction).

Seems correct to me, according to this:
http://www.postgresql.org/docs/current/static/lo.html

I would give vacuumlo a try to see if the size decreases; if it does, that
is the right explanation.

Luca
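P.S. For what it's worth, vacuumlo only deletes large objects that are no
longer referenced from any OID column, which is roughly the check sketched
below (my_table.blob_oid is a hypothetical schema; adapt it to your own).
Dead versions of chunks belonging to still-referenced blobs are reclaimed
by a plain VACUUM of pg_largeobject instead.

-- Hypothetical: loids present in pg_largeobject but no longer referenced
-- by the application table (my_table.blob_oid is assumed, not real).
SELECT DISTINCT loid
FROM pg_largeobject
EXCEPT
SELECT blob_oid
FROM my_table;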