Inefficient handling of LO-restore + Patch - Mailing list pgsql-hackers

From Mario Weilguni
Subject Inefficient handling of LO-restore + Patch
Date
Msg-id D143FBF049570C4BB99D962DC25FC2D204B858@freedom.icomedias.com
Whole thread Raw
Responses Re: Inefficient handling of LO-restore + Patch
List pgsql-hackers
I've detected that the restoring of large objects may consume huge amounts of
diskspace when using unusual blocksizes (e.g. 32KB). My setup is Postgresql-7.2.1
+ 32KB blocks + LOBLKSIZE 16KB, a unusual combination I think, , because this setup gave
the very best performance. I wanted to restore a  database containing 2 gigabytes of
large objects, and noticed that it took around 6 gigabytes of diskspace to finish.
After it finished I ran "VACUUM FULL VERBOSE pg_largeobject",
and had around 140000 of live tuples, and around 480000 of dead tuples (I don't remember the exact
values, but I think there were 3 times dead tuples to live tuples).

I checked the pg_dump sources and found out that data is writen in 4KB chunks to the large object.
Since in my database the LO tuples are 16KB each, that would mean:
1. write 4KB -> have 1 live 4KB tuple
2. write 4KB -> 1 live 8KB tuple and 1 dead 4KB tuple
3. write 4KB -> 1 live 12KB tuple and 2 dead tuples
3. write 4KB -> 1 live 16KB tuple and 3 dead tuples

So creating a 16KB chunk took 16+12+8+4 => 40KB of diskspace, so recovering 2GB large objects
takes around 40/16 * 2 => 5GB diskspace and leaves 3 times the number of dead tuples (supposing
all LO's have sizes which are multples of 16KB).

I've written a patch which buffers LO's in 32KB blocks and tested again, and had 140000 live tuples
and nearly no dead tuples (around 10000, I'm still not sure where they're coming from).

Is there a better way to fix this? Can I post the patch to this list (~150 lines).And I did not find out how I can
detectthe large object chunksize, either from getting it from the headers (include "storage/large_object.h" did not
work)or how to get it from the database I restore to. Any hints? 

Best regards,       Mario Weilguni




pgsql-hackers by date:

Previous
From: "Nicolas Bazin"
Date:
Subject: Re: 7.3 schedule
Next
From: "Christopher Kings-Lynne"
Date:
Subject: Make text output more generic