Thread: BUG #14384: pg_dump uses excessive amounts of memory for LOBs
From: boleslaw.ziobrowski@yahoo.pl
Date:
The following bug has been logged on the website:

Bug reference:      14384
Logged by:          Bolek Ziobrowski
Email address:      boleslaw.ziobrowski@yahoo.pl
PostgreSQL version: 9.5.4
Operating system:   Ubuntu 14.04.5 LTS
Description:

pg_dump seems to allocate memory proportional to the number of rows in
pg_largeobject (not necessarily correlated with the size of these objects),
e.g. 4 GB for a few million LOBs and about 26 GB for 30 million.

Steps to reproduce:

psql:

create table large_object_test( a int, lobj oid );

insert into large_object_test select a.i, lo_from_bytea(0,
E'\\xffffff0000000000') from generate_series(1,5000000) as a(i);

cli:

pg_dump postgres > /tmp/dump.sql

top -o "%MEM"
boleslaw.ziobrowski@yahoo.pl writes:
> pg_dump seems to allocate memory proportional to the number of rows in
> pg_largeobject (not necessarily correlated with size of these objects)

Yes, it does. It also allocates memory proportional to the number of,
e.g., tables, or any other DB object for that matter.

This is a consequence of the fact that blobs grew owners and privileges
in 9.0; pg_dump uses its usual per-object infrastructure to keep track
of that. The argument was that this'd be okay because if your large
objects are, well, large, then there couldn't be so many of them that
the space consumption would be fatal. I had doubts about that at the
time, but I think we're more or less locked into it now. It would take
a lot of restructuring to change it, and we'd lose functionality too,
because we couldn't have a separate TOC entry per blob. That means no
ability to select out individual blobs during pg_restore.

TL;DR: blobs are not exactly lightweight objects. If you want something
with less overhead, maybe you should just store the data in a plain
bytea column.

			regards, tom lane
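For anyone weighing the two approaches, a minimal sketch of the bytea alternative Tom suggests (table and column names are illustrative, not from the thread; it mirrors the reproduction case above but stores the payload inline, so pg_dump keeps no per-object TOC entry and its memory use stays flat regardless of row count):

```sql
-- Inline storage instead of large objects: one ordinary table,
-- payload in a bytea column (TOASTed automatically if large).
create table bytea_test ( a int, data bytea );

-- Same 5M-row load as the LOB reproduction case, but as bytea values.
insert into bytea_test
select i, '\xffffff0000000000'::bytea
from generate_series(1, 5000000) as s(i);
```

The trade-off Tom names still applies: without per-blob TOC entries you also lose per-blob ownership, privileges, and selective pg_restore.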
Tom Lane wrote:
> TL;DR: blobs are not exactly lightweight objects. If you want something
> with less overhead, maybe you should just store the data in a plain
> bytea column.

Yes, I tested this case with a bytea/TOAST column in place of the LOB
column, and pg_dump used just a few MB.

Normally (when dumping a database with hundreds of tables) pg_dump uses
an insignificant amount of memory, so I was surprised when it allocated
about 26 GB while processing a 100 GB database with 30*10^6 LOBs (which
caused swapping and almost triggered the OOM killer). The 'obvious' (for
a database this size) and easy solution was to use pg_basebackup
instead, which also sped up the process almost 40 times.

I found a similar case here
(https://www.postgresql.org/message-id/524C3163.1050502%40iqbuzz.ru),
so I think it would be good to warn users about this behavior in the
pg_dump documentation (to save a few hours for the third person who
triggers this issue in ... October 2019).

Thank you,
Bolek Ziobrowski
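Until such a warning lands in the documentation, one rough way to anticipate the problem before dumping is to count the large objects first. A sketch (the ~1 kB-per-blob figure is an assumption extrapolated from the 26 GB / 30 million numbers reported in this thread, not a documented constant):

```sql
-- pg_largeobject_metadata has one row per large object (since 9.0),
-- so its row count is the number of per-blob entries pg_dump will build.
-- The 1024-byte multiplier is a rough estimate from this thread's figures.
select count(*)                          as n_lobs,
       pg_size_pretty(count(*) * 1024)   as approx_pg_dump_memory
from pg_largeobject_metadata;
```

If the estimate is uncomfortably large, a physical backup via pg_basebackup, as described above, sidesteps pg_dump's per-object bookkeeping entirely.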