I wrote:
> Hm. Given that the message type and length seem perfectly reasonable,
> I suspect this must actually represent an out-of-memory condition within
> pg_dump (*not* on the server end). But you'd have to be running it on a
> toy box, or with a rather silly ulimit, for 6MB to be a problem...

BTW, how old is your pg_dump (or really, libpq)? I wonder if you are
hitting this bug in some form:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_4_BR [2f557167b] 2014-05-07 21:39:13 -0400
Branch: REL9_3_STABLE Release: REL9_3_5 [b4f9c93ce] 2014-05-07 21:38:38 -0400
Branch: REL9_2_STABLE Release: REL9_2_9 [f7672c8ce] 2014-05-07 21:38:41 -0400
Branch: REL9_1_STABLE Release: REL9_1_14 [86888054a] 2014-05-07 21:38:44 -0400
Branch: REL9_0_STABLE Release: REL9_0_18 [77e662827] 2014-05-07 21:38:47 -0400
Branch: REL8_4_STABLE Release: REL8_4_22 [664ac3de7] 2014-05-07 21:38:50 -0400

Avoid buffer bloat in libpq when server is consistently faster than client.

If the server sends a long stream of data, and the server + network are
consistently fast enough to force the recv() loop in pqReadData() to
iterate until libpq's input buffer is full, then upon processing the last
incomplete message in each bufferload we'd usually double the buffer size,
due to supposing that we didn't have enough room in the buffer to finish
collecting that message. After filling the newly-enlarged buffer, the
cycle repeats, eventually resulting in an out-of-memory situation (which
would be reported misleadingly as "lost synchronization with server").
Of course, we should not enlarge the buffer unless we still need room
after discarding already-processed messages.

This bug dates back quite a long time: pqParseInput3 has had the behavior
since perhaps 2003, getCopyDataMessage at least since commit 70066eb1a1ad
in 2008. Probably the reason it's not been isolated before is that in
common environments the recv() loop would always be faster than the server
(if on the same machine) or faster than the network (if not); or at least
it wouldn't be slower consistently enough to let the buffer ramp up to a
problematic size. The reported cases involve Windows, which perhaps has
different timing behavior than other platforms.

Per bug #7914 from Shin-ichi Morita, though this is different from his
proposed solution. Back-patch to all supported branches.
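
For illustration only, here is a minimal sketch of the growth cycle that
commit message describes. It is not the libpq code: InBuf, ensure_space()
and simulate_final_size() are hypothetical names, no real memory is
allocated, and the 16384-byte starting size and 1000-byte message length
used below are assumptions picked for the example. Each round models one
full bufferload followed by parsing, with the last message left incomplete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of an input buffer (offsets only). */
typedef struct
{
    size_t size;    /* current buffer capacity */
    size_t start;   /* offset of first unprocessed byte */
    size_t end;     /* offset one past the last received byte */
} InBuf;

/*
 * Make sure the buffer can hold data up to absolute offset bytes_needed.
 * With compact_first = 0 this mimics the buggy behavior: double the buffer
 * whenever the offset does not fit. With compact_first = 1, processed data
 * to the left of 'start' is discarded first, and the buffer is enlarged
 * only if the message still does not fit.
 */
static void
ensure_space(InBuf *b, size_t bytes_needed, int compact_first)
{
    assert(b->start <= b->end && b->end <= b->size);

    if (bytes_needed <= b->size)
        return;

    if (compact_first)
    {
        /* Left-justify: the already-processed prefix no longer counts. */
        bytes_needed -= b->start;
        b->end -= b->start;
        b->start = 0;
        if (bytes_needed <= b->size)
            return;
    }

    while (b->size < bytes_needed)
        b->size *= 2;
}

/*
 * Simulate 'rounds' bufferloads from a sender that always fills the buffer,
 * carrying fixed-length messages of msg_len bytes. Returns the final size.
 */
static size_t
simulate_final_size(size_t initial_size, size_t msg_len, int rounds,
                    int compact_first)
{
    InBuf b = {initial_size, 0, 0};

    for (int i = 0; i < rounds; i++)
    {
        /* Read step: left-justify pending data, then fill the buffer. */
        b.end -= b.start;
        b.start = 0;
        b.end = b.size;

        /* Parse step: consume every complete message. */
        b.start += ((b.end - b.start) / msg_len) * msg_len;

        /* A trailing partial message needs room up to its final byte. */
        if (b.start < b.end)
            ensure_space(&b, b.start + msg_len, compact_first);
    }
    return b.size;
}
```

Under those assumptions, simulate_final_size(16384, 1000, 10, 0) ends at
16777216 bytes (one doubling per round), while the same call with
compact_first = 1 stays at 16384, since a 1000-byte message always fits
once the processed data is discarded.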

regards, tom lane