Thread: Bug in backend/lib/stringinfo.c:enlargeStringInfo()
Hi, for some time a postgres process on one of our web servers repeatedly gets into an infinite loop. This happens very rarely, about once in a week. Today I installed gdb on the server to trace down the problem. I found out that the process was looping in enlargeStringInfo() in backend/lib/stringinfo.c. The call trace was #0 0x0810e490 in enlargeStringInfo () #1 0x081138e4 in pq_getmessage () #2 0x0816561b in SocketBackend () #3 0x081657bb in ReadCommand () #4 0x08167a5e in PostgresMain () #5 0x08144353 in BackendFork () #6 0x08143d33 in BackendStartup () #7 0x08142516 in ServerLoop () #8 0x08142057 in PostmasterMain () #9 0x08114a4d in main () #10 0x400e8857 in __libc_start_main () from /lib/libc.so.6 The "needed" argument to enlargeStringInfo was 0x5454502b, apparently caused by another bug, which I have yet to find. So the following loop never stops while (needed > newlen) newlen = 2 * newlen; because needed and newlen are compared as signed integers. (If "newlen" has grown to 0x40000000 it's still smaller than "needed". Multiplying by 2 overflows and yields 0x80000000, which is negative, thus still smaller than "needed". Multiplying by 2 again yields 0, ...) The numbers should be compared as unsigned ints. Or the maximum string length should be restricted. On the other hand I wonder if it's desired to even try the following memory allocation of at least a GB of RAM. The pq_getmessage() that called enlargeStringInfo() has a "maxlen" argument of 0, that seems to mean unlimited. The real cause of the problem seems to be a frontend/backend communication problem. The "needed" argument 0x5454502b comes from a 4-byte length field which string content is 'TTP/'. Looks like a part of a HTTP request to me. I'm using Apache/mod_perl/DBI to access Postgres. Can I log the frontend/backend communication somehow? Nick Wellnhofer -- aevum gmbh leopoldstr. 87 80802 münchen germany fon: +4989 38380653 fax: +4989 38799384 wellnhofer@aevum.de http://aevum.de/
Nick Wellnhofer <wellnhofer@aevum.de> writes: > I found out that the process was looping in enlargeStringInfo() in > backend/lib/stringinfo.c. This problem was reported by someone else recently. I've just applied the attached patch. > The real cause of the problem seems to be a frontend/backend > communication problem. The "needed" argument 0x5454502b comes from a > 4-byte length field which string content is 'TTP/'. Looks like a part of > a HTTP request to me. Yeah, it kinda sounds like someone is trying to send an HTTP request to the Postgres port :-( regards, tom lane *** src/backend/lib/stringinfo.c.orig Sat Nov 29 17:39:42 2003 --- src/backend/lib/stringinfo.c Tue May 11 16:00:20 2004 *************** *** 16,21 **** --- 16,22 ---- #include "postgres.h" #include "lib/stringinfo.h" + #include "utils/memutils.h" /* *************** *** 220,226 **** --- 221,240 ---- { int newlen; + /* + * Guard against ridiculous "needed" values, which can occur if we're + * fed bogus data. Without this, we can get an overflow or infinite + * loop in the following. + */ + if (needed < 0 || + ((Size) needed) >= (MaxAllocSize - (Size) str->len)) + elog(ERROR, "invalid string enlargement request size %d", + needed); + needed += str->len + 1; /* total space required now */ + + /* Because of the above test, we now have needed <= MaxAllocSize */ + if (needed <= str->maxlen) return; /* got enough space already */ *************** *** 233,238 **** --- 247,260 ---- newlen = 2 * str->maxlen; while (needed > newlen) newlen = 2 * newlen; + + /* + * Clamp to MaxAllocSize in case we went past it. Note we are assuming + * here that MaxAllocSize <= INT_MAX/2, else the above loop could + * overflow. We will still have newlen >= needed. + */ + if (newlen > (int) MaxAllocSize) + newlen = (int) MaxAllocSize; str->data = (char *) repalloc(str->data, newlen);
Nick Wellnhofer <wellnhofer@aevum.de> writes: > Tom Lane wrote: >> Yeah, it kinda sounds like someone is trying to send an HTTP request to >> the Postgres port :-( > I thought about that, but I have TCP disabled. And it's definitely > triggered by my own Perl code serving dynamic web pages. But it happens > at completely random places. Maybe it's a bug in DBD::Pg. Is there an > easy way to log the whole frontend/backend communication? Not at the low level you'd need to debug this. I'd suggest whipping out ethereal, tcpdump, or your favorite other TCP packet sniffer, and running the misbehaving client over local-TCP-loopback until you have diagnosed the problem. My original thought about 'HTTP' coming from an independent, misdirected connection request is clearly wrong, since you'd have to get past the initial connection handshake to get to ReadCommand. Instead, the HTTP must be part of the data stream being sent by a legitimate client. My best guess is that the client is getting out of sync by sending a packet length word that doesn't match the number of data bytes it actually transmits. If the length word is, say, 5 bytes too large, this would result in the backend eating the next packet's length word and first data byte and then trying to make sense of the second through fifth data bytes as a new length word. Which fits in with your observation. So what you need to do is identify which path of control in the client allows that to happen. It does sound like the proximate bug is in DBD::Pg, unless that module allows higher level code to spit whatever-it-pleases onto the connection --- I'd expect DBD::Pg to be the level responsible for computing packet length words. I'm not at all familiar with that code, though. A different line of thought is that some unrelated Perl code within the client is somehow getting hold of the handle for the DB-connection socket and pushing out data that was meant to go someplace else entirely. I have no idea how likely that would be. regards, tom lane
Tom Lane wrote: > Nick Wellnhofer <wellnhofer@aevum.de> writes: >>The real cause of the problem seems to be a frontend/backend >>communication problem. The "needed" argument 0x5454502b comes from a >>4-byte length field which string content is 'TTP/'. Looks like a part of >>a HTTP request to me. > > > Yeah, it kinda sounds like someone is trying to send an HTTP request to > the Postgres port :-( I thought about that, but I have TCP disabled. And it's definitely triggered by my own Perl code serving dynamic web pages. But it happens at completely random places. Maybe it's a bug in DBD::Pg. Is there an easy way to log the whole frontend/backend communication? Nick -- aevum gmbh leopoldstr. 87 80802 münchen germany fon: +4989 38380653 fax: +4989 38799384 wellnhofer@aevum.de http://aevum.de/