Thread: more on out-of-memory
So, the reason I started the thread about postmaster dying on OOM is
that somebody asked me on IM what could have caused a backend to die
with this backtrace:

    libc.so.1`_ndoprnt+0x14()
    libc.so.1`fprintf+0x11d()
    AllocSetStats+0x15d()
    MemoryContextStatsInternal+0x1c()
    MemoryContextStats+0xb()
    AllocSetAlloc+0x1c0()
    MemoryContextAllocZeroAligned+0x57()
    makeTypeNameFromNameList+0x20()
    SystemTypeName+0x40()
    base_yyparse+0xcd42()
    raw_parser+0x29()
    pg_parse_query+0x23()
    exec_simple_query+0x6d()
    PostgresMain+0xf6a()
    BackendRun+0x254()
    BackendStartup+0xf8()
    ServerLoop+0x116()
    PostmasterMain+0xd98()
    main+0x18a()
    0x4e08ec()

Postmaster only logged this one with

    2009-04-06 16:33:48 EDT::@:[13741]: LOG: server process (PID 12146) was terminated by signal 11

and there's no indication of any activity from that process in the log
at all. Several other processes seem to be exiting or terminating
transactions with errno "Not enough space".

His question was: is it possible that we're handing a NULL pointer to a
%s on fprintf? The involved code looks like this:

    fprintf(stderr,
            "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
            set->header.name, totalspace, nblocks, freespace, nchunks,
            totalspace - freespace);

And since this is being called from AllocSetAlloc, which is always
handed a complete memory context (and not something that has only been
partially set up), I think the answer is that it's not possible, and
that the bug must be in libc, which is perhaps not handling
out-of-memory very cleanly in its fprintf implementation.

Am I all wet?

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> So, the reason I started the thread about postmaster dying on OOM is
> that somebody asked me on IM what could have caused a backend to die
> with this backtrace:

[ of course you realize this is a backend, not the postmaster ]

> His question was: is it possible that we're handing a NULL pointer to a
> %s on fprintf? The involved code looks like this:
> ...
> And since this is being called from AllocSetAlloc, which is always
> handed a complete memory context (and not something that has only been
> partially set), I think the answer is that it's not possible, and that
> the bug must be on libc which is perhaps not handling out-of-memory very
> cleanly in its fprintf implementation.

Another theory is that the name pointer got clobbered by some sort of
memory-stomping bug. (We don't know from the available evidence that it
was NULL --- it could have been any garbage value that pointed outside
backend memory.) However, given that the context clearly indicates
being out-of-memory overall, your theory seems a bit more probable.

The really odd thing is that the stack trace is so short; it seems to
have failed *very* early in query parsing, which is hard to credit
unless this person is in the habit of sending megabytes-long queries.
I guess if the system as a whole were under really severe memory
pressure, a backend could hit OOM without having eaten much itself.

What platform is this, and which PG version?

			regards, tom lane
Alvaro Herrera wrote:
> His question was: is it possible that we're handing a NULL pointer to a
> %s on fprintf? The involved code looks like this:
>
> fprintf(stderr,
>         "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
>         set->header.name, totalspace, nblocks, freespace, nchunks,
>         totalspace - freespace);

Note that glibc prints "(null)" if you pass NULL for %s. Others don't.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
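
To make the point about %s concrete: passing a null pointer to a %s conversion is undefined behavior in standard C, so the "(null)" output is a glibc courtesy and not something portable code can rely on. Below is a minimal sketch of how a caller could guard against it. The helper names name_or_placeholder and format_context_stats are hypothetical and are not PostgreSQL functions; the format string is the one quoted above, and the sketch writes to a buffer rather than stderr only so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of PostgreSQL): substitute a placeholder
 * when the name pointer is NULL, so the %s conversion never sees NULL.
 */
static const char *
name_or_placeholder(const char *name)
{
    return name != NULL ? name : "(unnamed)";
}

/*
 * Hypothetical stand-in for the stats line quoted in the thread.  It
 * formats into buf instead of printing to stderr.  Returns whatever
 * snprintf returns: the number of characters that would have been
 * written, excluding the terminating NUL.
 */
static int
format_context_stats(char *buf, size_t buflen, const char *name,
                     unsigned long totalspace, long nblocks,
                     unsigned long freespace, long nchunks)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
                    name_or_placeholder(name), totalspace, nblocks,
                    freespace, nchunks, totalspace - freespace);
}
```

A guard like this only covers the NULL case. If the name pointer had been clobbered with some other garbage value, as suggested earlier in the thread, the %s conversion would still dereference it and could crash in the same place.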