So, the reason I started the thread about postmaster dying on OOM is
that somebody asked me on IM what could have caused a backend to die
with this backtrace:
libc.so.1`_ndoprnt+0x14()
libc.so.1`fprintf+0x11d()
AllocSetStats+0x15d()
MemoryContextStatsInternal+0x1c()
MemoryContextStats+0xb()
AllocSetAlloc+0x1c0()
MemoryContextAllocZeroAligned+0x57()
makeTypeNameFromNameList+0x20()
SystemTypeName+0x40()
base_yyparse+0xcd42()
raw_parser+0x29()
pg_parse_query+0x23()
exec_simple_query+0x6d()
PostgresMain+0xf6a()
BackendRun+0x254()
BackendStartup+0xf8()
ServerLoop+0x116()
PostmasterMain+0xd98()
main+0x18a()
0x4e08ec()
The only thing the postmaster logged about this one was
2009-04-06 16:33:48 EDT::@:[13741]: LOG: server process (PID 12146) was terminated by signal 11
and there's no indication of any activity from that process in the log
at all.
Several other processes seem to be exiting or terminating transactions
with errno "Not enough space" (that is, ENOMEM).
His question was: is it possible that we're handing a NULL pointer to a
%s on fprintf? The involved code looks like this:
    fprintf(stderr,
            "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
            set->header.name, totalspace, nblocks, freespace, nchunks,
            totalspace - freespace);
And since this is being called from AllocSetAlloc, which is always
handed a complete memory context (and not something that has only been
partially set up), I think the answer is that it's not possible, and
that the bug must be in libc, which is perhaps not handling
out-of-memory very cleanly in its fprintf implementation.
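To make the NULL theory concrete: passing a NULL pointer for %s is
undefined behavior in C, so some libcs print a placeholder and others
crash. Here is a minimal sketch of what guarding against it would look
like. This is not PostgreSQL code; DemoContextStats, safe_name and
format_stats are hypothetical names I made up for illustration, and the
struct only mirrors the values printed by the fprintf call above.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the values printed by the stats line.
 * This is NOT the real AllocSetContext; it exists only for this sketch.
 */
typedef struct DemoContextStats
{
    const char     *name;       /* context name; may be NULL in this demo */
    unsigned long   totalspace;
    long            nblocks;
    unsigned long   freespace;
    long            nchunks;
} DemoContextStats;

/*
 * Return the name unchanged, or a placeholder when it is NULL, so that
 * "%s" is never handed a NULL pointer.
 */
static const char *
safe_name(const char *name)
{
    return (name != NULL) ? name : "(null name)";
}

/*
 * Format the stats line into a caller-supplied buffer. Using snprintf
 * on a fixed buffer keeps the example easy to check; the format string
 * is the same one used in the fprintf call quoted above.
 * Returns whatever snprintf returns.
 */
static int
format_stats(char *buf, size_t buflen, const DemoContextStats *st)
{
    return snprintf(buf, buflen,
                    "%s: %lu total in %ld blocks; %lu free (%ld chunks); %lu used\n",
                    safe_name(st->name),
                    st->totalspace, st->nblocks,
                    st->freespace, st->nchunks,
                    st->totalspace - st->freespace);
}
```

Calling format_stats with a NULL name yields a line starting with
"(null name):" instead of depending on what libc does with a NULL %s.
If the name really cannot be NULL at this point, a guard like this
would change nothing, which again points away from our code.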
Am I all wet?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.