Doug McNaught <doug@wireboard.com> writes:
> The problem I'm having is that the backends will crash randomly, after
> the database has been up for a few days, with:
>
> FATAL 1: Memory exhausted in AllocSetAlloc()
>
> The system has plenty of memory and swap, and under normal
> circumstances the backends take up 10-15 megabytes. If it's a
> runaway situation of some kind, it happens very fast, as I've even
> taken snapshots of the process table at 1 minute intervals, and they
> show no abnormality right up to the time of the crash.

Hmm. That puts a damper on the idea that it's a memory leak --- doesn't
eliminate the theory entirely, however. The other likely theory is that
you've got a variable-size column value someplace whose size word has
been corrupted, so that it claims to be umpteen megabytes long. Any
attempt to copy such a value out of the tuple it's in will result in
an instant "out of memory" complaint.
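
To illustrate the corrupted-size-word theory, here is a minimal sketch in C.
It assumes a simplified, hypothetical layout (a 4-byte length word that counts
itself, followed by the payload); the names read_len_word and
len_word_is_plausible are made up for this example and are not the backend's
actual code. The point is only that a copy routine trusts the length word, so
one flipped high bit turns a 9-byte value into a request for a gigabyte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 4-byte length word (counting itself), then payload. */
#define LEN_WORD_SIZE sizeof(uint32_t)

/* Read the length word stored at the start of a variable-size value. */
static uint32_t read_len_word(const unsigned char *value)
{
    uint32_t len;
    memcpy(&len, value, sizeof len);  /* memcpy avoids alignment assumptions */
    return len;
}

/*
 * Return 1 if the claimed length could fit in the bytes that remain in the
 * tuple, 0 otherwise. A value that fails this check is the kind that would
 * make a copy routine ask the allocator for an absurd amount of memory.
 */
static int len_word_is_plausible(const unsigned char *value,
                                 size_t bytes_left_in_tuple)
{
    uint32_t len = read_len_word(value);
    return len >= LEN_WORD_SIZE && len <= bytes_left_in_tuple;
}
```

A value whose length word reads 9 inside a 16-byte tuple passes the check; the
same value with the word overwritten by 0x40000000 does not, and copying it
blindly would mean a 1 GB allocation request.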
Is there any consistency about which table is being touched when the
failure occurs? It's not hard to isolate and delete a damaged tuple
once you know which table it's in, but if you've got a lot of tables
the initial search can be tedious.
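
Once the table is known, one way to narrow down the damaged tuple is to bisect
on row ranges. The sketch below is an assumption about how one might organize
that search, not a description of any existing tool: range_ok_fn is a
hypothetical probe that reports whether rows [lo, hi) can all be read without
error (in practice it might issue a query restricted to that range), and
demo_range_ok is a stand-in that simulates a single bad row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: returns 1 if rows [lo, hi) can all be read cleanly. */
typedef int (*range_ok_fn)(size_t lo, size_t hi, void *ctx);

/*
 * Return the index of the first unreadable row in [0, nrows), or nrows if
 * the whole table reads cleanly. Needs about log2(nrows) probes.
 */
static size_t find_bad_row(size_t nrows, range_ok_fn range_ok, void *ctx)
{
    size_t lo = 0, hi = nrows;

    if (range_ok(lo, hi, ctx))
        return nrows;               /* nothing damaged in this table */

    /* Invariant: rows before lo are clean, and [lo, hi) holds a bad row. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (range_ok(lo, mid, ctx))
            lo = mid;               /* lower half is clean, look above it */
        else
            hi = mid;               /* lower half contains the damage */
    }
    return lo;
}

/* Demo probe: ctx points to the index of the one simulated bad row. */
static int demo_range_ok(size_t lo, size_t hi, void *ctx)
{
    size_t bad = *(const size_t *)ctx;
    return !(bad >= lo && bad < hi);
}
```

With a simulated bad row at index 37 in a 100-row table, find_bad_row(100,
demo_range_ok, &bad) returns 37 after eight probes. Keep in mind that a real
probe against a damaged table may kill the backend it runs in, so each probe
would need its own connection.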
One way to get more info is to tweak the code to abort() just before
it would normally report the out-of-memory error. Then you will get
a coredump and can learn something from the backtrace (don't forget
to compile with -g).
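
The abort() tweak amounts to something like the following sketch. This is a
generic illustration around malloc(), not the actual AllocSetAlloc() source;
alloc_or_abort is a hypothetical name. The idea is just to call abort() at
the point where the failure is detected, so the core file captures the stack
of whoever asked for the memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical allocation wrapper: instead of reporting "out of memory" and
 * carrying on, dump core at the point of failure.
 */
static void *alloc_or_abort(size_t size)
{
    void *p = malloc(size);

    /* malloc(0) may legitimately return NULL, so only treat size > 0 as failure. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "allocation of %zu bytes failed, aborting\n", size);
        abort();    /* raises SIGABRT; leaves a core file if core dumps are enabled */
    }
    return p;
}
```

The core file only appears if the process's core size limit allows it (for
example, "ulimit -c unlimited" in the shell that starts the server). Loading
the executable and the core file into gdb and typing "bt" then shows the call
chain that led to the failed request.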
regards, tom lane