Simon Riggs wrote:
> On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>>>> When a backend dies with FATAL, it writes an abort record before exiting.
>>>>
>>>> (I was under the impression it doesn't until few minutes ago myself,
>>>> when I actually read the shutdown code :-))
>>> Not in all cases; keep reading :-)
>> If it doesn't, that's a bug. A FATAL exit is not supposed to leave the
>> shared state corrupted, it's only supposed to be a forcible session
>> termination. Any open transaction should be rolled back.
>
> Please look back at the earlier discussion we had on this exact point:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php

I think the running-xacts list we dump to WAL at every checkpoint is
enough to handle that. Just treat the dead transaction as in-progress
until the next running-xacts record. It's presumably extremely rare for
a process to die with FATAL without writing an abort record.
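
Roughly what I have in mind, as a standalone sketch rather than a patch.
All of the names here (KnownXids, known_xids_*, MAX_KNOWN_XIDS) are made
up for illustration and are not existing backend symbols. It also
assumes the running-xacts record lists every transaction still running
at its WAL position, and it only prunes; adding xids the standby has not
seen yet is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Redefined locally so the sketch compiles on its own. */
typedef uint32_t TransactionId;

#define MAX_KNOWN_XIDS 64		/* hypothetical capacity for the sketch */

/* Hypothetical standby-side set of xids currently treated as in-progress. */
typedef struct KnownXids
{
	TransactionId xids[MAX_KNOWN_XIDS];
	size_t		count;
} KnownXids;

bool
xid_in_list(const TransactionId *list, size_t n, TransactionId xid)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == xid)
			return true;
	return false;
}

/* Replay sees WAL from a transaction: remember it as in-progress. */
void
known_xids_add(KnownXids *k, TransactionId xid)
{
	if (xid_in_list(k->xids, k->count, xid))
		return;
	assert(k->count < MAX_KNOWN_XIDS);
	k->xids[k->count++] = xid;
}

/* Replay sees a commit or abort record: forget the xid. */
void
known_xids_remove(KnownXids *k, TransactionId xid)
{
	for (size_t i = 0; i < k->count; i++)
	{
		if (k->xids[i] == xid)
		{
			k->xids[i] = k->xids[--k->count];	/* order does not matter */
			return;
		}
	}
}

/*
 * Replay sees a running-xacts record: any xid we still track that is not
 * in the record must have ended without an abort record, so drop it.
 * Returns the number of xids pruned this way.
 */
size_t
known_xids_apply_running_xacts(KnownXids *k,
							   const TransactionId *running, size_t nrunning)
{
	size_t		kept = 0;
	size_t		pruned;

	for (size_t i = 0; i < k->count; i++)
		if (xid_in_list(running, nrunning, k->xids[i]))
			k->xids[kept++] = k->xids[i];

	pruned = k->count - kept;
	k->count = kept;
	return pruned;
}
```

The point is that a missing abort record costs nothing worse than
keeping the xid around as in-progress for at most one checkpoint
interval.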

A related issue is that recovery currently PANICs if it runs out of
recovery procs. I think that's not acceptable, regardless of whether we
use slotids or some other method to avoid it in normal operation. It
means you can't recover at all if you set max_connections too low in the
standby (or in the primary, and you have to recover from a crash), or if
you run out of recovery procs because an abort failed in the primary, as
discussed on that thread. The standby should just fast-forward to the
next running-xacts record in that case.
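
To make the fast-forward idea concrete, here is a small standalone
sketch. Again the names (RecoveryProcs, SnapshotState,
recovery_track_xid, recovery_apply_running_xacts, MAX_RECOVERY_PROCS)
are hypothetical, and the fixed limit only stands in for whatever is
derived from max_connections. The assumption is that WAL keeps being
applied while tracking is suspended, and only the in-progress
bookkeeping is thrown away until a running-xacts record lets us rebuild
it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Redefined locally so the sketch compiles on its own. */
typedef uint32_t TransactionId;

#define MAX_RECOVERY_PROCS 4	/* hypothetical stand-in for the real limit */

typedef enum
{
	SNAPSHOT_VALID,						/* tracking is complete */
	SNAPSHOT_WAITING_FOR_RUNNING_XACTS	/* fast-forwarding */
} SnapshotState;

typedef struct RecoveryProcs
{
	TransactionId xids[MAX_RECOVERY_PROCS];
	size_t		used;
	SnapshotState state;
} RecoveryProcs;

/*
 * Track an xid seen during replay. Returns false only at the moment we
 * run out of slots; instead of PANICing we discard the tracking state
 * and wait for the next running-xacts record.
 */
bool
recovery_track_xid(RecoveryProcs *rp, TransactionId xid)
{
	if (rp->state != SNAPSHOT_VALID)
		return true;			/* fast-forwarding: skip tracking */

	for (size_t i = 0; i < rp->used; i++)
		if (rp->xids[i] == xid)
			return true;		/* already tracked */

	if (rp->used == MAX_RECOVERY_PROCS)
	{
		rp->used = 0;
		rp->state = SNAPSHOT_WAITING_FOR_RUNNING_XACTS;
		return false;
	}

	rp->xids[rp->used++] = xid;
	return true;
}

/*
 * Rebuild tracking from a running-xacts record. Returns true if the
 * state is valid afterwards; if even the record does not fit, keep
 * fast-forwarding to the one after it.
 */
bool
recovery_apply_running_xacts(RecoveryProcs *rp,
							 const TransactionId *running, size_t nrunning)
{
	if (nrunning > MAX_RECOVERY_PROCS)
	{
		rp->used = 0;
		rp->state = SNAPSHOT_WAITING_FOR_RUNNING_XACTS;
		return false;
	}

	for (size_t i = 0; i < nrunning; i++)
		rp->xids[i] = running[i];
	rp->used = nrunning;
	rp->state = SNAPSHOT_VALID;
	return true;
}
```

With something along these lines, a too-low max_connections would delay
the point where the tracking state is usable again, but it would no
longer make recovery impossible.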

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com