Ok, I have a fairly nasty situation. I have a production server that
crashes when executing a pl/pgsql function, in code that has been
working flawlessly for weeks. The production server is running 8.0 on
win32, and I was able to 'sort of' reproduce the behavior on my
development machine here at the office.
What happens:
On the production machine, upon execution of the function (with a very
specific parameter value, all others work ok), all server backends
freeze and completely stop working. Any attempt to connect to the
server hangs psql in limbo. In addition, the service fails to shut
down, and the only way to get things working again is to kill
postmaster.exe and all instances of postgres.exe. After that,
everything runs ok until I try to run the function again. There is
nothing useful in the log.
Following this, I dumped the production database and loaded it onto my
office machine. When I execute the function there, I get:
esp=# select generate_oe_bom(18208);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
The server seems to recover from this. Looking at the event log, I see:
NOTICE: hello
LOG: server process (PID 5720) exited with unexpected status 128
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server
process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2005-03-03 14:16:15 Eastern
Standard Time
LOG: checkpoint record is at 6/D7546E28
LOG: redo record is at 6/D7546E28; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 11208897; next OID: 62532404
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 6/D7546E68
LOG: unexpected pageaddr 6/CF5BA000 in log file 6, segment 215, offset
6004736
LOG: redo done at 6/D75B7F90
LOG: database system is ready
This will repeat if the function is run again.
Only the exact parameter (order# 18208) will cause the crash. Another
order, 18150, runs through ok. I would expect data corruption to be
the cause of the problem, except that I was able to reproduce it on a
different server following a dump/restore. Unfortunately, this is
sensitive data.
Attached is the pl/pgsql code. There is a raise notice 'hello'. This
gets raised exactly once before the crash.
Merlin