Tom Lane wrote:
> I read it like this:
>
> #0 0x0827441d in MemoryContextAlloc () <-- real
> #1 0x08274467 in MemoryContextStrdup () <-- real
> #2 0x0826501c in database_getflatfilename () <-- real
> #3 0x0826504e in database_getflatfilename () <-- must be write_database_file
> #4 0x08265ec1 in AtEOXact_UpdateFlatFiles () <-- real
> #5 0x080a9111 in RecordTransactionCommit () <-- must be CommitTransaction
> #6 0x080a93a7 in CommitTransactionCommand () <-- real
> #7 0x081a6c3b in autovac_stopped () <-- must be process_whole_db
> #8 0x081a75cd in autovac_start () <-- real
> #9 0x081ae33c in ClosePostmasterPorts () <-- must be ServerLoop
> #10 0x081af058 in PostmasterMain ()
> #11 0x0816b3e2 in main ()
>
> although this requires one or two leaps of faith about single-call
> static functions getting inlined so that they don't produce a callstack
> entry (in particular that must have happened to AutoVacMain). In any
> case, it's very hard to see how MemoryContextAlloc would dump core
> unless the method pointer of the context it was pointed to was
> clobbered. So I'm pretty sure that's what happened, and now we must
> work backwards to how it happened.
>
> Justin, it's entirely possible that the only way we'll figure it out
> is for a developer to go poking at the entrails. Are you in a position
> to give Alvaro or me ssh access to your test machine?
>
> regards, tom lane
>
I'm currently working on recompiling Postgres with the new configure
parameters. I'm taking the easier route: downloading the Debian
source package, adding the new options, compiling, then installing the package.
Hopefully this will give the closest possible binary to the current one.
Incidentally, the --enable-debug option is already set for the Debian
package (I did have to add --enable-cassert, though). I'll let you know
once I have it up and running whether things work properly.
As for access to the machine, I'll contact you off-list if I can work
something out. The data is not overly sensitive, but it's still
client data. I'll try to make a copy of the cluster, reduce the
database count, and see if I can still reproduce the problem.
Thanks.
Justin Pasher