On Thu, 16 Sep 2004, Tom Lane wrote:
> Gavin Sherry <swm@linuxworld.com.au> writes:
> > Interestingly, I *cannot* recreate on the single CPU system and I cannot
> > get abort() to generate a core.
>
> By that do you mean that you don't see any corefile in the DB directory
> when you look after the dust settles?

Right. I was actually just doing a find . -name core and came up with no
results.

>
> I ran into the same problem yesterday in another connection, and
> eventually realized that the corefile is getting removed because of the
> logic I added recently to do WAL replay of CREATE/DROP DATABASE. The
> regression test sequence is short enough (on modern machines) that there
> may not be any checkpoint between its start and the point where you have
> a crash, so that the initial "CREATE DATABASE regression" operation is
> still in the range of WAL entries to be replayed. In dbcommands.c
> it sez:
>
> /*
> * Our theory for replaying a CREATE is to forcibly drop the
> * target subdirectory if present, then re-copy the source data.
> * This may be more work than needed, but it is simple to
> * implement.
> */
>
> So what's happening is that WAL replay is wiping the database directory
> (including the core file).
>
> I don't really want to change the CREATE DATABASE replay logic, so I was
> thinking of suggesting that we hack around this by modifying pg_regress
> to force a checkpoint right after its CREATE DATABASE. Then any crashes
> during the regression tests wouldn't cause a replay of the CREATE. This
> is mighty ugly though :-(

Yes, a bit ugly. Though not as ugly as the for(;;) I put in
SubTransGetTopmostTransaction() at the point where the Assert condition
would fail, so that I could attach a debugger and get a useful backtrace.
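
For anyone who wants to try the same trick while core files keep
disappearing, here is a minimal sketch of the idea. It is not the code I
actually put in the backend: spin_for_debugger(), SPIN_UNLESS() and
debug_spin_release are made-up names for illustration, and the timeout
argument is only there so the helper can be exercised without hanging.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical debugging helper, not part of PostgreSQL.
 * From an attached gdb the loop can be released with:
 *     set var debug_spin_release = 1
 */
static volatile int debug_spin_release = 0;

/*
 * Sleep in one-second steps until a debugger sets debug_spin_release,
 * or until max_seconds have elapsed (a negative value means wait forever).
 * Returns the number of seconds spent waiting.
 */
static int
spin_for_debugger(const char *what, int max_seconds)
{
    int waited = 0;

    fprintf(stderr,
            "pid %ld: \"%s\" failed, waiting for debugger (gdb -p %ld)\n",
            (long) getpid(), what, (long) getpid());

    while (!debug_spin_release &&
           (max_seconds < 0 || waited < max_seconds))
    {
        sleep(1);
        waited++;
    }
    return waited;
}

/* Drop-in stand-in for Assert(cond) at the spot being investigated. */
#define SPIN_UNLESS(cond) \
    do { \
        if (!(cond)) \
            (void) spin_for_debugger(#cond, -1); \
    } while (0)
```

With the process parked in the loop instead of calling abort(), you can
attach with gdb -p <pid> and run bt, so the backtrace no longer depends
on a core file surviving WAL replay.
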
Gavin