Tom Lane wrote:
> There's a fairly interesting crash here:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguar&dt=2008-07-16%2003:00:02
> The buildfarm was nice enough to provide a stack trace at the bottom of
> the page, which shows clearly that autovac tried to pfree a null
> pointer.
>
> What I think happened was that the table that was selected to be
> autovacuumed got dropped during the setup steps, leading get_rel_name()
> to return NULL at line 2167. vacuum() itself would have fallen out
> silently ... however, had it errored out, the attempts at error
> reporting in the PG_CATCH block would have crashed.
Ouch. This was introduced when we added the PG_TRY block to add the
errcontext and resume vacuuming with the next table; the original code
was clean about this problem.
> I see that we already noticed and fixed this type of problem in
> autovac_report_activity(), but do_autovacuum() didn't get the word.
> Is there anyplace else in there with the same issue?
I'll have a more detailed look.
> For that matter, why is autovac_report_activity repeating the lookups
> already done at the outer level?
Ah, good catch -- it's because the outer code was added later. It
should be easy to pass the names down, I think.
> One other point is that the postmaster log just says
>
> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: 580)
> [487d6715.3a87:2] LOG: server process (PID 16885) was terminated by signal 6: Aborted
>
> Could we get that to say "autovacuum worker" instead of "server"?
Hmm, that would require moving code from HandleChildCrash to
CleanupBackend, I think. (We're creating the process name in
CleanupBackend without scanning the BackendList, which we would need to
figure out if it's a worker.)
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support