Re: Re: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: Re: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker
Date
Msg-id CAEepm=30uOeesrmZWBj6zFh-E2hByJseyoM2ZtUS2r0E5G9zyA@mail.gmail.com
In response to Re:Re: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker  (jimmy <mpokky@126.com>)
Responses Re:Re: Re: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallelworker
List pgsql-bugs
On Wed, Aug 22, 2018 at 2:54 PM, jimmy <mpokky@126.com> wrote:
> This is the debug log below. Is it useful. Thank you.

That log doesn't show the code path that reaches the error.  If it's
happening in a parallel worker, it will probably be tricky to catch
with a breakpoint.  Are you able to recompile PostgreSQL?  If you
change every elog(ERROR, "invalid cache ID: %d", cacheId) call to use
PANIC instead of ERROR, and then start the server with ulimit -c
unlimited, you might get a core file that you can load into a debugger
to see how we reached it.

It's a strange error.  I don't think it can be coming from these
places in inval.c:

        if (cacheid < 0 || cacheid >= SysCacheSize)
                elog(ERROR, "invalid cache ID: %d", cacheid);

... because we can see that it's 42 (PROCNAMEARGSNSP, a valid cache
ID), and SysCacheSize is a compile-time constant greater than 42.  So
it must be coming from one of the places in syscache.c that look like
this:

        if (cacheId < 0 || cacheId >= SysCacheSize ||
                !PointerIsValid(SysCache[cacheId]))
                elog(ERROR, "invalid cache ID: %d", cacheId);

Since InitCatalogCache() puts a non-NULL pointer into every index from
0 to SysCacheSize - 1 without gaps (or errors out if it fails while
trying), it seems that either InitCatalogCache() didn't run, or
SysCache[42] was later overwritten with NULL.  I wondered if there is
some way for a parallel worker to reach shared invalidation message
processing code before InitCatalogCache() has run, but that doesn't
seem to be an issue: SysCacheInvalidate() quietly tolerates that.
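
To make that reasoning concrete, here is a minimal standalone model of
the two checks quoted above.  It is only a sketch: every name in it
(model_cache, model_init_caches, and so on) is hypothetical, the array
size is an arbitrary value greater than 42, and none of it is
PostgreSQL code.  It just shows that a range-only test accepts 42 in
all cases, while a test that also looks at the pointer rejects 42 if
initialization never ran or the slot was later cleared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size; it only needs to exceed 42 for this illustration. */
#define MODEL_CACHE_SIZE 64

typedef struct ModelCache
{
    int id;
} ModelCache;

static ModelCache model_storage[MODEL_CACHE_SIZE];

/* Static storage, so every slot starts out as NULL until initialized. */
static ModelCache *model_cache[MODEL_CACHE_SIZE];

/* Plays the role of an initializer that fills every slot without gaps. */
static void
model_init_caches(void)
{
    for (int i = 0; i < MODEL_CACHE_SIZE; i++)
    {
        model_storage[i].id = i;
        model_cache[i] = &model_storage[i];
    }
}

/* Same shape as the range-only test: looks at the ID, never the array. */
static bool
model_range_check_ok(int cache_id)
{
    return cache_id >= 0 && cache_id < MODEL_CACHE_SIZE;
}

/* Same shape as the stricter test: the slot must also be non-NULL. */
static bool
model_lookup_check_ok(int cache_id)
{
    return model_range_check_ok(cache_id) && model_cache[cache_id] != NULL;
}
```

In this model, model_range_check_ok(42) is true no matter what, so a
range-only test cannot be what reports ID 42 as invalid.
model_lookup_check_ok(42) is false before model_init_caches() has been
called, true afterwards, and false again if model_cache[42] is set
back to NULL.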

I wonder how we could reach one of SearchSysCache(PROCNAMEARGSNSP,
...), SysCacheGetAttr(PROCNAMEARGSNSP, ...),
GetSysCacheHashValue(PROCNAMEARGSNSP, ...),
SearchSysCacheList(PROCNAMEARGSNSP, ...) before InitCatalogCache() has
finished?  The answer probably involves oracle_fdw.

Ahh, how about this line here:

https://github.com/laurenz/oracle_fdw/blob/master/oracle_fdw.c#L6237

        catlist = SearchSysCacheList2(
                PROCNAMEARGSNSP,
                CStringGetDatum("geometry_recv"),
                PointerGetDatum(buildoidvector(argtypes, argcount)));

I don't immediately see how that can be reached before
InitCatalogCache() has run, though.

-- 
Thomas Munro
http://www.enterprisedb.com

