On Sat, Aug 25, 2018 at 3:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> PG Bug reporting form <noreply@postgresql.org> writes:
> > We're using 10.5 with parallel queries enabled and the config options
> > #max_worker_processes = 8
> > #max_parallel_workers_per_gather = 2
> > #max_parallel_workers = 8
>
> > I'm seeing invalid cache ID: 11 errors in the log. It's only happening
> > occasionally (15 times today on a not very busy system).
>
> Interesting. Syscache 11 would be AUTHOID, which seems to be consulted
> mostly for privilege checks, though there's at least one reference
> during process startup.

Hi Kieran,

Are you using extensions, by any chance? If an extension were to
access the AUTHOID syscache during _PG_init(), it would fail like this
in parallel workers, because they run RestoreLibraryState() before
they run BackgroundWorkerInitializeConnectionByOid() (which runs
InitPostgres() which runs InitCatalogCache()). oracle_fdw has this
problem (see nearby thread) and there may be others out there. The
extension wouldn't have to be used by the query that exhibited the
symptom; it could have been loaded earlier in the life of the leader
backend and caused no problem until a parallel query was eventually
launched.
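
To make the ordering concrete, here is a toy model in plain C. It is
not PostgreSQL code: every toy_* name, both *_pg_init functions and
the -1 error value are made up for illustration, and only the order of
the two startup steps is taken from the description above. It also
shows one possible way for an extension to avoid the problem, namely
deferring the lookup until first use.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_AUTHOID 11          /* the cache ID from the error message */

/* Hypothetical stand-in for "the catalog caches have been set up". */
static bool toy_catalog_cache_ready = false;

/* Hypothetical stand-in for InitCatalogCache(). */
static void toy_init_catalog_cache(void)
{
    toy_catalog_cache_ready = true;
}

/* Hypothetical lookup: -1 models the "invalid cache ID" failure. */
static int toy_syscache_lookup(int cache_id)
{
    if (!toy_catalog_cache_ready)
        return -1;
    return cache_id;
}

/* An extension that touches the syscache while being loaded. */
static int eager_result = 0;
static void hazardous_pg_init(void)
{
    eager_result = toy_syscache_lookup(TOY_AUTHOID);
}

/* An extension that does no catalog access at load time... */
static void careful_pg_init(void)
{
}

/* ...and performs the lookup lazily, on first real use. */
static bool lazy_done = false;
static int lazy_result = 0;
static int lazy_get_role_info(void)
{
    if (!lazy_done)
    {
        lazy_result = toy_syscache_lookup(TOY_AUTHOID);
        lazy_done = true;
    }
    return lazy_result;
}

/* Parallel worker startup: libraries are restored before the caches exist. */
static void toy_start_parallel_worker(void (*pg_init)(void))
{
    toy_catalog_cache_ready = false;
    pg_init();                  /* models RestoreLibraryState() */
    toy_init_catalog_cache();   /* models InitPostgres() -> InitCatalogCache() */
}

static void toy_demo(void)
{
    toy_start_parallel_worker(hazardous_pg_init);
    assert(eager_result == -1);                 /* lookup ran too early */

    toy_start_parallel_worker(careful_pg_init);
    assert(lazy_get_role_info() == TOY_AUTHOID); /* lookup ran after init */
}
```

Calling toy_demo() exercises both cases. In a real extension the
equivalent change would be to keep catalog access out of _PG_init()
and do it the first time the information is actually needed.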

We could probably improve that situation by making syscache lookups
(and likely other things too) fail when called from _PG_init() in
regular backends, so that extension authors are made aware of this
hazard. Alternatively, we could go the other way and change the order
in which parallel workers do things.
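
As a rough sketch of the first idea only, again a toy model in plain C
rather than a patch: the flag, the enum and all toy_* names are
hypothetical. The point is just that a lookup issued during library
initialization would be rejected even in a regular backend, where the
caches already exist and it would otherwise appear to work.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    TOY_LOOKUP_OK,
    TOY_LOOKUP_CACHE_NOT_READY,
    TOY_LOOKUP_IN_LIBRARY_INIT
} ToyLookupStatus;

/* Hypothetical process state. */
static bool toy_cache_ready = false;
static bool toy_in_library_init = false;

/* Hypothetical guarded lookup: refuse while a library is initializing. */
static ToyLookupStatus toy_guarded_lookup(void)
{
    if (toy_in_library_init)
        return TOY_LOOKUP_IN_LIBRARY_INIT;
    if (!toy_cache_ready)
        return TOY_LOOKUP_CACHE_NOT_READY;
    return TOY_LOOKUP_OK;
}

/* Hypothetical loader: sets the flag around the init callback. */
static ToyLookupStatus toy_load_library(ToyLookupStatus (*pg_init)(void))
{
    ToyLookupStatus status;

    toy_in_library_init = true;
    status = pg_init();
    toy_in_library_init = false;
    return status;
}

/* An init function that performs a lookup, as the problem extensions do. */
static ToyLookupStatus toy_pg_init_with_lookup(void)
{
    return toy_guarded_lookup();
}

static void toy_guard_demo(void)
{
    /* Regular backend: caches are ready before any library is loaded. */
    toy_cache_ready = true;

    /* The lookup is still rejected because it happens during init. */
    assert(toy_load_library(toy_pg_init_with_lookup)
           == TOY_LOOKUP_IN_LIBRARY_INIT);

    /* The same lookup after loading has finished is fine. */
    assert(toy_guarded_lookup() == TOY_LOOKUP_OK);
}
```

With a check like this the failure would show up as soon as the
extension is loaded in an ordinary session, rather than only
occasionally when a parallel worker happens to start.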

--
Thomas Munro
http://www.enterprisedb.com