Hi,
Due to a debug ereport I just noticed that worker.c's
slot_store_error_callback is doing something quite dangerous:
static void
slot_store_error_callback(void *arg)
{
SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
LogicalRepRelMapEntry *rel;
char *remotetypname;
Oid remotetypoid,
localtypoid;
/* Nothing to do if remote attribute number is not set */
if (errarg->remote_attnum < 0)
return;
rel = errarg->rel;
remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];
/* Fetch remote type name from the LogicalRepTypMap cache */
remotetypname = logicalrep_typmap_gettypname(remotetypoid);
/* Fetch local type OID from the local sys cache */
localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 1);
errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
"remote type %s, local type %s",
rel->remoterel.nspname, rel->remoterel.relname,
rel->remoterel.attnames[errarg->remote_attnum],
remotetypname,
format_type_be(localtypoid));
}
that's not code that can run in an error context callback. It's
theoretically possible (but unlikely) that
logicalrep_typmap_gettypname() is safe to run in an error context
callback. But get_atttype() definitely isn't.
get_attype() may do catalog accesses. That definitely can't happen
inside an error context callback - the entire transaction might be
borked at this point!
I think this was basically broken from the start in
commit 665d1fad99e7b11678b0d5fa24d2898424243cd6
Author: Peter Eisentraut <peter_e@gmx.net>
Date: 2017-01-19 12:00:00 -0500
Logical replication
but then got a lot worse in
commit 24c0a6c649768f428929e76dd7f5012ec9b93ce1
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: 2018-03-14 21:34:26 -0300
logical replication: fix OID type mapping mechanism
Greetings,
Andres Freund