logical replication worker accesses catalogs in error context callback - Mailing list pgsql-hackers

From Andres Freund
Subject logical replication worker accesses catalogs in error context callback
Date
Msg-id 20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
Whole thread Raw
Responses Re: logical replication worker accesses catalogs in error context callback
List pgsql-hackers
Hi,

Due to a debug ereport I just noticed that worker.c's
slot_store_error_callback is doing something quite dangerous:

static void
slot_store_error_callback(void *arg)
{
    SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
    LogicalRepRelMapEntry *rel;
    char       *remotetypname;
    Oid            remotetypoid,
                localtypoid;

    /* Nothing to do if remote attribute number is not set */
    if (errarg->remote_attnum < 0)
        return;

    rel = errarg->rel;
    remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];

    /* Fetch remote type name from the LogicalRepTypMap cache */
    remotetypname = logicalrep_typmap_gettypname(remotetypoid);

    /* Fetch local type OID from the local sys cache */
    localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 1);

    errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
               "remote type %s, local type %s",
               rel->remoterel.nspname, rel->remoterel.relname,
               rel->remoterel.attnames[errarg->remote_attnum],
               remotetypname,
               format_type_be(localtypoid));
}


that's not code that can run in an error context callback. It's
theoretically possible (but unlikely) that
logicalrep_typmap_gettypname() is safe to run in an error context
callback. But get_atttype() definitely isn't.

get_attype() may do catalog accesses. That definitely can't happen
inside an error context callback - the entire transaction might be
borked at this point!


I think this was basically broken from the start in

commit 665d1fad99e7b11678b0d5fa24d2898424243cd6
Author: Peter Eisentraut <peter_e@gmx.net>
Date:   2017-01-19 12:00:00 -0500

    Logical replication

but then got a lot worse in

commit 24c0a6c649768f428929e76dd7f5012ec9b93ce1
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   2018-03-14 21:34:26 -0300

    logical replication: fix OID type mapping mechanism

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Next
From: Masahiko Sawada
Date:
Subject: Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion