Re: BUG #17182: Race condition on concurrent DROP and CREATE of dependent object - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #17182: Race condition on concurrent DROP and CREATE of dependent object
Date
Msg-id cd7dd37e-281c-1973-017f-d7d013191bc7@gmail.com
Whole thread Raw
In response to Re: BUG #17182: Race condition on concurrent DROP and CREATE of dependent object  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello Tom,
05.09.2021 17:15, Tom Lane wrote:
>> I get several broken functions with the invalid return type:
>> SELECT f1()
>> ERROR:  cache lookup failed for type 16519
>> CONTEXT:  SQL function "f1" during inlining
> I don't find this particularly surprising, and I'm unwilling to add the
> amount of locking overhead it'd take to prevent it.
>
> The generic problem is that a newly-created dependent object is not
> protected against deletion of its referenced object(s) until we commit
> its new pg_depend entries; before that, a concurrent DROP won't see
> the dependencies.  I recall some discussion of trying to take an
> anti-deletion lock in recordDependency, but that's too late: the
> deletion might have committed and released its own lock since we
> looked up the type (or other referenced object).  So the only real fix
> for this would be to make every object lookup in the entire system do
> the sort of dance that's done in RangeVarGetRelidExtended.  We have
> agreed that the cost is worth it for tables (though I don't think
> that that was without controversy, nor am I 100% convinced that
> RangeVarGetRelidExtended is correct).  But I'm not excited about
> extending the principle to other object types.
>
Thanks for the detailed explanation!
So if that is an acceptable behaviour, then it seems that to prevent not
an accidental, but malicious exploitation of it, postgres should be
hardened enough to stand up to the attempts to access such invalid
objects (e.g. objects with non-existing schemas).
(Maybe there are other ways for an unprivileged user to produce such
objects, so fixing follow-up crashes is beneficial anyway.)

For example, the following script produces functions with invalid schemas:
for i in `seq 100`; do
( { for n in `seq 20`; do echo "DROP SCHEMA s;"; done } | psql )
>psql2.log 2>&1 &
( { for n in `seq 1`; do
echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
";  done } | psql ) >psql1.log 2>&1 &
wait
psql -c "DROP SCHEMA s CASCADE" >psql3.log
done

and then the following query crashes the server:
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM
pg_proc pp LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE
pn.oid IS NULL

Core was generated by `postgres: law regression [local]
SELECT                                         '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  quote_identifier (ident=0x0) at ruleutils.c:11357
11357           safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0]
== '_');
(gdb) bt
#0  quote_identifier (ident=0x0) at ruleutils.c:11357
#1  0x00005585a5dcee76 in pg_identify_object (fcinfo=<optimized out>) at
objectaddress.c:4215
#2  0x00005585a5ed1291 in ExecInterpExpr (state=0x5585a67b1e80,
econtext=0x5585a67a8b00, isnull=0x7ffcfbee9e5f)
    at execExprInterp.c:749
#3  0x00005585a5ecdb05 in ExecInterpExprStillValid
(state=0x5585a67b1e80, econtext=0x5585a67a8b00,
    isNull=0x7ffcfbee9e5f) at execExprInterp.c:1824

So the quote_identifier(schema) construction seems unsafe in the
circumstances. My colleagues Alexander Kozhemyakin and Sergey Shinderuk
discovered similar crash case with OCLASS_DEFACL in
getObjectIdentityParts(), that drove me to study this issue additionally.

Best regards,
Alexander



pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #17186: In connect.c, the lock connections_mutex is not correctly released(Line 463) at the return(Line 522)
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: BUG #17184: When using openssl, the memory of a static variable is not freed in libpq