Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

This new thread is a follow-up of [1] and [2].

Problem description:

We have occasionally observed objects having an orphaned dependency; the
most common case we have seen is functions not linked to any existing
namespace.

Examples to produce such orphaned dependencies:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existent schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existent schema.

A patch was initially proposed to fix this particular
(function-to-namespace) dependency (see [1]), but there could be many more
scenarios (like the function-to-datatype one highlighted by Gilles in [1],
which could leave a function with an invalid parameter datatype).

As Tom said, there are dozens more cases that would need to be handled, so a
global approach to avoid those race conditions should be considered instead.

A first global approach was proposed in [2], making use of a dirty snapshot
when recording the dependency. But this approach appeared to be "scary" and it
still failed to close some race conditions (see [2] for details).

Then, Tom proposed another approach in [2] which is that "creation DDL will have
to take a lock on each referenced object that'd conflict with a lock taken by
DROP".

This is what the attached patch is trying to achieve.

It does the following:

1) A new lock (that conflicts with a lock taken by DROP) has been put in place
when the dependencies are being recorded.

Thanks to it, the DROP SCHEMA in scenario 2 would block (resulting in an
error should session 1 commit).

2) After locking the object while recording the dependency, the patch checks
that the object still exists.

Thanks to it, session 2 in scenario 1 would block and would report an error
once session 1 commits (that would not be the case should session 1 abort the
transaction).
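
Putting 1) and 2) together, the dependency-recording path now roughly does the
following (a simplified sketch only, not the exact code of the attached patch;
ObjectByIdExist() is the patch's new existence-check helper and the signatures
shown here are assumptions):

static void
depLockAndCheckObject(const ObjectAddress *object)
{
    /*
     * Take a lock that conflicts with the AccessExclusiveLock acquired by
     * AcquireDeletionLock() during DROP: either we wait for a concurrent
     * DROP to finish, or a later DROP waits for us.
     */
    LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);

    /* The referenced object may have been dropped before we got the lock. */
    if (!ObjectByIdExist(object))
        ereport(ERROR,
                errmsg("a dependent object does not exist"));
}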

The patch also adds a few tests for some dependency cases (that would currently
produce orphaned objects):

- schema and function (as the above scenarios)
- function and type
- table and type (which is, I think, problematic enough, as it brings a table
into the game, to justify fixing this stuff as a whole).

[1]:
https://www.postgresql.org/message-id/flat/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9@darold.net#9af5cdaa9e80879beb1def3604c976e8
[2]: https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com

Please note that I'm not very familiar with this area of the code, so the
patch might not implement the option proposed by Tom the "right" way.

Adding the patch to the July CF.

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
Hi Bertrand,

22.04.2024 11:45, Bertrand Drouvot wrote:
> Hi,
>
> This new thread is a follow-up of [1] and [2].
>
> Problem description:
>
> We have occasionally observed objects having an orphaned dependency, the
> most common case we have seen is functions not linked to any namespaces.
>
> ...
>
> Looking forward to your feedback,

This has reminded me of bug #17182 [1].
Unfortunately, with the patch applied, the following script:

for ((i=1;i<=100;i++)); do
   ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 &
   echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
   "  | psql >psql2.log 2>&1 &
   wait
   psql -c "DROP SCHEMA s CASCADE" >psql3.log
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
   LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql

still ends with:
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.

2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG:  server process (PID 1388378) was terminated by signal 11: Segmentation fault
2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL:  Failed process was running: SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
       LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL

[1] https://www.postgresql.org/message-id/flat/17182-a6baa001dd1784be%40postgresql.org

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Mon, Apr 22, 2024 at 01:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> 22.04.2024 11:45, Bertrand Drouvot wrote:
> > Hi,
> > 
> > This new thread is a follow-up of [1] and [2].
> > 
> > Problem description:
> > 
> > We have occasionally observed objects having an orphaned dependency, the
> > most common case we have seen is functions not linked to any namespaces.
> > 
> > ...
> > 
> > Looking forward to your feedback,
> 
> This have reminded me of bug #17182 [1].

Thanks for the link to the bug!

> Unfortunately, with the patch applied, the following script:
> 
> for ((i=1;i<=100;i++)); do
>   ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) >psql1.log 2>&1 &
>   echo "
> CREATE SCHEMA s;
> CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
> CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
> CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
> CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
> CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
>   "  | psql >psql2.log 2>&1 &
>   wait
>   psql -c "DROP SCHEMA s CASCADE" >psql3.log
> done
> echo "
> SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
>   LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql
> 
> still ends with:
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> 
> 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG:  server process (PID
> 1388378) was terminated by signal 11: Segmentation fault
> 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL:  Failed process was
> running: SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid
> FROM pg_proc pp
>       LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL
> 

Thanks for sharing the script.

That's weird, I just launched it several times with the patch applied and I'm not
able to see the seg fault (while I can see it consistently failing on the master
branch).

Are you 100% sure you tested it against a binary with the patch applied?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
22.04.2024 13:52, Bertrand Drouvot wrote:
>
> That's weird, I just launched it several times with the patch applied and I'm not
> able to see the seg fault (while I can see it constently failing on the master
> branch).
>
> Are you 100% sure you tested it against a binary with the patch applied?
>

Yes, at least I can't see what I'm doing wrong. Please try my
self-contained script attached.

Best regards,
Alexander
Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> 22.04.2024 13:52, Bertrand Drouvot wrote:
> > 
> > That's weird, I just launched it several times with the patch applied and I'm not
> > able to see the seg fault (while I can see it constently failing on the master
> > branch).
> > 
> > Are you 100% sure you tested it against a binary with the patch applied?
> > 
> 
> Yes, at least I can't see what I'm doing wrong. Please try my
> self-contained script attached.

Thanks for sharing your script!

Yeah, your script ensures the patch is applied before the repro is executed.

I confirm that I can also see the issue with the patch applied (but I had to
launch multiple attempts, while on master one attempt is enough).

I'll have a look.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Tue, Apr 23, 2024 at 04:59:09AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> > 22.04.2024 13:52, Bertrand Drouvot wrote:
> > > 
> > > That's weird, I just launched it several times with the patch applied and I'm not
> > > able to see the seg fault (while I can see it constently failing on the master
> > > branch).
> > > 
> > > Are you 100% sure you tested it against a binary with the patch applied?
> > > 
> > 
> > Yes, at least I can't see what I'm doing wrong. Please try my
> > self-contained script attached.
> 
> Thanks for sharing your script!
> 
> Yeah your script ensures the patch is applied before the repro is executed.
> 
> I do confirm that I can also see the issue with the patch applied (but I had to
> launch multiple attempts, while on master one attempt is enough).
> 
> I'll have a look.

Please find attached v2, which should not produce the issue anymore (I launched
a lot of attempts without any issue). v1 was not strong enough as it did not
always check for the dependent object's existence. v2 now always checks that the
object still exists after the additional lock acquisition attempt while recording
the dependency.

I still need to think about v2, but in the meantime could you please also give
v2 a try on your side?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Tue, Apr 23, 2024 at 04:20:46PM +0000, Bertrand Drouvot wrote:
> Please find attached v2 that should not produce the issue anymore (I launched a
> lot of attempts without any issues). v1 was not strong enough as it was not
> always checking for the dependent object existence. v2 now always checks if the
> object still exists after the additional lock acquisition attempt while recording
> the dependency.
> 
> I still need to think about v2 but in the meantime could you please also give
> v2 a try on you side?

I gave more thought to v2 and the approach seems reasonable to me. Basically,
in case the object is already dropped before we take the new lock (while
recording the dependency), the error message is a "generic" one (meaning it
does not provide the description of the "already" dropped object). I think it
makes sense to write the patch that way, abandoning the patch's ambition to
report the description of the dropped object in all cases.

Of course, I would be happy to hear others' thoughts about it.

Please find v3 attached (which is v2 with more comments).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
Hi Bertrand,

24.04.2024 11:38, Bertrand Drouvot wrote:
>> Please find attached v2 that should not produce the issue anymore (I launched a
>> lot of attempts without any issues). v1 was not strong enough as it was not
>> always checking for the dependent object existence. v2 now always checks if the
>> object still exists after the additional lock acquisition attempt while recording
>> the dependency.
>>
>> I still need to think about v2 but in the meantime could you please also give
>> v2 a try on you side?
> I gave more thought to v2 and the approach seems reasonable to me. Basically what
> it does is that in case the object is already dropped before we take the new lock
> (while recording the dependency) then the error message is a "generic" one (means
> it does not provide the description of the "already" dropped object). I think it
> makes sense to write the patch that way by abandoning the patch's ambition to
> tell the description of the dropped object in all the cases.
>
> Of course, I would be happy to hear others thought about it.
>
> Please find v3 attached (which is v2 with more comments).

Thank you for the improved version!

I can confirm that it fixes that case.
I've also tested other cases that fail on master (most of them also fail
with v1); please try/look at the attached script. (There could be other
broken-dependency cases, of course, but I think I've covered all the
representative ones.)

All tested cases work correctly with v3 applied: I couldn't get broken
dependencies, though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which is assumed to be not easily triggered by users.

Best regards,
Alexander
Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Wed, Apr 24, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> 24.04.2024 11:38, Bertrand Drouvot wrote:
> > I gave more thought to v2 and the approach seems reasonable to me. Basically what
> > it does is that in case the object is already dropped before we take the new lock
> > (while recording the dependency) then the error message is a "generic" one (means
> > it does not provide the description of the "already" dropped object). I think it
> > makes sense to write the patch that way by abandoning the patch's ambition to
> > tell the description of the dropped object in all the cases.
> > 
> > Of course, I would be happy to hear others thought about it.
> > 
> > Please find v3 attached (which is v2 with more comments).
> 
> Thank you for the improved version!
> 
> I can confirm that it fixes that case.

Great, thanks for the test!

> I've also tested other cases that fail on master (most of them also fail
> with v1), please try/look at the attached script.

Thanks for all those tests!

> (There could be other broken-dependency cases, of course, but I think I've
> covered all the representative ones.)

Agreed. Furthermore, the way the patch is written should be agnostic to the
kinds of objects involved in the dependency. Having said that, it does not
hurt to add more tests to this patch, so the attached v4 adds some of your
tests (ones that would fail on the master branch without the patch applied).

The way the tests are written in the patch is less "racy" than when triggered
with your script. Indeed, I think that in the patch the dependent object cannot
be removed before the new lock is taken when recording the dependency. We may
want to bring injection points into the game if we feel the need.

> All tested cases work correctly with v3 applied —

Yeah, same on my side, I did run them too and did not find any issues.

> I couldn't get broken
> dependencies,

Same here.

> though concurrent create/drop operations can still produce
> the "cache lookup failed" error, which is probably okay, except that it is
> an INTERNAL_ERROR, which assumed to be not easily triggered by users.

I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?

Attached v4, simply adding more tests to v3.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
Hi,

25.04.2024 08:00, Bertrand Drouvot wrote:

> > though concurrent create/drop operations can still produce
> > the "cache lookup failed" error, which is probably okay, except that it is
> > an INTERNAL_ERROR, which assumed to be not easily triggered by users.
> I did not see any of those "cache lookup failed" during my testing with/without
> your script. During which test(s) did you see those with v3 applied?

You can try, for example, table-trigger, or the other tests that check only
for "cache lookup failed" in the psql output (maybe you'll need to increase
the iteration count). For instance, I've got (with v4 applied):
2024-04-25 05:48:08.102 UTC [3638763] ERROR:  cache lookup failed for function 20893
2024-04-25 05:48:08.102 UTC [3638763] STATEMENT:  CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t
        FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c');

Or with function-function:
2024-04-25 05:52:31.390 UTC [3711897] ERROR:  cache lookup failed for function 32190 at character 54
2024-04-25 05:52:31.390 UTC [3711897] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
--
2024-04-25 05:52:37.639 UTC [3720011] ERROR:  cache lookup failed for function 34465 at character 54
2024-04-25 05:52:37.639 UTC [3720011] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;

> Attached v4, simply adding more tests to v3.

Thank you for working on this!

Best regards,
Alexander

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, Apr 25, 2024 at 09:00:00AM +0300, Alexander Lakhin wrote:
> Hi,
> 
> 25.04.2024 08:00, Bertrand Drouvot wrote:
> > 
> > > though concurrent create/drop operations can still produce
> > > the "cache lookup failed" error, which is probably okay, except that it is
> > > an INTERNAL_ERROR, which assumed to be not easily triggered by users.
> > I did not see any of those "cache lookup failed" during my testing with/without
> > your script. During which test(s) did you see those with v3 applied?
> 
> You can try, for example, table-trigger, or other tests that check for
> "cache lookup failed" psql output only (maybe you'll need to increase the
> iteration count). For instance, I've got (with v4 applied):
> 2024-04-25 05:48:08.102 UTC [3638763] ERROR:  cache lookup failed for function 20893
> 2024-04-25 05:48:08.102 UTC [3638763] STATEMENT:  CREATE TRIGGER modified_c1 BEFORE UPDATE OF c ON t
>         FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE trigger_func('modified_c');
> 
> Or with function-function:
> 2024-04-25 05:52:31.390 UTC [3711897] ERROR:  cache lookup failed for function 32190 at character 54
> 2024-04-25 05:52:31.390 UTC [3711897] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
> --
> 2024-04-25 05:52:37.639 UTC [3720011] ERROR:  cache lookup failed for function 34465 at character 54
> 2024-04-25 05:52:37.639 UTC [3720011] STATEMENT:  CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;

I see, so this is during object creation.

It's easy to reproduce this kind of issue with gdb. For example, set a
breakpoint on SearchSysCache1() and, during the CREATE FUNCTION f1(), once it
breaks on:

#0  SearchSysCache1 (cacheId=45, key1=16400) at syscache.c:221
#1  0x00005ad305beacd6 in func_get_detail (funcname=0x5ad308204d50, fargs=0x0, fargnames=0x0, nargs=0, argtypes=0x7ffff2ff9cc0, expand_variadic=true, expand_defaults=true, include_out_arguments=false, funcid=0x7ffff2ff9ba0, rettype=0x7ffff2ff9b9c, retset=0x7ffff2ff9b94, nvargs=0x7ffff2ff9ba4, vatype=0x7ffff2ff9ba8, true_typeids=0x7ffff2ff9bd8, argdefaults=0x7ffff2ff9be0) at parse_func.c:1622
#2  0x00005ad305be7dd0 in ParseFuncOrColumn (pstate=0x5ad30823be98, funcname=0x5ad308204d50, fargs=0x0, last_srf=0x0, fn=0x5ad308204da0, proc_call=false, location=55) at parse_func.c:266
#3  0x00005ad305bdffb0 in transformFuncCall (pstate=0x5ad30823be98, fn=0x5ad308204da0) at parse_expr.c:1474
#4  0x00005ad305bdd2ee in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204da0) at parse_expr.c:230
#5  0x00005ad305bdec34 in transformAExprOp (pstate=0x5ad30823be98, a=0x5ad308204e20) at parse_expr.c:990
#6  0x00005ad305bdd1a0 in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204e20) at parse_expr.c:187
#7  0x00005ad305bdd00b in transformExpr (pstate=0x5ad30823be98, expr=0x5ad308204e20, exprKind=EXPR_KIND_SELECT_TARGET) at parse_expr.c:131
#8  0x00005ad305b96b7e in transformReturnStmt (pstate=0x5ad30823be98, stmt=0x5ad308204ee0) at analyze.c:2395
#9  0x00005ad305b92213 in transformStmt (pstate=0x5ad30823be98, parseTree=0x5ad308204ee0) at analyze.c:375
#10 0x00005ad305c6321a in interpret_AS_clause (languageOid=14, languageName=0x5ad308204c40 "sql", funcname=0x5ad308204ad8 "f100", as=0x0, sql_body_in=0x5ad308204ee0, parameterTypes=0x0, inParameterNames=0x0, prosrc_str_p=0x7ffff2ffa208, probin_str_p=0x7ffff2ffa200, sql_body_out=0x7ffff2ffa210, queryString=0x5ad3082040b0 "CREATE FUNCTION f100() RETURNS int LANGUAGE SQL RETURN f() + 1;") at functioncmds.c:953
#11 0x00005ad305c63c93 in CreateFunction (pstate=0x5ad308186310, stmt=0x5ad308204f00) at functioncmds.c:1221

then drop function f() in another session. The CREATE FUNCTION f1() would then
fail with:

postgres=# CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
ERROR:  cache lookup failed for function 16400

This error occurs before we get a chance to call the new depLockAndCheckObject()
function.

I think this is what Tom was referring to in [1]:

"
So the only real fix for this would be to make every object lookup in the entire
system do the sort of dance that's done in RangeVarGetRelidExtended.
"

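(For reference, that "dance" is essentially a lookup/lock/re-lookup retry loop.
Here is a simplified sketch of what RangeVarGetRelidExtended() does, omitting
all the callback and invalidation machinery:)

for (;;)
{
    relId = get_relname_relid(relname, namespaceId);
    if (!OidIsValid(relId))
        break;                  /* gone: report "does not exist" */

    LockRelationOid(relId, lockmode);

    /* Re-check: was the name dropped/recreated while we waited for the lock? */
    if (get_relname_relid(relname, namespaceId) == relId)
        break;                  /* still the same relation: done */

    UnlockRelationOid(relId, lockmode);     /* it changed: retry */
}
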
The fact that those kinds of errors appear also somehow ensures that no
orphaned dependencies can be created.

The patch does ensure that no orphaned dependencies can occur after those
"initial" lookups are done (should the dependent object be dropped).

I'm tempted to not add extra complexity to avoid those kinds of errors and to
keep the patch as it is, as all of this serves the same goal: no orphaned
dependencies are created.

[1]: https://www.postgresql.org/message-id/2872252.1630851337%40sss.pgh.pa.us

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
Hi Bertrand,

25.04.2024 10:20, Bertrand Drouvot wrote:
> postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
> ERROR:  cache lookup failed for function 16400
>
> This stuff does appear before we get a chance to call the new depLockAndCheckObject()
> function.
>
> I think this is what Tom was referring to in [1]:
>
> "
> So the only real fix for this would be to make every object lookup in the entire
> system do the sort of dance that's done in RangeVarGetRelidExtended.
> "
>
> The fact that those kind of errors appear also somehow ensure that no orphaned
> dependencies can be created.

I agree; the only thing that I'd change here is the error code.

But I've discovered yet another possibility to get a broken dependency.
Please try this script:
res=0
numclients=20
for ((i=1;i<=100;i++)); do
for ((c=1;c<=numclients;c++)); do
   echo "
CREATE SCHEMA s_$c;
CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
   " | psql >psql1-$c.log 2>&1 &
   echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
done
wait
pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
for ((c=1;c<=numclients;c++)); do
   echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
done
done
psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"

It fails for me (with the v4 patch applied) as follows:
pg_dump: error: schema with OID 16392 does not exist
on iteration 1
   oid  | conname  | connamespace | conowner | conforencoding | contoencoding |      conproc      | condefault
-------+----------+--------------+----------+----------------+---------------+-------------------+------------
  16396 | myconv_6 |        16392 |       10 |              8 |             6 | iso8859_1_to_utf8 | f

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Tue, Apr 30, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> But I've discovered yet another possibility to get a broken dependency.

Thanks for the testing and the report!

> Please try this script:
> res=0
> numclients=20
> for ((i=1;i<=100;i++)); do
> for ((c=1;c<=numclients;c++)); do
>   echo "
> CREATE SCHEMA s_$c;
> CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
> ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
>   " | psql >psql1-$c.log 2>&1 &
>   echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
> done
> wait
> pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
> for ((c=1;c<=numclients;c++)); do
>   echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
> done
> done
> psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"
> 
> It fails for me (with the v4 patch applied) as follows:
> pg_dump: error: schema with OID 16392 does not exist
> on iteration 1
>   oid  | conname  | connamespace | conowner | conforencoding | contoencoding |      conproc      | condefault
> -------+----------+--------------+----------+----------------+---------------+-------------------+------------
>  16396 | myconv_6 |        16392 |       10 |              8 |             6 | iso8859_1_to_utf8 | f
> 

Thanks for sharing the test; I'm able to reproduce the issue with v4.

Oh, I see: your test updates an existing dependency. v4 took care of brand-new
dependency creation (recordMultipleDependencies()) but forgot about changing
an existing dependency (which is done in another code path:
changeDependencyFor()).

Please find attached v5 that adds:

- a call to the new depLockAndCheckObject() function in changeDependencyFor().
- a test when altering an existing dependency.

With v5 applied, I don't see the issue anymore.
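
The shape of the new call in changeDependencyFor() is roughly the following
(an illustrative sketch with made-up local names; the real hunk is in the
attached v5):

ObjectAddress refobject;

/* lock and check the new referenced object before updating pg_depend */
ObjectAddressSet(refobject, refClassId, newRefObjectId);
depLockAndCheckObject(&refobject);
/* ... then scan and update the pg_depend entries as before ... */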

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
Hi Bertrand,

09.05.2024 15:20, Bertrand Drouvot wrote:
> Oh I see, your test updates an existing dependency. v4 took care about brand new
> dependencies creation (recordMultipleDependencies()) but forgot to take care
> about changing an existing dependency (which is done in another code path:
> changeDependencyFor()).
>
> Please find attached v5 that adds:
>
> - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> - a test when altering an existing dependency.
>
> With v5 applied, I don't see the issue anymore.

Me too. Thank you for the improved version!
I will test the patch in the background, but for now I see no other
issues with it.

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

From
Michael Paquier
Date:
On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
> Oh I see, your test updates an existing dependency. v4 took care about brand new
> dependencies creation (recordMultipleDependencies()) but forgot to take care
> about changing an existing dependency (which is done in another code path:
> changeDependencyFor()).
>
> Please find attached v5 that adds:
>
> - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> - a test when altering an existing dependency.
>
> With v5 applied, I don't see the issue anymore.

+            if (object_description)
+                ereport(ERROR, errmsg("%s does not exist", object_description));
+            else
+                ereport(ERROR, errmsg("a dependent object does not ex

This generates an internal error code.  Is that intended?

--- /dev/null
+++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec

This introduces a module with only one single spec.  I could get
behind an extra module if splitting the tests into more specs makes
sense or if there is a restriction on installcheck.  However, for
one spec file filled with a bunch of objects (and note that I'm OK to
let this single spec grow more for this range of tests), it seems to me
that this would be better under src/test/isolation/.

+        if (use_dirty_snapshot)
+        {
+            InitDirtySnapshot(DirtySnapshot);
+            snapshot = &DirtySnapshot;
+        }
+        else
+            snapshot = NULL;

I'm wondering if Robert has a comment about that.  It looks backwards
in a world where we try to encourage MVCC snapshots for catalog
scans (aka 568d4138c646), especially for the part related to
dependency.c and ObjectAddresses.
--
Michael

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care about brand new 
> > dependencies creation (recordMultipleDependencies()) but forgot to take care
> > about changing an existing dependency (which is done in another code path:
> > changeDependencyFor()).
> > 
> > Please find attached v5 that adds:
> > 
> > - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> > - a test when altering an existing dependency.
> > 
> > With v5 applied, I don't see the issue anymore.
> 
> +            if (object_description)
> +                ereport(ERROR, errmsg("%s does not exist", object_description));
> +            else
> +                ereport(ERROR, errmsg("a dependent object does not ex
> 
> This generates an internal error code.  Is that intended?

Thanks for looking at it!

Yes, it's like when, say, you want to create an object in a schema that does
not exist (see get_namespace_oid()).

> --- /dev/null
> +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec 
> 
> This introduces a module with only one single spec.  I could get
> behind an extra module if splitting the tests into more specs makes
> sense or if there is a restriction on installcheck.  However, for 
> one spec file filed with a bunch of objects, and note that I'm OK to
> let this single spec grow more for this range of tests, it seems to me
> that this would be better under src/test/isolation/.

Yeah, I was not sure about this one (the location is from take 2 mentioned
up-thread). I'll look at moving the tests to src/test/isolation/.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Tue, May 14, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> 09.05.2024 15:20, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care about brand new
> > dependencies creation (recordMultipleDependencies()) but forgot to take care
> > about changing an existing dependency (which is done in another code path:
> > changeDependencyFor()).
> > 
> > Please find attached v5 that adds:
> > 
> > - a call to the new depLockAndCheckObject() function in changeDependencyFor().
> > - a test when altering an existing dependency.
> > 
> > With v5 applied, I don't see the issue anymore.
> 
> Me too. Thank you for the improved version!
> I will test the patch in the background, but for now I see no other
> issues with it.

Thanks for confirming!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Wed, May 15, 2024 at 08:31:43AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> > +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec 
> > 
> > This introduces a module with only one single spec.  I could get
> > behind an extra module if splitting the tests into more specs makes
> > sense or if there is a restriction on installcheck.  However, for 
> > one spec file filed with a bunch of objects, and note that I'm OK to
> > let this single spec grow more for this range of tests, it seems to me
> > that this would be better under src/test/isolation/.
> 
> Yeah, I was not sure about this one (the location is from take 2 mentioned
> up-thread). I'll look at moving the tests to src/test/isolation/.

Please find attached v6 (the only diff from v5 is moving the tests as suggested
above).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Alexander Lakhin
Date:
Hello Bertrand,

15.05.2024 11:31, Bertrand Drouvot wrote:
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
>
>> +            if (object_description)
>> +                ereport(ERROR, errmsg("%s does not exist", object_description));
>> +            else
>> +                ereport(ERROR, errmsg("a dependent object does not ex
>>
>> This generates an internal error code.  Is that intended?
> Yes, it's like when say you want to create an object in a schema that does not
> exist (see get_namespace_oid()).

AFAICS, get_namespace_oid() does not throw ERRCODE_INTERNAL_ERROR,
but ERRCODE_UNDEFINED_SCHEMA:

# SELECT regtype('unknown_schema.type');
ERROR:  schema "unknown_schema" does not exist
LINE 1: SELECT regtype('unknown_schema.type');
                        ^
# \echo :LAST_ERROR_SQLSTATE
3F000

Probably it's worth avoiding ERRCODE_INTERNAL_ERROR here in light of [1]
and [2], as these errors are not that abnormal (not Assert-like).

[1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
[2] https://commitfest.postgresql.org/48/4735/

Best regards,
Alexander



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Sun, May 19, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:
> Hello Bertrand,
> 
> Probably, it's worth to avoid ERRCODE_INTERNAL_ERROR here in light of [1]
> and [2], as these errors are not that abnormal (not Assert-like).
> 
> [1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
> [2] https://commitfest.postgresql.org/48/4735/
> 

Thanks for mentioning the above examples; I agree that it's worth avoiding
ERRCODE_INTERNAL_ERROR here. Please find attached v7, which makes use of a new
ERRCODE: ERRCODE_DEPENDENT_OBJECTS_DOES_NOT_EXIST.

I thought about this name as it is close enough to the already existing
ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST, but I'm open to suggestions too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Please find attached v6 (only diff with v5 is moving the tests as suggested
> above).

I don't immediately know what to think about this patch. I've known
about this issue for a long time, but I didn't think this was how we
would fix it.

I think the problem here is basically that we don't lock namespaces
(schemas) when we're adding and removing things from the schema. So I
assumed that if we ever did something about this, what we would do
would be add a bunch of calls to lock schemas to the appropriate parts
of the code. What you've done here instead is add locking at a much
lower level - whenever we are adding a dependency on an object, we
lock the object. The advantage of that approach is that we definitely
won't miss anything. The disadvantage of that approach is that it
means we have some very low-level code taking locks, which means it's
not obvious which operations are taking what locks. Maybe it could
even result in some redundancy, like the higher-level code taking a
lock also (possibly in a different mode) and then this code taking
another one.

I haven't gone through the previous threads; it sounds like there's
already been some discussion of this, but I'm just telling you how it
strikes me on first look.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Tue, May 21, 2024 at 08:53:06AM -0400, Robert Haas wrote:
> On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Please find attached v6 (only diff with v5 is moving the tests as suggested
> > above).
> 
> I don't immediately know what to think about this patch.

Thanks for looking at it!

> I've known about this issue for a long time, but I didn't think this was how we
> would fix it.

I initially started with [1], but it focused on the function-schema case only.

Then I proposed [2], making use of a dirty snapshot when recording the
dependency. But this approach appeared to be "scary" and it still failed to
close some race conditions.

Then Tom proposed another approach in [2], which is that "creation DDL will have
to take a lock on each referenced object that'd conflict with a lock taken by
DROP". This is the one the current patch is trying to implement.

> What you've done here instead is add locking at a much
> lower level - whenever we are adding a dependency on an object, we
> lock the object. The advantage of that approach is that we definitely
> won't miss anything.

Right, as there are many more such dependencies than the ones related to schemas, for example:

- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper

to name a few.

> The disadvantage of that approach is that it
> means we have some very low-level code taking locks, which means it's
> not obvious which operations are taking what locks.

Right, but the new operations are "only" the ones leading to recording or altering
a dependency.

> Maybe it could
> even result in some redundancy, like the higher-level code taking a
> lock also (possibly in a different mode) and then this code taking
> another one.

The lock that is added here is in AccessShareLock mode. It could conflict with
the ones taken in AccessExclusiveLock mode (if I'm not missing any):

- AcquireDeletionLock(): which is exactly what we want
- get_object_address()
    - get_object_address_rv()
        - ExecAlterObjectDependsStmt()
    - ExecRenameStmt()
    - ExecAlterObjectDependsStmt()
    - ExecAlterOwnerStmt()
    - RemoveObjects()
- AlterPublication()

I think there are 2 cases here:

First case: the "conflicting" lock is one of our own locks; LockCheckConflicts()
would then report this as a NON conflict.

Second case: the "conflicting" lock is NOT one of our own locks;
LockCheckConflicts() would then report a conflict. But I have the feeling that
the existing code would already block those sessions.

One example where that would be the case:

Session 1: doing "BEGIN; ALTER FUNCTION noschemas.foo2() SET SCHEMA alterschema" would
acquire the lock in AccessExclusiveLock mode during ExecAlterObjectSchemaStmt()->get_object_address()->LockDatabaseObject()
(in the existing code, and before the new call that would occur through changeDependencyFor()->depLockAndCheckObject()
with the patch in place).

Then session 2: doing "alter function noschemas.foo2() owner to newrole;"
would be blocked by the existing code while doing ExecAlterOwnerStmt()->get_object_address()->LockDatabaseObject().

That means that in this example the new lock this patch is introducing would not
be responsible for session 2 being blocked.

Was your concern about "First case" or "Second case" or both?

[1]: https://www.postgresql.org/message-id/flat/5a9daaae-5538-b209-6279-e903c3ea2157%40amazon.com
[2]: https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> I started initially with [1] but it was focusing on function-schema only.

Yeah, that's what I thought we would want to do. And then just extend
that to the other cases.

> Then I proposed [2] making use of a dirty snapshot when recording the dependency.
> But this approach appeared to be "scary" and it was still failing to close
> some race conditions.

The current patch still seems to be using dirty snapshots for some
reason, which struck me as a bit odd. My intuition is that if we're
relying on dirty snapshots to solve problems, we likely haven't solved
the problems correctly, which seems consistent with your statement
about "failing to close some race conditions". But I don't think I
understand the situation well enough to be sure just yet.

> Then, Tom proposed another approach in [2] which is that "creation DDL will have
> to take a lock on each referenced object that'd conflict with a lock taken by DROP".
> This is the one the current patch is trying to implement.

It's a clever idea, but I'm not sure that I like it.

> I think there is 2 cases here:
>
> First case: the "conflicting" lock mode is for one of our own lock then LockCheckConflicts()
> would report this as a NON conflict.
>
> Second case: the "conflicting" lock mode is NOT for one of our own lock then LockCheckConflicts()
> would report a conflict. But I've the feeling that the existing code would
> already lock those sessions.
>
> Was your concern about "First case" or "Second case" or both?

The second case, I guess. It's bad to take a weaker lock first and
then a stronger lock on the same object later, because it can lead to
deadlocks that would have been avoided if the stronger lock had been
taken at the outset. Here, it seems like things would happen in the
other order: if we took two locks, we'd probably take the stronger
lock in the higher-level code and then the weaker lock in the
dependency code. That shouldn't break anything; it's just a bit
inefficient. My concern was really more about the maintainability of
the code. I fear that if we add code that takes heavyweight locks in
surprising places, we might later find the behavior difficult to
reason about.

Tom, what is your thought about that concern?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Wed, May 22, 2024 at 10:48:12AM -0400, Robert Haas wrote:
> On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > I started initially with [1] but it was focusing on function-schema only.
> 
> Yeah, that's what I thought we would want to do. And then just extend
> that to the other cases.
> 
> > Then I proposed [2] making use of a dirty snapshot when recording the dependency.
> > But this approach appeared to be "scary" and it was still failing to close
> > some race conditions.
> 
> The current patch still seems to be using dirty snapshots for some
> reason, which struck me as a bit odd. My intuition is that if we're
> relying on dirty snapshots to solve problems, we likely haven't solved
> the problems correctly, which seems consistent with your statement
> about "failing to close some race conditions". But I don't think I
> understand the situation well enough to be sure just yet.

The reason why we are using a dirty snapshot here is for the cases where we are
recording a dependency on a referenced object that we are creating at the same
time behind the scenes (for example, creating a composite type while creating
a relation). Without the dirty snapshot, the object we are creating behind the
scenes (the composite type) would not be visible, and we would wrongly assume
that it has been dropped.

Note that the dirty snapshot is used only when the object is first reported as
"non existing" by the new ObjectByIdExist() function.

> > I think there is 2 cases here:
> >
> > First case: the "conflicting" lock mode is for one of our own lock then LockCheckConflicts()
> > would report this as a NON conflict.
> >
> > Second case: the "conflicting" lock mode is NOT for one of our own lock then LockCheckConflicts()
> > would report a conflict. But I've the feeling that the existing code would
> > already lock those sessions.
> >
> > Was your concern about "First case" or "Second case" or both?
> 
> The second case, I guess. It's bad to take a weaker lock first and
> then a stronger lock on the same object later, because it can lead to
> deadlocks that would have been avoided if the stronger lock had been
> taken at the outset.

In the example I shared up-thread it would be the opposite: session 1 would
take an AccessExclusiveLock on the object before taking an AccessShareLock
during changeDependencyFor().

> Here, it seems like things would happen in the
> other order: if we took two locks, we'd probably take the stronger
> lock in the higher-level code and then the weaker lock in the
> dependency code.

Yeah, I agree.

> That shouldn't break anything; it's just a bit
> inefficient.

Yeah, the second lock is useless in that case (like in the example up-thread).

> My concern was really more about the maintainability of
> the code. I fear that if we add code that takes heavyweight locks in
> surprising places, we might later find the behavior difficult to
> reason about.
>

I think I understand your concern about code maintainability but I'm not sure
that adding locks while recording a dependency is that surprising.

> Tom, what is your thought about that concern?

+1

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> The reason why we are using a dirty snapshot here is for the cases where we are
> recording a dependency on a referenced object that we are creating at the same
> time behind the scene (for example, creating a composite type while creating
> a relation). Without the dirty snapshot, then the object we are creating behind
> the scene (the composite type) would not be visible and we would wrongly assume
> that it has been dropped.

The usual reason for using a dirty snapshot is that you want to see
uncommitted work by other transactions. It sounds like you're saying
you just need to see uncommitted work by the same transaction. If
that's true, I think using HeapTupleSatisfiesSelf would be clearer. Or
maybe we just need to put CommandCounterIncrement() calls in the right
places to avoid having the problem in the first place. Or maybe this
is another sign that we're doing the work at the wrong level.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, May 23, 2024 at 02:10:54PM -0400, Robert Haas wrote:
> On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > The reason why we are using a dirty snapshot here is for the cases where we are
> > recording a dependency on a referenced object that we are creating at the same
> > time behind the scene (for example, creating a composite type while creating
> > a relation). Without the dirty snapshot, then the object we are creating behind
> > the scene (the composite type) would not be visible and we would wrongly assume
> > that it has been dropped.
> 
> The usual reason for using a dirty snapshot is that you want to see
> uncommitted work by other transactions. It sounds like you're saying
> you just need to see uncommitted work by the same transaction.

Right.

> If that's true, I think using HeapTupleSatisfiesSelf would be clearer.

Oh, thanks! I did not know about the SNAPSHOT_SELF snapshot type (I should have
checked all the snapshot types first, though) and that's exactly what is needed
here.

Please find attached v8 making use of SnapshotSelf instead of a dirty snapshot.
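
The change is roughly of this shape (an illustrative sketch, not the exact hunk
from v8):

-    if (use_dirty_snapshot)
-    {
-        InitDirtySnapshot(DirtySnapshot);
-        snapshot = &DirtySnapshot;
-    }
-    else
-        snapshot = NULL;
+    /* SnapshotSelf also sees rows inserted earlier by our own command */
+    snapshot = SnapshotSelf;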

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, May 23, 2024 at 02:10:54PM -0400, Robert Haas wrote:
> On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > The reason why we are using a dirty snapshot here is for the cases where we are
> > recording a dependency on a referenced object that we are creating at the same
> > time behind the scene (for example, creating a composite type while creating
> > a relation). Without the dirty snapshot, then the object we are creating behind
> > the scene (the composite type) would not be visible and we would wrongly assume
> > that it has been dropped.
> 
> The usual reason for using a dirty snapshot is that you want to see
> uncommitted work by other transactions. It sounds like you're saying
> you just need to see uncommitted work by the same transaction. If
> that's true, I think using HeapTupleSatisfiesSelf would be clearer. Or
> maybe we just need to put CommandCounterIncrement() calls in the right
> places to avoid having the problem in the first place. Or maybe this
> is another sign that we're doing the work at the wrong level.

Thanks for having discussed your concern with Tom last week during pgconf.dev
and for sharing the outcome with me. I understand your concern regarding code
maintainability with the current approach.

Please find attached v9, which:

- Acquires the lock and checks for object existence at an upper level, meaning
before calling recordDependencyOn() and recordMultipleDependencies().

- Gets rid of the SNAPSHOT_SELF snapshot usage and relies on
CommandCounterIncrement() instead to ensure new entries are visible when
we check for object existence (for the cases where we create additional objects
behind the scenes, like a composite type while creating a relation).

- Adds an assertion in recordMultipleDependencies() to ensure that we locked the
object before recording the dependency (so that we don't miss any cases now that
the lock is acquired at an upper level).

A few remarks:

My first attempt was to move eliminate_duplicate_dependencies() out of
record_object_address_dependencies() so that we get the calls in this order:

eliminate_duplicate_dependencies()
depLockAndCheckObjects()
record_object_address_dependencies()

What I'm doing instead in v9 is to rename record_object_address_dependencies()
to lock_record_object_address_dependencies() and add depLockAndCheckObjects()
in it at the right place. That way the caller of
[lock_]record_object_address_dependencies() is not responsible for calling
eliminate_duplicate_dependencies() (which would have been the case with my first
attempt).
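
So the new function looks roughly like this (a sketch only; the exact signature
of depLockAndCheckObjects() is assumed here, see v9 for the real code):

void
lock_record_object_address_dependencies(const ObjectAddress *depender,
                                        ObjectAddresses *referenced,
                                        DependencyType behavior)
{
    eliminate_duplicate_dependencies(referenced);
    depLockAndCheckObjects(referenced);     /* new: lock + existence checks */
    recordMultipleDependencies(depender,
                               referenced->refs, referenced->numrefs,
                               behavior);
}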

We need to set up the LOCKTAG before calling the new Assert in
recordMultipleDependencies(). So "#ifdef USE_ASSERT_CHECKING" is used here to
avoid setting up the LOCKTAG on builds without assertions enabled.
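
Something along these lines (a sketch; the exact lock-ownership check used in
v9 may differ):

#ifdef USE_ASSERT_CHECKING
    {
        LOCKTAG     tag;

        /* verify the referenced object was locked before recording the dep */
        SET_LOCKTAG_OBJECT(tag, MyDatabaseId,
                           referenced->classId, referenced->objectId, 0);
        Assert(LockHeldByMe(&tag, AccessShareLock));
    }
#endif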

v9 is more invasive than v8 (as it changes code in many more places) but it is
easier to follow (as it is now clear where the new lock is acquired).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Thu, Jun 6, 2024 at 1:56 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> v9 is more invasive (as it changes code in much more places) than v8 but it is
> easier to follow (as it is now clear where the new lock is acquired).

Hmm, this definitely isn't what I had in mind. Possibly that's a sign
that what I had in mind was dumb, but for sure it's not what I
imagined. What I thought you were going to do was add calls like
LockDatabaseObject(NamespaceRelationId, schemaid, 0, AccessShareLock)
in various places, or perhaps LockRelationOid(reloid,
AccessShareLock), or whatever the case may be. Here you've got stuff
like this:

- record_object_address_dependencies(&conobject, addrs_auto,
-    DEPENDENCY_AUTO);
+ lock_record_object_address_dependencies(&conobject, addrs_auto,
+ DEPENDENCY_AUTO);

...which to me looks like the locking is still pushed down inside the
dependency code.

And you also have stuff like this:

  ObjectAddressSet(referenced, RelationRelationId, childTableId);
+ depLockAndCheckObject(&referenced);
  recordDependencyOn(&depender, &referenced, DEPENDENCY_PARTITION_SEC);

But in depLockAndCheckObject you have:

+ if (object->classId == RelationRelationId || object->classId ==
AuthMemRelationId)
+ return;

That doesn't seem right, because then it seems like the call isn't
doing anything, but there isn't really any reason for it to not be
doing anything. If we're dropping a dependency on a table, then it
seems like we need to have a lock on that table. Presumably the reason
why we don't end up with dangling dependencies in such cases now is
because we're careful about doing LockRelation() in the right places,
but we're not similarly careful about other operations e.g.
ConstraintSetParentConstraint is called by DefineIndex which calls
table_open(childRelId, ...) first, but there's no logic in DefineIndex
to lock the constraint.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jun 06, 2024 at 04:00:23PM -0400, Robert Haas wrote:
> On Thu, Jun 6, 2024 at 1:56 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > v9 is more invasive (as it changes code in much more places) than v8 but it is
> > easier to follow (as it is now clear where the new lock is acquired).
> 
> Hmm, this definitely isn't what I had in mind. Possibly that's a sign
> that what I had in mind was dumb, but for sure it's not what I
> imagined. What I thought you were going to do was add calls like
> LockDatabaseObject(NamespaceRelationId, schemaid, 0, AccessShareLock)
> in various places, or perhaps LockRelationOid(reloid,
> AccessShareLock), or whatever the case may be.

I see what you’re saying, doing things like:

LockDatabaseObject(TypeRelationId, returnType, 0, AccessShareLock);

in ProcedureCreate() for example.

> Here you've got stuff
> like this:
> 
> - record_object_address_dependencies(&conobject, addrs_auto,
> -    DEPENDENCY_AUTO);
> + lock_record_object_address_dependencies(&conobject, addrs_auto,
> + DEPENDENCY_AUTO);
> 
> ...which to me looks like the locking is still pushed down inside the
> dependency code.

Yes, but it’s now located in places where, I think, it’s easier to understand
what’s going on (as compared to v8), except maybe for:

recordDependencyOnExpr()
makeOperatorDependencies()
GenerateTypeDependencies()
makeParserDependencies()
makeDictionaryDependencies()
makeTSTemplateDependencies()
makeConfigurationDependencies()

but probably for:

heap_create_with_catalog()
StorePartitionKey()
index_create()
AggregateCreate()
CastCreate()
CreateConstraintEntry()
ProcedureCreate()
RangeCreate()
InsertExtensionTuple()
CreateTransform()
CreateProceduralLanguage()

The reasons I keep it linked to the dependency code are:

- To ensure we don’t miss anything (well, with the new Assert in place that’s
probably a tangential argument).

- It’s not only about locking the object: it’s also about 1) verifying the
object is pinned, 2) checking it still exists and 3) providing a description in
the error message if we can (in case the object does not exist anymore). Relying
on an already-built ObjectAddress (in the dependency code) avoids 1) defining
the object(s) one more time or 2) creating new functions that would do the same
as isObjectPinned() and getObjectDescription() with a different set of arguments.

Those may sound like weak arguments, but that has been my reasoning.

Do you still find the code hard to maintain with v9?

> 
> And you also have stuff like this:
> 
>   ObjectAddressSet(referenced, RelationRelationId, childTableId);
> + depLockAndCheckObject(&referenced);
>   recordDependencyOn(&depender, &referenced, DEPENDENCY_PARTITION_SEC);
> 
> But in depLockAndCheckObject you have:
> 
> + if (object->classId == RelationRelationId || object->classId ==
> AuthMemRelationId)
> + return;
> 
> That doesn't seem right, because then it seems like the call isn't
> doing anything, but there isn't really any reason for it to not be
> doing anything. If we're dropping a dependency on a table, then it
> seems like we need to have a lock on that table. Presumably the reason
> why we don't end up with dangling dependencies in such cases now is
> because we're careful about doing LockRelation() in the right places,

Yeah, that's what I think: we're already careful when we deal with relations.

> but we're not similarly careful about other operations e.g.
> ConstraintSetParentConstraint is called by DefineIndex which calls
> table_open(childRelId, ...) first, but there's no logic in DefineIndex
> to lock the constraint.

table_open(childRelId, ...) would lock any "ALTER TABLE <childRelId> DROP CONSTRAINT"
already. Not sure I understand your concern here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Fri, Jun 7, 2024 at 4:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Do you still find the code hard to maintain with v9?

I don't think it substantially changes my concerns as compared with
the earlier version.

> > but we're not similarly careful about other operations e.g.
> > ConstraintSetParentConstraint is called by DefineIndex which calls
> > table_open(childRelId, ...) first, but there's no logic in DefineIndex
> > to lock the constraint.
>
> table_open(childRelId, ...) would lock any "ALTER TABLE <childRelId> DROP CONSTRAINT"
> already. Not sure I understand your concern here.

I believe this is not true. This would take a lock on the table, not
the constraint itself.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jun 13, 2024 at 10:49:34AM -0400, Robert Haas wrote:
> On Fri, Jun 7, 2024 at 4:41 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Do you still find the code hard to maintain with v9?
> 
> I don't think it substantially changes my concerns as compared with
> the earlier version.

Thanks for the feedback, I'll give it more thoughts.

> 
> > > but we're not similarly careful about other operations e.g.
> > > ConstraintSetParentConstraint is called by DefineIndex which calls
> > > table_open(childRelId, ...) first, but there's no logic in DefineIndex
> > > to lock the constraint.
> >
> > table_open(childRelId, ...) would lock any "ALTER TABLE <childRelId> DROP CONSTRAINT"
> > already. Not sure I understand your concern here.
> 
> I believe this is not true. This would take a lock on the table, not
> the constraint itself.

I agree that it would not lock the constraint itself. What I meant to say is that,
nevertheless, the constraint cannot be dropped. Indeed, the "ALTER TABLE"
necessary to drop the constraint (ALTER TABLE <childRelId> DROP CONSTRAINT) would
be blocked by the table_open(childRelId, ...).
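
To illustrate the blocking behaviour being described (made-up object names,
sketch only):

-- session 1: stands in for the DDL that did table_open(childRelId, ...)
begin;
lock table child_part in share mode;

-- session 2: blocks until session 1 commits or aborts
alter table child_part drop constraint child_part_a_key;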

That's why I don't understand your concern with this particular example. But
anyway, I'll double check your related concern:

+ if (object->classId == RelationRelationId || object->classId ==
AuthMemRelationId)
+ return;

in depLockAndCheckObject(). 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Thu, Jun 13, 2024 at 12:52 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> > > table_open(childRelId, ...) would lock any "ALTER TABLE <childRelId> DROP CONSTRAINT"
> > > already. Not sure I understand your concern here.
> >
> > I believe this is not true. This would take a lock on the table, not
> > the constraint itself.
>
> I agree that it would not lock the constraint itself. What I meant to say is that,
> nevertheless, the constraint cannot be dropped. Indeed, the "ALTER TABLE"
> necessary to drop the constraint (ALTER TABLE <childRelId> DROP CONSTRAINT) would
> be blocked by the table_open(childRelId, ...).

Ah, right. So, I was assuming that, with either this version of your
patch or the earlier version, we'd end up locking the constraint
itself. Was I wrong about that?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jun 13, 2024 at 02:27:45PM -0400, Robert Haas wrote:
> On Thu, Jun 13, 2024 at 12:52 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > > > table_open(childRelId, ...) would lock any "ALTER TABLE <childRelId> DROP CONSTRAINT"
> > > > already. Not sure I understand your concern here.
> > >
> > > I believe this is not true. This would take a lock on the table, not
> > > the constraint itself.
> >
> > I agree that it would not lock the constraint itself. What I meant to say is that,
> > nevertheless, the constraint cannot be dropped. Indeed, the "ALTER TABLE"
> > necessary to drop the constraint (ALTER TABLE <childRelId> DROP CONSTRAINT) would
> > be blocked by the table_open(childRelId, ...).
> 
> Ah, right. So, I was assuming that, with either this version of your
> patch or the earlier version, we'd end up locking the constraint
> itself. Was I wrong about that?

The child constraint itself is not locked when going through
ConstraintSetParentConstraint().

While at it, let's look at a full example and focus on your concern.

Let's do that with this gdb file:

"
$ cat gdb.txt
b dependency.c:1542
command 1
        printf "Will return for: classId %d and objectId %d\n", object->classId, object->objectId
        c
end
b dependency.c:1547 if object->classId == 2606
command 2
        printf "Will lock constraint: classId %d and objectId %d\n", object->classId, object->objectId
        c
end
"

knowing that:

"
Line 1542 is the return here in depLockAndCheckObject() (your concern):

    if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
        return;

Line 1547 is the lock here in depLockAndCheckObject():

    /* assume we should lock the whole object not a sub-object */
    LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
"

So, with gdb attached to a session let's:

1. Create the parent relation

CREATE TABLE upsert_test (
    a   INT,
    b   TEXT
) PARTITION BY LIST (a);

gdb produces:

---
Will return for: classId 1259 and objectId 16384
Will return for: classId 1259 and objectId 16384
---

Oid 16384 is upsert_test, so I think the return (dependency.c:1542) is fine as
we are creating the object (it can't be dropped as not visible to anyone else).

2. Create another relation (will be the child)

CREATE TABLE upsert_test_2 (b TEXT, a int);

gdb produces:

---
Will return for: classId 1259 and objectId 16391
Will return for: classId 1259 and objectId 16394
Will return for: classId 1259 and objectId 16394
Will return for: classId 1259 and objectId 16391
---

Oid 16391 is upsert_test_2
Oid 16394 is pg_toast_16391

so I think the return (dependency.c:1542) is fine as we are creating those
objects (can't be dropped as not visible to anyone else).

3. Attach the partition

ALTER TABLE upsert_test ATTACH PARTITION upsert_test_2 FOR VALUES IN (2);

gdb produces:

---
Will return for: classId 1259 and objectId 16384
---

That's fine because we already locked the relation 16384 through
AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid().

4. Add a constraint on the child relation

ALTER TABLE upsert_test_2 add constraint bdtc2 UNIQUE (a);

gdb produces:

---
Will return for: classId 1259 and objectId 16391
Will lock constraint: classId 2606 and objectId 16397
---

That's fine because we already locked the relation 16391 through
AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid().

Oid 16397 is the constraint we're creating (bdtc2).

5. Add a constraint on the parent relation (this goes through
ConstraintSetParentConstraint())

ALTER TABLE upsert_test add constraint bdtc1 UNIQUE (a);

gdb produces:

---
Will return for: classId 1259 and objectId 16384
Will lock constraint: classId 2606 and objectId 16399
Will return for: classId 1259 and objectId 16398
Will return for: classId 1259 and objectId 16391
Will lock constraint: classId 2606 and objectId 16399
Will return for: classId 1259 and objectId 16391
---

Regarding "Will return for: classId 1259 and objectId 16384":
That's fine because we already locked the relation 16384 through
AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid().

Regarding "Will lock constraint: classId 2606 and objectId 16399":
Oid 16399 is the constraint that we're creating.

Regarding "Will return for: classId 1259 and objectId 16398":
That's fine because Oid 16398 is an index that we're creating while creating
the constraint (so the index can't be dropped as not visible to anyone else).

Regarding "Will return for: classId 1259 and objectId 16391":
That's fine because 16384 is already locked by us (as mentioned above). And
I think that's fine because trying to drop "upsert_test_2" (aka 16391) would go through
RemoveRelations()->RangeVarGetRelidExtended()->RangeVarCallbackForDropRelation()
->LockRelationOid(relid=16384, lockmode=8) and so would be blocked.

Regarding this example, I don't think that the return in depLockAndCheckObject():

"
if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
     return;
"

is an issue. Indeed, the example above shows it would return for an object that
we'd be creating (so not visible to anyone else) or for an object that we'd
already have locked.

Is it an issue outside of this example? I have the feeling it's not, as we're
already careful when we deal with relations. That said, to be on the safe side,
we could get rid of this return and make use of LockRelationOid() instead.
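
Something like this, for instance (sketch only):

    /* instead of the early return for RelationRelationId */
    if (object->classId == RelationRelationId)
        LockRelationOid(object->objectId, AccessShareLock);
    else
        LockDatabaseObject(object->classId, object->objectId, 0,
                           AccessShareLock);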

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jun 13, 2024 at 04:52:09PM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Thu, Jun 13, 2024 at 10:49:34AM -0400, Robert Haas wrote:
> > On Fri, Jun 7, 2024 at 4:41 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > Do you still find the code hard to maintain with v9?
> > 
> > I don't think it substantially changes my concerns as compared with
> > the earlier version.
> 
> Thanks for the feedback, I'll give it more thoughts.

Please find attached v10 that puts the object locking outside of the dependency
code.

It's done that way except for:

recordDependencyOnExpr() 
recordDependencyOnSingleRelExpr()
makeConfigurationDependencies()

The reason is that I think it would require duplicating part of the logic that is inside
the above functions, and I'm not sure that's worth it.

For example, we would probably need to:

- make additional calls to find_expr_references_walker() 
- make an additional scan on the config map

It's also not done outside of recordDependencyOnCurrentExtension() as:

1. I think it is clear enough that way (as it is clear that the lock is taken on
an ExtensionRelationId object class).
2. Why include "commands/extension.h" in more places (the locking would
depend on "creating_extension" and "CurrentExtensionObject"), given 1.? (See the
sketch below.)
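
For reference, keeping it inside recordDependencyOnCurrentExtension(object,
isReplace) amounts to something like this (sketch only; LockNotPinnedObject()
is the patch's helper and I'm assuming a (classId, objectId) signature here,
the actual code may differ):

    if (creating_extension)
    {
        ObjectAddress extension;

        ObjectAddressSet(extension, ExtensionRelationId, CurrentExtensionObject);

        /* conflicts with the lock taken by a concurrent DROP EXTENSION */
        LockNotPinnedObject(ExtensionRelationId, CurrentExtensionObject);

        recordDependencyOn(object, &extension, DEPENDENCY_EXTENSION);
    }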

Remarks:

1. depLockAndCheckObject() and friends in v9 have been renamed to
LockNotPinnedObject() and friends (as the vast majority of their calls are now
done outside of the dependency code).

2. regarding the concern around RelationRelationId (discussed in [1]), v10 adds
a comment "XXX Do we need a lock for RelationRelationId?" at the places where we
may want to lock this object class. I have not thought it through yet (but will do);
for now I only added this comment at multiple places.

I think that v10 is easier to follow (as compared to v9) as we can easily see
which object class we'll take a lock on.

Thoughts?

[1]: https://www.postgresql.org/message-id/Zmv3TPfJAyQXhIdu%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Fri, Jun 14, 2024 at 3:54 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> > Ah, right. So, I was assuming that, with either this version of your
> > patch or the earlier version, we'd end up locking the constraint
> > itself. Was I wrong about that?
>
> The child constraint itself is not locked when going through
> ConstraintSetParentConstraint().
>
> While at it, let's look at a full example and focus on your concern.

I'm not at the point of having a concern yet, honestly. I'm trying to
understand the design ideas. The commit message just says that we take
a conflicting lock, but it doesn't mention which object types that
principle does or doesn't apply to. I think the idea of skipping it
for cases where it's redundant with the relation lock could be the
right idea, but if that's what we're doing, don't we need to explain
the principle somewhere? And shouldn't we also apply it across all
object types that have the same property?

Along the same lines:

+ /*
+ * Those don't rely on LockDatabaseObject() when being dropped (see
+ * AcquireDeletionLock()). Also it looks like they can not produce
+ * orphaned dependent objects when being dropped.
+ */
+ if (object->classId == RelationRelationId || object->classId ==
AuthMemRelationId)
+ return;

"It looks like X cannot happen" is not confidence-inspiring. At the
very least, a better comment is needed here. But also, that function
has no exception for AuthMemRelationId, only for RelationRelationId.
And also, the exception for RelationRelationId doesn't imply that we
don't need a conflicting lock here: the special case for
RelationRelationId in AcquireDeletionLock() is necessary because the
lock tag format is different for relations than for other database
objects, not because we don't need a lock at all. If the handling here
were really symmetric with what AcquireDeletionLock() does, the coding
would be to call either LockRelationOid() or LockDatabaseObject()
depending on whether classid == RelationRelationId. Now, that isn't
actually necessary, because we already have relation-locking calls
elsewhere, but my point is that the rationale this commit gives for
WHY it isn't necessary seems to me to be wrong in multiple ways.

So to try to sum up here: I'm not sure I agree with this design. But I
also feel like the design is not as clear and consistently implemented
as it could be. So even if I just ignored the question of whether it's
the right design, it feels like we're a ways from having something
potentially committable here, because of issues like the ones I
mentioned in the last paragraph.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Mon, Jun 17, 2024 at 12:24:46PM -0400, Robert Haas wrote:
> On Fri, Jun 14, 2024 at 3:54 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > > Ah, right. So, I was assuming that, with either this version of your
> > > patch or the earlier version, we'd end up locking the constraint
> > > itself. Was I wrong about that?
> >
> > The child constraint itself is not locked when going through
> > ConstraintSetParentConstraint().
> >
> > While at it, let's look at a full example and focus on your concern.
> 
> I'm not at the point of having a concern yet, honestly. I'm trying to
> understand the design ideas. The commit message just says that we take
> a conflicting lock, but it doesn't mention which object types that
> principle does or doesn't apply to. I think the idea of skipping it
> for cases where it's redundant with the relation lock could be the
> right idea, but if that's what we're doing, don't we need to explain
> the principle somewhere? And shouldn't we also apply it across all
> object types that have the same property?

Yeah, I still need to deeply study this area and document it. 

> Along the same lines:
> 
> + /*
> + * Those don't rely on LockDatabaseObject() when being dropped (see
> + * AcquireDeletionLock()). Also it looks like they can not produce
> + * orphaned dependent objects when being dropped.
> + */
> + if (object->classId == RelationRelationId || object->classId ==
> AuthMemRelationId)
> + return;
> 
> "It looks like X cannot happen" is not confidence-inspiring.

Yeah, it is not. It is just a "feeling" that I need to work on to remove
any ambiguity and/or adjust the code as needed.

> At the
> very least, a better comment is needed here. But also, that function
> has no exception for AuthMemRelationId, only for RelationRelationId.
> And also, the exception for RelationRelationId doesn't imply that we
> don't need a conflicting lock here: the special case for
> RelationRelationId in AcquireDeletionLock() is necessary because the
> lock tag format is different for relations than for other database
> objects, not because we don't need a lock at all. If the handling here
> were really symmetric with what AcquireDeletionLock() does, the coding
> would be to call either LockRelationOid() or LockDatabaseObject()
> depending on whether classid == RelationRelationId.

Agree.

> Now, that isn't
> actually necessary, because we already have relation-locking calls
> elsewhere, but my point is that the rationale this commit gives for
> WHY it isn't necessary seems to me to be wrong in multiple ways.

Agree. I'm not done with that part yet (should have made it more clear).

> So to try to sum up here: I'm not sure I agree with this design. But I
> also feel like the design is not as clear and consistently implemented
> as it could be. So even if I just ignored the question of whether it's
> the right design, it feels like we're a ways from having something
> potentially committable here, because of issues like the ones I
> mentioned in the last paragraph.
> 

Agree. I'll now move on with the "XXX Do we need a lock for RelationRelationId?"
comments that I put in v10 (see [1]) and study all the cases around them.

Once done, I think that it will be easier to 1. remove ambiguity, 2. document and
3. do the "right" thing regarding the RelationRelationId object class.

[1]: https://www.postgresql.org/message-id/ZnAVEBhlGvpDDVOD%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Ashutosh Sharma
Date:
Hi,

If there are many dependencies, this can hit the max_locks_per_transaction
limit very fast. Won't it? I just tried this little experiment with
and without the patch.

1) created some UDTs (I have just chosen some random number, 15)
do $$
declare
    i int := 1;
    type_name text;
begin
    while i <= 15 loop
        type_name := format('ct_%s', i);

        -- check if the type already exists
        if not exists (
            select 1
            from pg_type
            where typname = type_name
        ) then
            execute format('create type %I as (f1 INT, f2 TEXT);', type_name);
        end if;

        i := i + 1;
    end loop;
end $$;

2) started a transaction and tried creating a table that uses all udts
created above:
begin;
create table dep_tab(a ct_1, b ct_2, c ct_3, d ct_4, e ct_5, f ct_6, g
ct_7, h ct_8, i ct_9, j ct_10, k ct_11, l ct_12, m ct_13, n ct_14, o
ct_15);

3) checked the pg_locks entries inside the transaction both with and
without patch:

-- with patch:
select count(*) from pg_locks;
 count
-------
    23
(1 row)

-- without patch:
select count(*) from pg_locks;
 count
-------
     7
(1 row)

With the patch, it increased by about 3 times. Won't that create a problem if
many concurrent sessions engage in similar activity?
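
(As a side note, a breakdown like the following, run inside the transaction,
shows where the extra entries come from; 'object' locks report the referenced
catalog in classid:)

select locktype, mode, classid::regclass, count(*)
from pg_locks
group by 1, 2, 3
order by 4 desc;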

--
With Regards,
Ashutosh Sharma.



Re: Avoid orphaned objects dependencies, take 3

From
Robert Haas
Date:
On Wed, Jun 19, 2024 at 7:49 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> If there are many dependencies, this can hit the max_locks_per_transaction
> limit very fast.

Your experiment doesn't support this conclusion. Very few users would
have 15 separate user-defined types in the same table, and even if
they did, and dropped the table, using 23 locks is no big deal. By
default, max_locks_per_transaction is 64, so the user would need to
have more like 45 separate user-defined types in the same table in
order to use more than 64 locks. So, yes, it is possible that if every
backend in the system were simultaneously trying to drop a table and
all of those tables had an average of at least 45 or so user-defined
types, all different from each other, you might run out of lock table
space.

But probably nobody will ever do that in real life, and if they did,
they could just raise max_locks_per_transaction.

When posting about potential problems like this, it is a good idea to
first do a careful thought experiment to assess how realistic the
problem is. I would consider an issue like this serious if there were
a realistic scenario under which a small number of backends could
exhaust the lock table for the whole system, but I think you can see
that this isn't the case here. Even your original scenario is more
extreme than what most people are likely to hit in real life, and it
only uses 23 locks.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Avoid orphaned objects dependencies, take 3

From
Ashutosh Sharma
Date:
Hi Robert,

On Wed, Jun 19, 2024 at 5:50 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jun 19, 2024 at 7:49 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > If there are many dependencies, this can hit the max_locks_per_transaction
> > limit very fast.
>
> Your experiment doesn't support this conclusion. Very few users would
> have 15 separate user-defined types in the same table, and even if
> they did, and dropped the table, using 23 locks is no big deal. By
> default, max_locks_per_transaction is 64, so the user would need to
> have more like 45 separate user-defined types in the same table in
> order to use more than 64 locks. So, yes, it is possible that if every
> backend in the system were simultaneously trying to drop a table and
> all of those tables had an average of at least 45 or so user-defined
> types, all different from each other, you might run out of lock table
> space.
>
> But probably nobody will ever do that in real life, and if they did,
> they could just raise max_locks_per_transaction.
>
> When posting about potential problems like this, it is a good idea to
> first do a careful thought experiment to assess how realistic the
> problem is. I would consider an issue like this serious if there were
> a realistic scenario under which a small number of backends could
> exhaust the lock table for the whole system, but I think you can see
> that this isn't the case here. Even your original scenario is more
> extreme than what most people are likely to hit in real life, and it
> only uses 23 locks.
>

I agree that based on the experiment I shared (which is somewhat
unrealistic), this doesn't seem to have any significant implications.
However, I was concerned that it could potentially increase the usage
of max_locks_per_transaction, which is why I wanted to mention it
here. Nonetheless, my experiment did not reveal any serious issues
related to this. Sorry for the noise.

--
With Regards,
Ashutosh Sharma.



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Mon, Jun 17, 2024 at 05:57:05PM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Jun 17, 2024 at 12:24:46PM -0400, Robert Haas wrote:
> > So to try to sum up here: I'm not sure I agree with this design. But I
> > also feel like the design is not as clear and consistently implemented
> > as it could be. So even if I just ignored the question of whether it's
> > the right design, it feels like we're a ways from having something
> > potentially committable here, because of issues like the ones I
> > mentioned in the last paragraph.
> > 
> 
> Agree. I'll now move on with the "XXX Do we need a lock for RelationRelationId?"
> comments that I put in v10 (see [1]) and study all the cases around them.

A. I went through all of them, did some testing for all, and reached the
conclusion that we must be in one of the two following cases that would already
prevent the relation from being dropped:

1. The relation is already locked (could be an existing relation or a relation
that we are creating).

2. The relation is protected indirectly (i.e. an index protected by a lock on
its table, a table protected by a lock on a function that depends on
the table...).

So we don't need to add a lock if this is a RelationRelationId object class for
the cases above.

As a consequence, I replaced the "XXX"-related comments that were in v10 with
another set of comments in v11 (attached) like "No need to call LockRelationOid()
(through LockNotPinnedObject())....". The reason is to make it clear in the code
and also to ease the review.

B. I explained in [1] (while sharing v10) that the object locking is now outside
of the dependency code except for (and I explained why):

recordDependencyOnExpr() 
recordDependencyOnSingleRelExpr()
makeConfigurationDependencies()

So I also did some testing, on the RelationRelationId case, for those and I
reached the same conclusion as the one shared above.

For A. and B., the testing has been done by adding an "ereport(WARNING..." at
those places when a RelationRelationId is involved (see the sketch below). Then
I ran "make check", went to the failed tests (the output was not the expected one
due to the extra "WARNING"), reproduced them with gdb, and checked for the lock
on the relation producing the "WARNING". All of those were linked to 1. or 2.
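
The instrumentation was essentially something like this (not part of the patch,
shown here only to describe the testing; classId/objectId stand for the object
whose dependency is being recorded):

    /*
     * Temporary testing aid: flag RelationRelationId dependencies that go
     * through a code path where no explicit lock is taken.
     */
    if (classId == RelationRelationId)
        ereport(WARNING,
                (errmsg("dependency on relation %u recorded without explicit lock",
                        objectId)));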

Note that adding an assertion on an existing lock would not work for the cases
described in 2.

So, I'm now confident that we must be in 1. or 2. but it's also possible
that I've missed some cases (though I think the way the testing has been done is
not that weak).

To sum up, I did not see any cases that did not lead to 1. or 2., so I think
it's safe to not add an extra lock for the RelationRelationId case. If, for any
reason, there are still cases that are outside 1. or 2. then they may lead to
orphaned dependencies linked to the RelationRelationId class. I think that's
fine to take that "risk" given that a. that would not be worse than what we have currently
and b. I did not see any of those in our fleet currently (while I have seen a non-negligible
amount outside of the RelationRelationId case).

> Once done, I think that it will easier to 1.remove ambiguity, 2.document and
> 3.do the "right" thing regarding the RelationRelationId object class.
> 

Please find attached v11, where I added more detailed comments in the commit
message and also in the code (I also removed the useless check on
AuthMemRelationId).

[1]: https://www.postgresql.org/message-id/ZnAVEBhlGvpDDVOD%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Wed, Jun 19, 2024 at 02:11:50PM +0000, Bertrand Drouvot wrote:
> To sum up, I did not see any cases that did not lead to 1. or 2., so I think
> it's safe to not add an extra lock for the RelationRelationId case. If, for any
> reason, there are still cases that are outside 1. or 2. then they may lead to
> orphaned dependencies linked to the RelationRelationId class. I think that's
> fine to take that "risk" given that a. that would not be worse than what we have currently
> and b. I did not see any of those in our fleet currently (while I have seen a non-negligible
> amount outside of the RelationRelationId case).

Another thought for the RelationRelationId class case: we could check if there
is a lock first and, if there is no lock, then acquire one. That way we would
ensure the relation is always locked (so no "risk" anymore), but OTOH it may
add "unnecessary" locking (see 2. mentioned previously).

I think I do prefer this approach, to be on the safe side of things. What do
you think?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Avoid orphaned objects dependencies, take 3

From
Bertrand Drouvot
Date:
Hi,

On Fri, Jun 21, 2024 at 01:22:43PM +0000, Bertrand Drouvot wrote:
> Another thought for the RelationRelationId class case: we could check if there
> is a lock first and, if there is no lock, then acquire one. That way we would
> ensure the relation is always locked (so no "risk" anymore), but OTOH it may
> add "unnecessary" locking (see 2. mentioned previously).

Please find attached v12 implementing this idea for the RelationRelationId class
case. As mentioned, it may add unnecessary locking for 2. but I think that's
worth it to ensure that we are always on the safe side of things. This idea is
implemented in LockNotPinnedObjectById() (a rough sketch of the idea follows).
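
Roughly, the idea looks like this (sketch only; whether the "do we already hold
a lock on this relation?" test is done with CheckRelationOidLockedByMe() or some
equivalent is an assumption on my part, and the real code may differ):

    void
    LockNotPinnedObjectById(Oid classid, Oid objid)
    {
        /* pinned objects can't be dropped, so no lock is needed */
        if (IsPinnedObject(classid, objid))
            return;

        if (classid == RelationRelationId)
        {
            /* only lock the relation if we do not already hold a lock on it */
            if (!CheckRelationOidLockedByMe(objid, AccessShareLock, true))
                LockRelationOid(objid, AccessShareLock);
        }
        else
            LockDatabaseObject(classid, objid, 0, AccessShareLock);
    }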

A few remarks:

- there is one place where the relation is not visible (even if
CommandCounterIncrement() is used). That's in TypeCreate(), because the new 
relation Oid is _not_ added to pg_class yet.
Indeed, in heap_create_with_catalog(), AddNewRelationType() is called before
AddNewRelationTuple(). I put a comment in this part of the code explaining why
it's not necessary to call LockRelationOid() here.

- some namespace-related stuff is removed from "test_oat_hooks/expected/alter_table.out".
That's due to the logic in cachedNamespacePath() and the fact that the same
namespace-related stuff is added earlier in alter_table.out.

- the patch touches 37 .c files, but that's mainly due to the fact that
LockNotPinnedObjectById() has to be called in a lot of places.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment