Thread: Patch to avoid orphaned dependencies

Patch to avoid orphaned dependencies

From
"Drouvot, Bertrand"
Date:

Hi,

This new thread is a follow-up of [1].

Problem description:

We have occasionally observed objects having an orphaned dependency, the most common case we have seen (if not the only one) is functions not linked to any namespaces.
A patch has been initially proposed to fix this particular (function-to-namespace) dependency (see [1]), but there could be much more scenarios (like the function-to-datatype one highlighted by Gilles in [1] that could lead to a function having an invalid parameter datatype).
As Tom said there are dozens more cases that would need to be considered, and a global approach to avoid those race conditions should be considered instead.

The attached patch is avoiding those race conditions globally by changing the dependency mechanism: we are using a dirty snapshot any time we’re about to create a pg_depend or pg_shdepend entry.
That way we can check if there is in-flight transactions that are affecting the dependency: if that’s the case, an error is being reported.

This approach has been chosen over another one that would have make use of the locking machinery.
The reason for this choice is to avoid possible slow down of typical DDL command, risk of deadlock, number of locks taken by transaction...

Implementation overview:

  • A new catalog snapshot is added: DirtyCatalogSnapshot.
  • This catalog snapshot is a dirty one to be able to look for in-flight dependencies.
  • Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
  • Any time this variable is being set to true, then the next call to GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
  • This snapshot is being used to check for in-flight dependencies and also to get the objects description to generate the error messages.

Testing:

Test 1

Session1:

postgres=# create schema tobeorph;
CREATE SCHEMA
postgres=# create table tobeorph.bdt (a int);
CREATE TABLE
postgres=# begin;
BEGIN
postgres=*# CREATE OR REPLACE FUNCTION tobeorph.bdttime() RETURNS TIMESTAMP AS $$
DECLARE   outTS TIMESTAMP;
BEGIN   perform pg_sleep(10);   RETURN CURRENT_TIMESTAMP;
END;
$$ LANGUAGE plpgsql volatile;
CREATE FUNCTION

Session 1 does not commit, then session 2:

postgres=# drop schema tobeorph;
ERROR:  cannot drop schema tobeorph because other objects depend on it
DETAIL:  table tobeorph.bdt depends on schema tobeorph
function tobeorph.bdttime() (not yet committed) depends on schema tobeorph
HINT:  DROP and DROP CASCADE won't work when there are uncommitted dependencies.

Test 2

Session 1:

postgres=# create schema toinsert;
CREATE SCHEMA
postgres=# begin;
BEGIN
postgres=*# drop schema toinsert;
DROP SCHEMA

Session 1 does not commit, then session 2:

postgres=# CREATE OR REPLACE FUNCTION toinsert.bdttime() RETURNS TIMESTAMP AS $$
DECLARE   outTS TIMESTAMP;
BEGIN   perform pg_sleep(10);   RETURN CURRENT_TIMESTAMP;
END;
$$ LANGUAGE plpgsql volatile;
ERROR:  cannot create function toinsert.bdttime() because it depends of other objects uncommitted dependencies
DETAIL:  function toinsert.bdttime() depends on schema toinsert (dependency not yet committed)
HINT:  CREATE won't work as long as there is uncommitted dependencies.

Test3

Session1:
psql -U toorph postgres
psql (14devel)
Type "help" for help.

postgres=> begin;
BEGIN
postgres=*> CREATE OR REPLACE FUNCTION bdttime() RETURNS TIMESTAMP AS $$
DECLARE   outTS TIMESTAMP;
BEGIN   perform pg_sleep(10);   RETURN CURRENT_TIMESTAMP;
END;
$$ LANGUAGE plpgsql volatile;
CREATE FUNCTION

Session 1 does not commit, then session 2:

postgres=# drop owned by toorph;
ERROR:  cannot drop objects owned by role toorph because other uncommitted objects depend on it
DETAIL:  function public.bdttime() (not yet committed) depends on role toorph
HINT:  Commit or rollback function public.bdttime() creation.


I'm creating a new commitfest entry for this patch.

Thanks

Bertrand

[1]: https://www.postgresql.org/message-id/flat/a4f55089-7cbd-fe5d-a9bb-19adc6418ae9%40darold.net#9af5cdaa9e80879beb1def3604c976e8

Attachment

Re: Patch to avoid orphaned dependencies

From
Ronan Dunklau
Date:
Hello Bertrand,

Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit :
>
>         Implementation overview:
>
>   * A new catalog snapshot is added: DirtyCatalogSnapshot.
>   * This catalog snapshot is a dirty one to be able to look for
>     in-flight dependencies.
>   * Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
>   * Any time this variable is being set to true, then the next call to
>     GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
>   * This snapshot is being used to check for in-flight dependencies and
>     also to get the objects description to generate the error messages.
>

I quickly tested the patch, it behaves as advertised, and passes tests.

Isolation tests should be added to demonstrate the issues it is solving.

However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot
global variable which is then reset after each snapshot acquisition: I'm
having trouble understanding all the implications of that, if it would be
possible to introduce an unforeseen snapshot before the one we actually want
to be dirty.

I don't want to derail this thread, but couldn't predicate locks on the
pg_depend index pages corresponding to the dropped object / referenced objects
work as a different approach ? I'm not familiar enough with them so maybe there
is some fundamental misunderstanding on my end.

--
Ronan Dunklau





Re: Patch to avoid orphaned dependencies

From
"Drouvot, Bertrand"
Date:
Hi Ronan,

On 9/17/21 10:09 AM, Ronan Dunklau wrote:
> Hello Bertrand,
>
> Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit :
>>          Implementation overview:
>>
>>    * A new catalog snapshot is added: DirtyCatalogSnapshot.
>>    * This catalog snapshot is a dirty one to be able to look for
>>      in-flight dependencies.
>>    * Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
>>    * Any time this variable is being set to true, then the next call to
>>      GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
>>    * This snapshot is being used to check for in-flight dependencies and
>>      also to get the objects description to generate the error messages.
>>
> I quickly tested the patch, it behaves as advertised, and passes tests.

Thanks for looking at it!

>
> Isolation tests should be added to demonstrate the issues it is solving.

Good point. I'll have a look.

>
> However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot
> global variable which is then reset after each snapshot acquisition: I'm
> having trouble understanding all the implications of that, if it would be
> possible to introduce an unforeseen snapshot before the one we actually want
> to be dirty.

I don't think that could be possible as long as:

- this is a per backend variable

- we pay attention where we set it to true

But i might be missing something.

Do you have any corner cases in mind?

> I don't want to derail this thread, but couldn't predicate locks on the
> pg_depend index pages corresponding to the dropped object / referenced objects
> work as a different approach ?

I'm fine to have a look at another approach if needed, but does it mean 
we are not happy with the current approach proposal?

Thanks

Bertrand




Re: Patch to avoid orphaned dependencies

From
Daniel Gustafsson
Date:
> On 20 Sep 2021, at 12:50, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi Ronan,
>
> On 9/17/21 10:09 AM, Ronan Dunklau wrote:
>> Hello Bertrand,
>>
>> Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit :
>>>         Implementation overview:
>>>
>>>   * A new catalog snapshot is added: DirtyCatalogSnapshot.
>>>   * This catalog snapshot is a dirty one to be able to look for
>>>     in-flight dependencies.
>>>   * Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
>>>   * Any time this variable is being set to true, then the next call to
>>>     GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
>>>   * This snapshot is being used to check for in-flight dependencies and
>>>     also to get the objects description to generate the error messages.
>>>
>> I quickly tested the patch, it behaves as advertised, and passes tests.
>
> Thanks for looking at it!
>
>>
>> Isolation tests should be added to demonstrate the issues it is solving.
>
> Good point. I'll have a look.
>
>>
>> However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot
>> global variable which is then reset after each snapshot acquisition: I'm
>> having trouble understanding all the implications of that, if it would be
>> possible to introduce an unforeseen snapshot before the one we actually want
>> to be dirty.
>
> I don't think that could be possible as long as:
>
> - this is a per backend variable
>
> - we pay attention where we set it to true
>
> But i might be missing something.
>
> Do you have any corner cases in mind?
>
>> I don't want to derail this thread, but couldn't predicate locks on the
>> pg_depend index pages corresponding to the dropped object / referenced objects
>> work as a different approach ?
>
> I'm fine to have a look at another approach if needed, but does it mean we are not happy with the current approach
proposal?

This patch fails to apply as a whole, with the parts applying showing quite
large offsets.  Have you had the chance to look at the isolation test asked for
above?

--
Daniel Gustafsson        https://vmware.com/




Re: Patch to avoid orphaned dependencies

From
"Drouvot, Bertrand"
Date:
Hi,

On 11/17/21 2:25 PM, Daniel Gustafsson wrote:
>
> This patch fails to apply as a whole, with the parts applying showing quite
> large offsets.
Thanks for the warning, please find attached a rebase of it.
> Have you had the chance to look at the isolation test asked for
> above?

Not yet, but I'll look at it for sure.

Thanks

Bertrand

Attachment

Re: Patch to avoid orphaned dependencies

From
"Drouvot, Bertrand"
Date:
Hi,

On 11/23/21 4:22 PM, Drouvot, Bertrand wrote:
> Hi,
>
> On 11/17/21 2:25 PM, Daniel Gustafsson wrote:
>>
>> This patch fails to apply as a whole, with the parts applying showing 
>> quite
>> large offsets.
> Thanks for the warning, please find attached a rebase of it.
>> Have you had the chance to look at the isolation test asked for
>> above?
>
> Not yet, but I'll look at it for sure.
>
Please find enclosed v1-0003-orphaned-dependencies.patch, that contains:

- a mandatory rebase

- a few isolation tests added in src/test/modules/test_dependencies (but 
I'm not sure at all that's the right place to add them, is it?)

Thanks

Bertrand

Attachment

Re: Patch to avoid orphaned dependencies

From
Andres Freund
Date:
Hi,

On 2021-12-17 14:19:18 +0100, Drouvot, Bertrand wrote:
> Please find enclosed v1-0003-orphaned-dependencies.patch, that contains:
> 
> - a mandatory rebase
> 
> - a few isolation tests added in src/test/modules/test_dependencies (but I'm
> not sure at all that's the right place to add them, is it?)

This fails on windows w/ msvc:

https://cirrus-ci.com/task/5368174125252608?logs=configure#L102
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157904#L12

Greetings,

Andres Freund



Re: Patch to avoid orphaned dependencies

From
"Drouvot, Bertrand"
Date:
Hi,

On 12/31/21 7:03 AM, Andres Freund wrote:
> Hi,
>
> On 2021-12-17 14:19:18 +0100, Drouvot, Bertrand wrote:
>> Please find enclosed v1-0003-orphaned-dependencies.patch, that contains:
>>
>> - a mandatory rebase
>>
>> - a few isolation tests added in src/test/modules/test_dependencies (but I'm
>> not sure at all that's the right place to add them, is it?)
> This fails on windows w/ msvc:
>
> https://cirrus-ci.com/task/5368174125252608?logs=configure#L102
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157904#L12

Thanks Andres for the warning.

Please find enclosed v1-0004-orphaned-dependencies.patch that addresses 
the issue.

Thanks

Bertrand

Attachment

Re: Patch to avoid orphaned dependencies

From
Zhihong Yu
Date:
Hi,

For genam.c:

+   UseDirtyCatalogSnapshot = dirtysnap;
+
Does the old value of UseDirtyCatalogSnapshot need to be restored at the end of the func ?

+systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup, bool dirtysnap)

Considering that parameter dirtysnap is a bool, I think it should be named isdirtysnap so that its meaning can be distinguished from:

+   Snapshot dirtySnapshot;

+   UseDirtyCatalogSnapshot = true;
+
+   dirtySnapshot = GetCatalogSnapshot(RelationGetRelid(*depRel));

I tend to think that passing usedirtysnap (bool parameter) to GetCatalogSnapshot() would be more flexible than setting global variable.

Cheers

Re: Patch to avoid orphaned dependencies

From
Nathan Bossart
Date:
Bertand, do you think this has any chance of making it into v15?  If not,
are you alright with adjusting the commitfest entry to v16 and moving it to
the next commitfest?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Patch to avoid orphaned dependencies

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Bertand, do you think this has any chance of making it into v15?  If not,
> are you alright with adjusting the commitfest entry to v16 and moving it to
> the next commitfest?

I looked this over briefly, and IMO it should have no chance of being
committed in anything like this form.

The lesser problem is that (as already noted) the reliance on a global
variable that changes catalog lookup semantics is just unbelievably
scary.  An example of the possible consequences here is that a syscache
entry could get made while that's set, containing a row that we should
not be able to see yet, and indeed might never get committed at all.
Perhaps that could be addressed by abandoning the patch's ambition to tell
you the details of an uncommitted object (which would have race conditions
anyway), so that *only* reads of pg_[sh]depend itself need be dirty.

The bigger problem is that it fails to close the race condition that
it's intending to solve.  This patch will catch a race like this:

Session doing DROP                Session doing CREATE

DROP begins

                                  CREATE makes a dependent catalog entry

DROP scans for dependent objects

                                  CREATE commits

DROP removes catalog entry

DROP commits

However, it will not catch this slightly different timing:

Session doing DROP                Session doing CREATE

DROP begins

DROP scans for dependent objects

                                  CREATE makes a dependent catalog entry

                                  CREATE commits

DROP removes catalog entry

DROP commits

So I don't see that we've moved the goalposts very far at all.

Realistically, if we want to prevent this type of problem, then all
creation DDL will have to take a lock on each referenced object that'd
conflict with a lock taken by DROP.  This might not be out of reach:
I think we do already take such locks while dropping objects.  The
reference-side lock could be taken by the recordDependency mechanism
itself, ensuring that we don't miss anything; and that would also
allow us to not bother taking such a lock on pinned objects, which'd
greatly cut the cost (though not to zero).

            regards, tom lane



Re: Patch to avoid orphaned dependencies

From
Nathan Bossart
Date:
On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> Bertand, do you think this has any chance of making it into v15?  If not,
>> are you alright with adjusting the commitfest entry to v16 and moving it to
>> the next commitfest?
> 
> I looked this over briefly, and IMO it should have no chance of being
> committed in anything like this form.

I marked the commitfest entry as waiting-on-author, set the target version
to v16, and moved it to the next commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Patch to avoid orphaned dependencies

From
Jacob Champion
Date:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3106/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob



Re: Patch to avoid orphaned dependencies

From
Bertrand Drouvot
Date:
Hi,

On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote:
> Realistically, if we want to prevent this type of problem, then all
> creation DDL will have to take a lock on each referenced object that'd
> conflict with a lock taken by DROP.  This might not be out of reach:
> I think we do already take such locks while dropping objects.  The
> reference-side lock could be taken by the recordDependency mechanism
> itself, ensuring that we don't miss anything; and that would also
> allow us to not bother taking such a lock on pinned objects, which'd
> greatly cut the cost (though not to zero).

Thanks for the idea (and sorry for the delay replying to it)! I had a look at it
and just created a new thread [1] based on your proposal.

[1]: https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com