Thread: How about using dirty snapshots to locate dependent objects?
Hello everyone,
At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:
Session 1:
- create table t1(a int);
- insert into t1 select i from generate_series(1, 10000000) i;
- create extension btree_gist;
- create index i1 on t1 using gist( a );
Session 2: (While the index creation in session 1 is in progress, drop the btree_gist extension)
- drop extension btree_gist;
The above command succeeds, and so does the create index command running in session 1. After that, if we try running anything on table t1 or index i1, it fails with an error: "cache lookup failed for opclass ..."
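For illustration, the leftover inconsistent state can also be seen in pg_depend once both sessions have finished: the index keeps a recorded dependency on an operator class row that no longer exists. A query along these lines (illustrative only; exact OIDs will differ) shows the dangling reference:

    SELECT d.classid::regclass AS dep_catalog, d.objid AS dep_oid,
           d.refclassid::regclass AS ref_catalog, d.refobjid AS ref_oid
    FROM pg_depend d
    LEFT JOIN pg_opclass o ON o.oid = d.refobjid
    WHERE d.classid = 'pg_class'::regclass
      AND d.objid = 'i1'::regclass
      AND d.refclassid = 'pg_opclass'::regclass
      AND o.oid IS NULL;  -- any row returned here references a dropped opclass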
Attached is the patch that I have tried, which seems to be working for me. It's not polished or thoroughly tested; I'm just sharing it here to clarify what I am trying to suggest. Please have a look and let me know your thoughts.
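The core of the idea is small: in the code that searches pg_depend for dependent objects (for example findDependentObjects() in catalog/dependency.c), run the scan with a dirty snapshot instead of the usual MVCC/catalog snapshot, so that rows inserted by still-open transactions become visible to the DROP. The sketch below is only an illustration of that idea, not the attached patch itself, and the exact placement inside findDependentObjects() is an assumption:

    /* needs access/genam.h, access/stratnum.h, catalog/pg_depend.h,
     * utils/fmgroids.h, utils/snapmgr.h */

    /* inside findDependentObjects(), where depRel is the open pg_depend
     * relation and object is the object being dropped */
    SnapshotData DirtySnapshot;
    SysScanDesc scan;
    ScanKeyData key[2];

    ScanKeyInit(&key[0],
                Anum_pg_depend_refclassid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(object->classId));
    ScanKeyInit(&key[1],
                Anum_pg_depend_refobjid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(object->objectId));

    /*
     * Use a dirty snapshot so that pg_depend rows inserted by in-progress
     * transactions (such as a concurrent CREATE INDEX) are seen as well.
     */
    InitDirtySnapshot(DirtySnapshot);
    scan = systable_beginscan(depRel, DependReferenceIndexId, true,
                              &DirtySnapshot, 2, key);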
--
With Regards,
Ashutosh Sharma.
On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hello everyone,
>
> At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:

I don't think it's correct to allow the index to be dropped while a transaction is creating it. Instead, the right solution should be for the create index operation to protect the object it is using from being dropped. Specifically, the create index operation should acquire a shared lock on the Access Method (AM) to ensure it doesn't get dropped concurrently while the transaction is still in progress.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
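As an illustration of the suggestion above (a hedged sketch only, not code from any posted patch; the helper name and the DefineIndex() call site are assumptions): the backend already provides object-level locking via LockDatabaseObject(), and DROP acquires AccessExclusiveLock on each object it deletes, so the index-creation path could take a conflicting shared lock on the access method and operator class it is about to use:

    /* needs catalog/pg_am.h, catalog/pg_opclass.h, storage/lmgr.h,
     * storage/lockdefs.h */

    /*
     * Hypothetical helper for DefineIndex(): keep the access method and
     * operator class from being dropped until the creating transaction
     * ends.  A concurrent DROP takes AccessExclusiveLock on the object,
     * so AccessShareLock here is enough to make it wait.
     */
    static void
    lock_index_dependencies(Oid accessMethodId, Oid opclassId)
    {
        LockDatabaseObject(AccessMethodRelationId, accessMethodId, 0,
                           AccessShareLock);
        LockDatabaseObject(OperatorClassRelationId, opclassId, 0,
                           AccessShareLock);
    }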
On Thu, Jun 06, 2024 at 05:59:00PM +0530, Ashutosh Sharma wrote:
> Hello everyone,
>
> At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:
>
> Session 1:
> - create table t1(a int);
> - insert into t1 select i from generate_series(1, 10000000) i;
> - create extension btree_gist;
> - create index i1 on t1 using gist( a );
>
> Session 2: (While the index creation in session 1 is in progress, drop the btree_gist extension)
> - drop extension btree_gist;
>
> Above command succeeds and so does the create index command running in session 1, post this, if we try running anything on table t1, i1, it fails with an error: "cache lookup failed for opclass ..."
>
> Attached is the patch that I have tried, which seems to be working for me. It's not polished and thoroughly tested, but just sharing here to clarify what I am trying to suggest. Please have a look and let me know your thoughts.

Thanks for the patch proposal!

The patch does not fix the other way around:

- session 1: BEGIN; DROP extension btree_gist;
- session 2: create index i1 on t1 using gist( a );
- session 1: commits while session 2 is creating the index

and does not address all the possible orphaned dependencies cases.

There is an ongoing thread (see [1]) to fix the orphaned dependencies issue. v9 attached in [1] fixes the case you describe here.

[1]: https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hello everyone,
>
> At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:
I don't think it's correct to allow the index to be dropped while a
transaction is creating it. Instead, the right solution should be for
the create index operation to protect the object it is using from
being dropped. Specifically, the create index operation should acquire
a shared lock on the Access Method (AM) to ensure it doesn't get
dropped concurrently while the transaction is still in progress.
If I'm following you correctly, that's exactly what the patch is trying to do: while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the DROP is able to see the dependent object (in this case the index being created) through the dirty snapshot. It is therefore unable to acquire the lock on that dependent object, and as a result the referenced object cannot be dropped.
--
With Regards,
Ashutosh Sharma.
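Spelling out the mechanism described above as a sketch (again an illustration, not the attached patch): once the dirty-snapshot scan in the DROP path returns the pg_depend row inserted by the still-running CREATE INDEX, the snapshot's xmin identifies that transaction, and the per-object locking that findDependentObjects() already performs then blocks behind the lock CREATE INDEX holds on the index it is building:

    /* after systable_getnext() returned a tuple under the dirty snapshot;
     * otherObject is the dependent object built from that tuple */
    if (TransactionIdIsValid(DirtySnapshot.xmin))
        elog(DEBUG1, "dependent object %u is being created by xid %u",
             otherObject.objectId, DirtySnapshot.xmin);

    /*
     * findDependentObjects() already locks each dependent object it finds;
     * with the in-progress row now visible, this lock request waits for the
     * CREATE INDEX transaction (which holds AccessExclusiveLock on the new
     * index), so the referenced object cannot be dropped out from under it.
     */
    AcquireDeletionLock(&otherObject, 0);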
On Thu, Jun 6, 2024 at 6:57 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
On Thu, Jun 06, 2024 at 05:59:00PM +0530, Ashutosh Sharma wrote:
> Hello everyone,
>
> At present, we use MVCC snapshots to identify dependent objects. This
> implies that if a new dependent object is inserted within a transaction
> that is still ongoing, our search for dependent objects won't include this
> recently added one. Consequently, if someone attempts to drop the
> referenced object, it will be dropped, and when the ongoing transaction
> completes, we will end up having an entry for a referenced object that has
> already been dropped. This situation can lead to an inconsistent state.
> Below is an example illustrating this scenario:
>
> Session 1:
> - create table t1(a int);
> - insert into t1 select i from generate_series(1, 10000000) i;
> - create extension btree_gist;
> - create index i1 on t1 using gist( a );
>
> Session 2: (While the index creation in session 1 is in progress, drop the
> btree_gist extension)
> - drop extension btree_gist;
>
> Above command succeeds and so does the create index command running in
> session 1, post this, if we try running anything on table t1, i1, it fails
> with an error: "cache lookup failed for opclass ..."
>
> Attached is the patch that I have tried, which seems to be working for me.
> It's not polished and thoroughly tested, but just sharing here to clarify
> what I am trying to suggest. Please have a look and let me know your
> thoughts.
Thanks for the patch proposal!
The patch does not fix the other way around:
- session 1: BEGIN; DROP extension btree_gist;
- session 2: create index i1 on t1 using gist( a );
- session 1: commits while session 2 is creating the index
and does not address all the possible orphaned dependencies cases.
There is an ongoing thread (see [1]) to fix the orphaned dependencies issue.
v9 attached in [1] fixes the case you describe here.
[1]: https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal
I see. Thanks for sharing this. I can take a look at this and help in whatever way I can.
With Regards,
Ashutosh Sharma.
On Thu, Jun 6, 2024 at 7:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> >
>> > Hello everyone,
>> >
>> > At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:
>>
>> I don't think it's correct to allow the index to be dropped while a transaction is creating it. Instead, the right solution should be for the create index operation to protect the object it is using from being dropped. Specifically, the create index operation should acquire a shared lock on the Access Method (AM) to ensure it doesn't get dropped concurrently while the transaction is still in progress.
>
> If I'm following you correctly, that's exactly what the patch is trying to do; while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the referenced object being dropped is able to know about the dependent object (in this case the index being created) using the dirty snapshot and hence, it is unable to acquire the lock on the dependent object, and as a result of that, it is unable to drop it.

You are aiming for the same outcome, but not in the conventional way. In my opinion, the correct approach is not to find objects being created using a dirty snapshot. Instead, when creating an object, you should acquire a proper lock on any dependent objects to prevent them from being dropped during the creation process. For instance, when creating an index that depends on the btree_gist access method, the create index operation should protect btree_gist from being dropped by acquiring the appropriate lock. It is not the responsibility of the drop extension to identify in-progress index creations.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
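For what it's worth, one can check from a third session which object-level locks the backend running CREATE INDEX actually holds; today it holds none on the operator class or the access method, which is exactly the gap being discussed here (illustrative query only):

    SELECT l.locktype, l.classid::regclass AS catalog, l.objid, l.mode, l.granted
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE l.locktype = 'object'
      AND a.query ILIKE 'create index%';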
On Fri, Jun 7, 2024 at 10:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 7:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>
> >> On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> > At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:
> >>
> >> I don't think it's correct to allow the index to be dropped while a transaction is creating it. Instead, the right solution should be for the create index operation to protect the object it is using from being dropped. Specifically, the create index operation should acquire a shared lock on the Access Method (AM) to ensure it doesn't get dropped concurrently while the transaction is still in progress.
> >
> > If I'm following you correctly, that's exactly what the patch is trying to do; while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the referenced object being dropped is able to know about the dependent object (in this case the index being created) using the dirty snapshot and hence, it is unable to acquire the lock on the dependent object, and as a result of that, it is unable to drop it.
>
> You are aiming for the same outcome, but not in the conventional way. In my opinion, the correct approach is not to find objects being created using a dirty snapshot. Instead, when creating an object, you should acquire a proper lock on any dependent objects to prevent them from being dropped during the creation process. For instance, when creating an index that depends on the btree_gist access method, the create index operation should protect btree_gist from being dropped by acquiring the appropriate lock. It is not the responsibility of the drop extension to identify in-progress index creations.

Thanks for sharing your thoughts, I appreciate your inputs and completely understand your perspective, but I wonder if that is feasible? For example, if an object (an index in this case) has a dependency on, let's say, 'n' number of objects, and those 'n' objects belong to, say, 'n' different catalog tables, should we acquire locks on each of them until the create index command succeeds, or should we just check for the presence of the dependent objects and record their dependency inside the pg_depend table? Talking about this particular case, we are trying to create a gist index that has a dependency on the gist_int4 opclass, which is one of the tuples inside the pg_opclass catalog table, so should we acquire a lock on this tuple/table until the create index command succeeds, and is that the thing to be done for all the dependent objects?

--
With Regards,
Ashutosh Sharma.
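For reference, the specific catalog row being talked about here can be looked up directly; btree_gist installs gist_int4_ops as the default gist operator class for int4, and it is that single pg_opclass row (or, more broadly, every object the index will reference) that the locking approach would have to cover:

    SELECT oc.oid, oc.opcname, am.amname, oc.opcdefault
    FROM pg_opclass oc
    JOIN pg_am am ON am.oid = oc.opcmethod
    WHERE am.amname = 'gist'
      AND oc.opcintype = 'int4'::regtype;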
On Fri, Jun 7, 2024 at 11:53 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> On Fri, Jun 7, 2024 at 10:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 7:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > >
> > > On Thu, Jun 6, 2024 at 6:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >>
> > >> On Thu, Jun 6, 2024 at 5:59 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > >> >
> > >> > Hello everyone,
> > >> >
> > >> > At present, we use MVCC snapshots to identify dependent objects. This implies that if a new dependent object is inserted within a transaction that is still ongoing, our search for dependent objects won't include this recently added one. Consequently, if someone attempts to drop the referenced object, it will be dropped, and when the ongoing transaction completes, we will end up having an entry for a referenced object that has already been dropped. This situation can lead to an inconsistent state. Below is an example illustrating this scenario:
> > >>
> > >> I don't think it's correct to allow the index to be dropped while a transaction is creating it. Instead, the right solution should be for the create index operation to protect the object it is using from being dropped. Specifically, the create index operation should acquire a shared lock on the Access Method (AM) to ensure it doesn't get dropped concurrently while the transaction is still in progress.
> > >
> > > If I'm following you correctly, that's exactly what the patch is trying to do; while the index creation is in progress, if someone tries to drop the object referenced by the index under creation, the referenced object being dropped is able to know about the dependent object (in this case the index being created) using the dirty snapshot and hence, it is unable to acquire the lock on the dependent object, and as a result of that, it is unable to drop it.
> >
> > You are aiming for the same outcome, but not in the conventional way. In my opinion, the correct approach is not to find objects being created using a dirty snapshot. Instead, when creating an object, you should acquire a proper lock on any dependent objects to prevent them from being dropped during the creation process. For instance, when creating an index that depends on the btree_gist access method, the create index operation should protect btree_gist from being dropped by acquiring the appropriate lock. It is not the responsibility of the drop extension to identify in-progress index creations.
>
> Thanks for sharing your thoughts, I appreciate your inputs and completely understand your perspective, but I wonder if that is feasible? For example, if an object (an index in this case) has a dependency on, let's say, 'n' number of objects, and those 'n' objects belong to, say, 'n' different catalog tables, should we acquire locks on each of them until the create index command succeeds, or should we just check for the presence of the dependent objects and record their dependency inside the pg_depend table? Talking about this particular case, we are trying to create a gist index that has a dependency on the gist_int4 opclass, which is one of the tuples inside the pg_opclass catalog table, so should we acquire a lock on this tuple/table until the create index command succeeds, and is that the thing to be done for all the dependent objects?

I am not sure what the best way to do it is, but if you are creating an object which is dependent on other objects, then you need to check the existence of those objects, record dependencies on those objects, and also lock them so that they don't get dropped while you are creating your object. I haven't looked into the patch, but something similar is being achieved in the thread Bertrand has pointed out, by locking the database object while recording the dependency on it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
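To make that concrete, here is a hedged sketch of the general idea only (the wrapper name is invented; the patch on the thread referenced above should be consulted for the real shape): take an object-level lock on the referenced object before recording the dependency, and re-verify that it still exists once the lock is held:

    /* needs catalog/dependency.h, catalog/objectaddress.h, storage/lmgr.h,
     * storage/lockdefs.h */

    /*
     * Hypothetical wrapper around recordDependencyOn(): lock the referenced
     * object first, so that a concurrent DROP (which takes
     * AccessExclusiveLock on whatever it deletes) either has to wait or has
     * already removed the object.
     */
    static void
    recordLockedDependencyOn(const ObjectAddress *depender,
                             const ObjectAddress *referenced,
                             DependencyType behavior)
    {
        /* sub-objects (columns) cannot be locked separately, hence subid 0 */
        LockDatabaseObject(referenced->classId, referenced->objectId, 0,
                           AccessShareLock);

        /*
         * The referenced object may have been dropped before the lock was
         * obtained, so its existence must be re-checked here (via the
         * appropriate syscache for its catalog) before recording the
         * dependency.
         */
        recordDependencyOn(depender, referenced, behavior);
    }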