Thread: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18371
Logged by:          Fei Changhong
Email address:      feichanghong@qq.com
PostgreSQL version: 16.2
Operating system:   CentOS 7, kernel version 5.10.13

Description:

Hi all,

When I tried to insert data into a table that had just been detached, I
encountered an error: "ERROR: could not open relation with OID 16384".

The environment information is as follows:
PostgreSQL branch: master, commit bcdfa5f2e2f274caeed20b2f986012a9cb6a259c

This issue can be reproduced with the following steps:
```
create table t(a int) partition by hash (a);
create table t_p1 partition of t for values with (modulus 2, remainder 0);
alter table t detach partition t_p1 concurrently;
drop table t;
insert into t_p1 select 1;
```

When detaching a partition concurrently, the function
DetachAddConstraintIfNeeded is called to convert the partition constraint of
the detached partition into a regular constraint, which is not dropped at the
end of the detach process. This can work normally on range partitions.
However, the constraint on hash partitions uses satisfies_hash_partition with
the OID of the parent table, and the newly created constraint does not take
effect. For example, in the following case, although there is a t_p1_a_check
constraint on t_p1, it is still possible to perform an insert:
```
create table t(a int) partition by hash (a);
create table t_p1 partition of t for values with (modulus 2, remainder 0);
alter table t detach partition t_p1 concurrently;

postgres=# \d+ t_p1
                                 Table "public.t_p1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           |          |         | plain   |             |              |
Check constraints:
    "t_p1_a_check" CHECK (satisfies_hash_partition('16384'::oid, 2, 0, a))
Access method: heap

postgres=# insert into t_p1 select 1;
INSERT 0 1
```

If the parent table has already been dropped, an error will occur during
constraint checking. Based on the analysis above, should the added constraint
for a hash partition be dropped after detachment?
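For reference, the leftover constraint can be inspected in pg_constraint, and
dropping it by hand unblocks inserts on the detached table. A minimal sketch,
assuming the default constraint name from the repro above:
```
-- Inspect the leftover constraint; its definition embeds the old parent's
-- OID (16384 in the repro above):
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 't_p1'::regclass;

-- Manual workaround: drop the residual constraint so inserts succeed again.
ALTER TABLE t_p1 DROP CONSTRAINT t_p1_a_check;
```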
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
"feichanghong"
Date:
> This can work normally on range partitions. However, the constraint on hash
> partitions uses satisfies_hash_partition with the OID of the parent table, and
> the newly created constraint does not take effect. For example, in the following
> case, although there is a t_p1_a_check constraint on t_p1, it is still possible
> to perform an insert:
What I said here was wrong: the constraint on the hash partition does take
effect, but it depends on the OID of the parent table.
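To see that dependency directly, the constraint's function can be called by
hand; a small sketch using the names from the repro (the literal OID is
whatever OID t had before it was dropped):
```
-- While the parent exists, the check can be evaluated:
SELECT satisfies_hash_partition('t'::regclass, 2, 0, 1);

-- After DROP TABLE t, evaluating the leftover constraint fails, because
-- the function tries to open the parent by its stored OID:
SELECT satisfies_hash_partition('16384'::oid, 2, 0, 1);
-- ERROR:  could not open relation with OID 16384
```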
> Based on the analysis above, should the added constraint for a hash partition
> be dropped after detachment?
I have implemented this logic as a first attempt in the attached patch and
added a test case. I would appreciate any suggestions from developers.
Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
Attachment
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
Michael Paquier
Date:
On Thu, Feb 29, 2024 at 02:21:58PM +0800, feichanghong wrote:
> What I said here was wrong: the constraint on the hash partition does take
> effect, but it depends on the OID of the parent table.

Something is weird with the format of the messages you have sent, by the way.

> Based on the analysis above, should the added constraint for a hash
> partition be dropped after detachment? I have implemented this logic as a
> first attempt in the attached patch and added a test case. I would
> appreciate any suggestions from developers.

I am not much a fan of relying on ATExecDropConstraint(), where the logic
introduced is a copy-paste of get_constraint_name() to retrieve the name of
the constraint to drop. If any, we ought to rely more on
dropconstraint_internal() instead, using the pg_constraint tuple based on the
OID of the constraint, and not do a cross-operation with the constraint name.

But actually, why do you do a drop of the constraint at all?
DetachAddConstraintIfNeeded() re-adds the constraint as it is, so instead of
re-adding it as-is and then dropping it again, wouldn't it be better to just
*not* re-add it to begin with if we don't need it at all?

This reproduces down to 14, for what I am assuming is an issue introduced
around 7b357cc6ae55.

Alvaro, what do you think?
--
Michael
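For context, what get_constraint_name() boils down to is a conname lookup by
the constraint's OID; a rough SQL-level analogue (the OID is a made-up
example):
```
-- SQL-level analogue of get_constraint_name(): map a constraint OID back
-- to its name via pg_constraint (16404 is a hypothetical OID).
SELECT conname FROM pg_constraint WHERE oid = 16404;
```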
Attachment
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
alvherre
Date:
On 2024-Mar-05, Michael Paquier wrote:
> This reproduces down to 14, for what I am assuming is an issue introduced
> around 7b357cc6ae55.
>
> Alvaro, what do you think?

Yeah, I have this on my list of things to fix this month. I haven't looked
at it closely, but I will soon.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"Fairy tales do not give the child his first idea of monsters.
What they give him is his first idea of the possible defeat of the monster."
(G. K. Chesterton)
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
费长红
Date:
On 2024/3/5, 11:13, "Michael Paquier" <michael@paquier.xyz> wrote:
> Something is weird with the format of the messages you have sent, by the
> way.

Ah ha! This must be a problem with my email client; I will try another one.

> I am not much a fan of relying on ATExecDropConstraint(), where the logic
> introduced is a copy-paste of get_constraint_name() to retrieve the name of
> the constraint to drop. If any, we ought to rely more on
> dropconstraint_internal() instead, using the pg_constraint tuple based on
> the OID of the constraint, and not do a cross-operation with the constraint
> name.
>
> But actually, why do you do a drop of the constraint at all?
> DetachAddConstraintIfNeeded() re-adds the constraint as it is, so instead
> of re-adding it as-is and then dropping it again, wouldn't it be better to
> just *not* re-add it to begin with if we don't need it at all?

After verification, the partition constraint on the partition being detached
remains in effect until the second transaction commits. The newly added
constraint seems to exist only to speed up a later re-attach operation.
Perhaps I've missed something?

So, for hash partitioning, it would be better not to re-add the constraint;
perhaps we should filter out hash partitions in DetachAddConstraintIfNeeded().

--
Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
Michael Paquier
Date:
On Tue, Mar 05, 2024 at 08:37:27AM +0100, Alvaro Herrera wrote:
> Yeah, I have this on my list of things to fix this month. I haven't
> looked at it closely, but I will soon.

Cool, thanks!
--
Michael
Attachment
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
alvherre
Date:
On 2024-Feb-29, feichanghong wrote:
> > This can work normally on range partitions. However, the constraint on
> > hash partitions uses satisfies_hash_partition with the OID of the parent
> > table, and the newly created constraint does not take effect. For
> > example, in the following case, although there is a t_p1_a_check
> > constraint on t_p1, it is still possible to perform an insert:
> What I said here was wrong: the constraint on the hash partition does take
> effect, but it depends on the OID of the parent table.

We should definitely not have this constraint on hash-partitioned tables
after the detach. However, I wonder if instead of adding it and later
removing it as you propose, it wouldn't be better to just not add it in the
first place. As a first step, I tried commenting it out and found that no
interesting test fails (only alter_table.sql fails, but only because the
constraint is not there when looking for it specifically.)

The current code does not explain *why* we have to add this constraint, and I
had forgotten, so I went to look at the first patch submission in that thread
[1] and saw this comment there:

+ /*
+  * Concurrent mode has to work harder; first we add a new constraint to the
+  * partition that matches the partition constraint.  The reason for this is
+  * that the planner may have made optimizations that depend on the
+  * constraint.  XXX Isn't it sufficient to invalidate the partition's
+  * relcache entry?

I'm trying to figure out whether it's possible that the planner would make
optimizations based on the hashing function. Quite possibly it won't. If
that's so, then we should just not make the constraint at all, which would
make the fix even simpler. I also wonder, maybe that XXX comment (which I
removed before committing the patch) is right and we don't actually need the
constraint with _any_ partition strategy, not just hash.

[1] https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
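One way to probe that question from SQL (a sketch, not a proof; the table r
is hypothetical, t_p1 is from the repro): with constraint_exclusion forced
on, the planner can exclude a scan using an ordinary range-style CHECK, but
satisfies_hash_partition() is opaque to the predicate prover, so the hash
constraint excludes nothing:
```
SET constraint_exclusion = on;  -- the default 'partition' skips plain tables

-- A btree-style CHECK lets the planner discard the scan outright:
CREATE TABLE r (a int, CHECK (a >= 0 AND a < 10));
EXPLAIN (COSTS OFF) SELECT * FROM r WHERE a = 100;
--  Result
--    One-Time Filter: false

-- The detached t_p1's CHECK (satisfies_hash_partition(...)) proves nothing
-- about "a", so the equivalent query still scans the table:
EXPLAIN (COSTS OFF) SELECT * FROM t_p1 WHERE a = 100;
```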
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
feichanghong
Date:
Hi Alvaro,
On Jul 16, 2024, at 03:35, alvherre <alvherre@alvh.no-ip.org> wrote:
> We should definitely not have this constraint on hash-partitioned tables
> after the detach. However, I wonder if instead of adding it and later
> removing it as you propose, it wouldn't be better to just not add it in the
> first place. As a first step, I tried commenting it out and found that no
> interesting test fails (only alter_table.sql fails, but only because the
> constraint is not there when looking for it specifically.)
>
> The current code does not explain *why* we have to add this constraint, and
> I had forgotten, so I went to look at the first patch submission in that
> thread [1] and saw this comment there:
>
> + /*
> +  * Concurrent mode has to work harder; first we add a new constraint to the
> +  * partition that matches the partition constraint.  The reason for this is
> +  * that the planner may have made optimizations that depend on the
> +  * constraint.  XXX Isn't it sufficient to invalidate the partition's
> +  * relcache entry?
>
> I'm trying to figure out whether it's possible that the planner would make
> optimizations based on the hashing function. Quite possibly it won't. If
> that's so, then we should just not make the constraint at all, which would
> make the fix even simpler. I also wonder, maybe that XXX comment (which I
> removed before committing the patch) is right and we don't actually need
> the constraint with _any_ partition strategy, not just hash.
I agree with your point of view. There is no need to add an extra partition
constraint for "DETACH PARTITION CONCURRENTLY". In the first transaction we
merely set inhdetachpending to true, without actually dropping the existing
partition constraint.

At the start of the second transaction, we wait for all queries that were
planned with the partition to finish; then we acquire an AccessExclusiveLock
on the partition and drop the original partition constraint. Most likely the
newly added constraint is useless for the optimizer.

The only useful scenario I can think of is that, if we were to attach this
partition back to the parent table with the same strategy, the constraint
might let us skip the partition-constraint validation checks, as sketched
below.
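A sketch of that re-attach scenario with the names from the repro, assuming
the leftover constraint still matches the partition bound exactly:
```
-- After a concurrent detach, t_p1 carries a CHECK constraint identical to
-- its old partition constraint:
ALTER TABLE t DETACH PARTITION t_p1 CONCURRENTLY;

-- On re-attach with the same modulus/remainder, the existing constraint
-- implies the new partition constraint, so the validation scan of t_p1's
-- existing rows can be skipped:
ALTER TABLE t ATTACH PARTITION t_p1 FOR VALUES WITH (MODULUS 2, REMAINDER 0);
```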
Best Regards,
Fei Changhong
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
Tender Wang
Date:
alvherre <alvherre@alvh.no-ip.org> wrote on Tue, Jul 16, 2024 at 03:35:
> [...]
> I'm trying to figure out whether it's possible that the planner would make
> optimizations based on the hashing function. Quite possibly it won't. If
> that's so, then we should just not make the constraint at all, which would
> make the fix even simpler. I also wonder, maybe that XXX comment (which I
> removed before committing the patch) is right and we don't actually need
> the constraint with _any_ partition strategy, not just hash.
The planner does partition pruning based on partbound. If the new constraint
is not added when detaching concurrently, can we make sure that queries see
consistent results?
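For reference, pruning on the parent is driven by the partition bounds
stored in the catalog (pg_class.relpartbound), not by any CHECK constraint
on the child; a small sketch with a hypothetical two-partition table:
```
CREATE TABLE h (a int) PARTITION BY HASH (a);
CREATE TABLE h_p0 PARTITION OF h FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE h_p1 PARTITION OF h FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Pruning here uses the children's partition bounds, so only the partition
-- whose hash bucket matches a = 1 is scanned:
EXPLAIN (COSTS OFF) SELECT * FROM h WHERE a = 1;
```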
Tender Wang
Re: BUG #18371: There are wrong constraint residues when detaching hash partitions concurrently
From
Tender Wang
Date:
alvherre <alvherre@alvh.no-ip.org> wrote on Tue, Jul 16, 2024 at 03:35:
> [...]
> I'm trying to figure out whether it's possible that the planner would make
> optimizations based on the hashing function. Quite possibly it won't. If
> that's so, then we should just not make the constraint at all, which would
> make the fix even simpler. I also wonder, maybe that XXX comment (which I
> removed before committing the patch) is right and we don't actually need
> the constraint with _any_ partition strategy, not just hash.
I picked up this issue again. I skimmed through the code, and I agree with
your point: we don't actually need the constraint with any partition
strategy, not just hash.

If old queries are still running when we do the detach, the detach session
waits at WaitForLockersMultiple(), and other sessions can't insert tuples
that violate the partition constraint, because the partition being detached
still has its partition constraint.
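To spell out that timeline, a two-session sketch (session markers are
comments; this is not a single runnable script, and the exact interleaving
is illustrative):
```
-- Session 1: an old query planned against the partitioned table.
BEGIN;
SELECT * FROM t WHERE a = 1;           -- plan still includes t_p1

-- Session 2: the detach's second transaction blocks in
-- WaitForLockersMultiple() until session 1 finishes:
ALTER TABLE t DETACH PARTITION t_p1 CONCURRENTLY;

-- Until session 2's second transaction commits, t_p1 keeps its partition
-- constraint, so no session can insert a row that violates it.

-- Session 1:
COMMIT;                                -- now the detach can complete
```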
Thanks,
Tender Wang