> I thought further about the lockmode calling find_inheritance_children > in ATPrepAddPrimaryKey. > What we do here? We first get oids of children, then check the if the > column of children has marked not-null, if not, report an error. > No operation here on children. I check other places that call > find_inheritance_children, if we have operation on children, we usually pass > Lockmode to find_inheritance_children, otherwise pass NoLock.
Hmm, I'm wary of doing this, although you're perhaps right that there's no harm. If we do need to add a not-null constraint on the children, surely we'll acquire a stronger lock further down the execution chain. In principle this sounds a good idea though. (I'm not sure about doing SearchSysCacheAttName() on a relation that might be dropped concurrently; does dropping the child acquire lock on its parent? I suppose so, in which case this is okay; but still icky. What about DETACH CONCURRENTLY?)
Yes, I'm also wary of doing this.
Although NoLock may fix this issue, I feel it will trigger other problems, such as the scenario you listed above.
However, I've also been looking at this and realized that this code can have different structure which may allows us to skip the find_inheritance_children() altogether. The reason is that we already scan the parent's list of columns searching for not-null constraints on each of them; we only need to run this verification on children for columns where there is none in the parent, and then only in the case where recursion is turned off.
So I propose the attached patch, which also has some comments to hopefully explain what is going on and why. I ran Tom's test script a few hundred times in a loop and I see no deadlock anymore.
No objection from me. The comments may need a little polishing.