On 2/5/25 05:37, Peter Eisentraut wrote:
> On 29.01.25 07:34, Paul Jungwirth wrote:
>> Is it possible to commit an RI_PLAN_NO_ACTION addition and see if that makes the buildfarm
>> failures go away? Here is a proposed patch for that (v48.1). I would understand if this is too
>> questionable a practice---but it would be nice to get sufficient test exposure to see if it makes
>> a difference. Since I still haven't reproduced this locally (despite running continuously for
>> almost a week), it's not an experiment I can do myself. If it *does* make the failures go away,
>> then it suggests there is still some latent problem somewhere.
>
> I'm tempted to give this a try. But the cfbot is currently in a bit of a mess, so I'll wait until
> that is clean again so that we can have a usable baseline to work against.

Okay, thanks! I've been spending some more time on this, but I haven't made much progress.

It's surely not as simple as just OID wraparound. Here is a bpftrace script to show when we change
TransamVariables->nextOid:

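    // Initialize the counter so it is printed even if it stays at zero.
    BEGIN {
        @setnext = 0;
    }

    // Count OID allocations, keyed by thread id.
    u:/home/paul/local/bin/postgres:GetNewObjectId {
        @newoids[tid] += 1;
    }

    // Count calls to SetNextObjectId.
    u:/home/paul/local/bin/postgres:SetNextObjectId {
        @setnext += 1;
    }
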
When I run this during `make installcheck` I get only 29608 total calls to GetNewObjectId, and none
for SetNextObjectId.

I've also been looking at the dynahash code a bit. With gdb I can give two constraint OIDs a hash
collision, but of course that isn't sufficient, since we memcmp the whole key as well.
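
To spell out why a collision alone can't hand back the wrong entry, here is a minimal sketch of a
chained hash table. It is not dynahash itself: Entry, Table, toy_hash, toy_insert, and toy_lookup
are made-up names for illustration, and the hash is deliberately weak so that collisions are easy
to produce. The only point is that the lookup walks the bucket and compares the full key.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 8

/* Hypothetical entry type: key stands in for a constraint OID. */
typedef struct Entry {
    uint32_t key;
    int value;              /* stand-in for the cached payload */
    struct Entry *next;     /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
} Table;

/* Deliberately weak hash: keys that differ by a multiple of 8 collide. */
static uint32_t toy_hash(uint32_t key)
{
    return key % NBUCKETS;
}

/* Push the entry onto the front of its bucket's chain. */
static void toy_insert(Table *t, Entry *e)
{
    uint32_t b = toy_hash(e->key);

    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/*
 * Walk the bucket chain and return the entry whose full key matches.
 * A colliding entry shares the bucket but fails the memcmp, so it is skipped.
 */
static Entry *toy_lookup(const Table *t, uint32_t key)
{
    for (Entry *e = t->buckets[toy_hash(key)]; e != NULL; e = e->next)
    {
        if (memcmp(&e->key, &key, sizeof(key)) == 0)
            return e;
    }
    return NULL;
}
```

In this sketch, keys 16390 and 16398 land in the same bucket, yet looking up either one returns
only its own entry. Getting the wrong entry would need the stored key itself to be wrong, not
just the hash.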

Last night I started looking at ri_constraint_cache, which is maybe a little more interesting due to
the syscache invalidation code. A parallel test could cause an invalidation between lines of the
without_overlaps test. Getting the wrong riinfo could make us treat a RESTRICT constraint as NO
ACTION. But I don't see any way for that to happen yet.

I have too much confidence in the Postgres codebase to really expect to find bugs in any of these
places. And yet I don't see how 1772d554b0 could make a RESTRICT test fail, since all its changes
are wrapped in `if (is_no_action)`, unless the RESTRICT constraint is somehow executing the NO
ACTION query by mistake.

Anyway I'll keep at it!

Yours,

--
Paul ~{:-)
pj@illuminatedcomputing.com