Re: SQL:2011 application time - Mailing list pgsql-hackers

From Paul Jungwirth
Subject Re: SQL:2011 application time
Date
Msg-id adbb0b45-2ff1-41a8-a119-b332f9d4bdc4@illuminatedcomputing.com
Whole thread Raw
In response to SQL:2011 application time  (Paul A Jungwirth <pj@illuminatedcomputing.com>)
List pgsql-hackers
On 2/5/25 05:37, Peter Eisentraut wrote:
> On 29.01.25 07:34, Paul Jungwirth wrote:
>> Is it possible to commit an RI_PLAN_NO_ACTION addition and see if that makes the buildfarm 
>> failures go away? Here is a proposed patch for that (v48.1). I would understand if this is too 
>> questionable a practice---but it would be nice to get sufficient test exposure to see if it makes 
>> a difference. Since I still haven't reproduced this locally (despite running continuously for 
>> almost a week), it's not an experiment I can do myself. If it *does* make the failures go away, 
>> then it suggests there is still some latent problem somewhere.
> 
> I'm tempted to give this a try.  But the cfbot is currently in a bit of a mess, so I'll wait until 
> that is clean again so that we can have a usable baseline to work against.

Okay, thanks! I've been spending some more time on this, but I haven't made much progress.

It's surely not as simple as just oid wrapround. Here is a bpftrace script to show when we change 
TransamVariables->nextOid:

BEGIN {
   @setnext = 0
}

u:/home/paul/local/bin/postgres:GetNewObjectId {
   @newoids[tid] += 1
}

u:/home/paul/local/bin/postgres:SetNextObjectId {
   @setnext += 1
}

When I run this during `make installcheck` I get only 29608 total calls to GetNewObjectId, and none 
for SetNextObjectId.

I've also been looking at the dynahash code a bit. With gdb I can give two constraint oids a hash 
collision, but of course that isn't sufficient, since we memcmp the whole key as well.

Last night I started looking at ri_constraint_cache, which is maybe a little more interesting due to 
the syscache invalidation code. A parallel test could cause an invalidation between lines of the 
without_overlaps test. Getting the wrong riinfo could make us treat a RESTRICT constraint as NO 
ACTION. But I don't see any way for that to happen yet.

I have too much confidence in the Postgres codebase to really expect to find bugs in any of these 
places. And yet I don't see how 1772d554b0 could make a RESTRICT test fail, since all its changes 
are wrapped in `if (is_no_action)`---except if the RESTRICT constraint is somehow executing the NO 
ACTION query by mistake.

Anyway I'll keep at it!

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com




pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Next
From: Tom Lane
Date:
Subject: Re: Show WAL write and fsync stats in pg_stat_io