Thread: CREATE INDEX CONCURRENTLY does not index prepared xact's data

CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:
Hi hackers!

$subj.

Steps to reproduce:
create extension if not exists amcheck;
create table if not exists t1(i int);
begin;
insert into t1 values(1);
prepare transaction 'x';
create index concurrently i1 on t1(i);
commit prepared 'x';
select bt_index_check('i1', true);

I observe:
NOTICE:  heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
I expect: awaiting 'x' commit before index is created, correct index after.

This happens because WaitForLockersMultiple() does not take prepared xacts into account. Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only when the transaction commit is visible.

This issue affects pg_repack and similar machinery based on CIC.
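
For context, the wait that CIC relies on between phases boils down to roughly the following (a simplified sketch of what WaitForLockers() does; progress reporting and memory management elided):

    VirtualTransactionId *old_lockholders;

    old_lockholders = GetLockConflicts(&heaplocktag, ShareLock, NULL);
    while (VirtualTransactionIdIsValid(*old_lockholders))
    {
        /* blocks until that (virtual) transaction ends */
        VirtualXactLock(*old_lockholders, true);
        old_lockholders++;
    }

A prepared transaction simply never appears in the array GetLockConflicts() returns here, so the loop has nothing to wait on.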

PFA draft of a fix.

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Victor Yegorov
Date:
On Sat, 19 Dec 2020 at 18:13, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Steps to reproduce:
create extension if not exists amcheck;
create table if not exists t1(i int);
begin;
insert into t1 values(1);
prepare transaction 'x';
create index concurrently i1 on t1(i);
commit prepared 'x';
select bt_index_check('i1', true);

I observe:
NOTICE:  heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
I expect: awaiting 'x' commit before index is created, correct index after.

CREATE INDEX CONCURRENTLY isn't supposed to be run inside a transaction?.. 


--
Victor Yegorov

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 19 Dec 2020, at 22:22, Victor Yegorov <vyegorov@gmail.com> wrote:
>
> CREATE INDEX CONCURRENTLY isn't supposed to be run inside a transaction?..

CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Victor Yegorov
Date:
On Sat, 19 Dec 2020 at 18:13, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
I observe:
NOTICE:  heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
I expect: awaiting 'x' commit before index is created, correct index after.

I agree, that behaviour is unexpected. But getting a notice that requires me
to re-create the index some time later is not better (from DBA perspective).

Maybe it'd be better to wait on prepared xacts like on other open ordinary transactions?


--
Victor Yegorov

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> This happens because WaitForLockersMultiple() does not take prepared
> xacts into account.

Ugh, clearly an oversight.

> Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only
> when transaction commit is visible.

Don't follow your point here --- I'm pretty sure that prepared xacts
continue to hold their locks.

> PFA draft of a fix.

Haven't you completely broken VirtualXactLock()?  Certainly, whether the
target is a normal or prepared transaction shouldn't alter the meaning
of the "wait" flag.

In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
need to gain an additional parameter indicating whether to consider
prepared xacts.  It's not clear to me that their current behavior is wrong
for all possible uses.
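
For illustration only, such an interface might look like this (the extra parameter is hypothetical, not an actual proposal):

    extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
                                                  LOCKMODE lockmode, int *countp,
                                                  bool includePrepared);
    extern void WaitForLockersMultiple(List *locktags, LOCKMODE lockmode,
                                       bool progress, bool includePrepared);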

            regards, tom lane



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 19 Dec 2020, at 22:48, Victor Yegorov <vyegorov@gmail.com> wrote:
>
> On Sat, 19 Dec 2020 at 18:13, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I observe:
> NOTICE:  heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
> I expect: awaiting 'x' commit before index is created, correct index after.
>
> I agree, that behaviour is unexpected. But getting a notice that requires me
> to re-create the index some time later is not better (from DBA perspective).
>
> Maybe it'd be better to wait on prepared xacts like on other open ordinary transactions?

This is not a real NOTICE. I just used a slightly altered amcheck to diagnose the problem. It's an incorrect index. It lacks some tuples. It will not find existing data, failing to provide "read committed" consistency guarantees.
The fix waits for prepared xacts just like any other transaction.



> On 19 Dec 2020, at 22:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only
>> when transaction commit is visible.
>
> Don't follow your point here --- I'm pretty sure that prepared xacts
> continue to hold their locks.
Uhmm, yes, locks are there. The relation is locked with RowExclusiveLock, but this lock is ignored by WaitForLockers(heaplocktag, ShareLock, true). Locking the relation with ShareLock works as expected.

>> PFA draft of a fix.
>
> Haven't you completely broken VirtualXactLock()?  Certainly, whether the
> target is a normal or prepared transaction shouldn't alter the meaning
> of the "wait" flag.
You are right, the patch has a bug here.

> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
> need to gain an additional parameter indicating whether to consider
> prepared xacts.  It's not clear to me that their current behavior is wrong
> for all possible uses.
I don't see usages besides the indexing stuff. But maybe it's worth analysing each case...

BTW, do we need a test for this? Would an isolation test be good for checking this?

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 19 Dec 2020, at 23:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> BTW, do we need a test for this? Would an isolation test be good for checking this?

PFA a patch set with an isolation test for $subj and a fix for the VirtualXactLock() bug.

I think I'll register the thread on January CF.

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Michael Paquier
Date:
On Sat, Dec 19, 2020 at 12:57:41PM -0500, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> This happens because WaitForLockersMultiple() does not take prepared
>> xacts into account.
>
> Ugh, clearly an oversight.

This looks to be the case since 295e639 where virtual XIDs have been
introduced.  So this is an old bug.

> Don't follow your point here --- I'm pretty sure that prepared xacts
> continue to hold their locks.

Yes, that's what I recall as well.

> Haven't you completely broken VirtualXactLock()?  Certainly, whether the
> target is a normal or prepared transaction shouldn't alter the meaning
> of the "wait" flag.

Yep.

> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
> need to gain an additional parameter indicating whether to consider
> prepared xacts.  It's not clear to me that their current behavior is wrong
> for all possible uses.

WaitForLockers is used only by REINDEX and CREATE/DROP CONCURRENTLY,
where it seems to me we need to care about all the cases related to
concurrent build, validation and index drop.  The other caller of
GetLockConflicts() is for conflict resolution in standbys where it is
fine to ignore 2PC transactions as these cannot be cancelled.  So I
agree that we are going to need more control with a new option
argument to be able to control if 2PC transactions are ignored or
not.

Hmm.  The approach taken by the patch looks to be back-patchable.
Based on the lack of complaints on the matter, we could consider
instead putting an error in WaitForLockersMultiple() if there is at
least one numPrepXact which would at least avoid inconsistent data.
But I don't think what's proposed here is bad either.

VirtualTransactionIdIsValidOrPreparedXact() is confusing IMO, knowing
that VirtualTransactionIdIsPreparedXact() combined with
LocalTransactionIdIsValid() would be enough to do the job.

-       Assert(VirtualTransactionIdIsValid(vxid));
+       Assert(VirtualTransactionIdIsValidOrPreparedXact(vxid));
+
+       if (VirtualTransactionIdIsPreparedXact(vxid))
[...]
#define VirtualTransactionIdIsPreparedXact(vxid) \
    ((vxid).backendId == InvalidBackendId)
This would allow the case where backendId and localTransactionId are
both invalid.  So it would be better to also check in
VirtualTransactionIdIsPreparedXact() that the XID is not invalid, no?
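
Something along these lines would close that hole (a sketch; the patch may end up spelling it differently):

    #define VirtualTransactionIdIsPreparedXact(vxid) \
        ((vxid).backendId == InvalidBackendId && \
         LocalTransactionIdIsValid((vxid).localTransactionId))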
--
Michael

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 21 Dec 2020, at 10:40, Michael Paquier <michael@paquier.xyz> wrote:
>
>> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
>> need to gain an additional parameter indicating whether to consider
>> prepared xacts.  It's not clear to me that their current behavior is wrong
>> for all possible uses.
>
> WaitForLockers is used only by REINDEX and CREATE/DROP CONCURRENTLY,
> where it seems to me we need to care about all the cases related to
> concurrent build, validation and index drop.  The other caller of
> GetLockConflicts() is for conflict resolution in standbys where it is
> fine to ignore 2PC transactions as these cannot be cancelled.

I don't think the fact that we cannot cancel a transaction is sufficient here to ignore prepared transactions. I think there should not exist any prepared transaction that we need to cancel in standby conflict resolution. And if one exists, it's a sign of corruption; we could emit a warning or something like that.
But I'm really not an expert here, just going by the common sense that a prepared transaction is just like a regular transaction that survives a crash. If we wait for any transaction, probably we should wait for prepared ones too. I'm not insisting on anything though.

>  So I
> agree that we are going to need more control with a new option
> argument to be able to control if 2PC transactions are ignored or
> not.
>
> Hmm.  The approach taken by the patch looks to be back-patchable.
> Based on the lack of complaints on the matter, we could consider
> instead putting an error in WaitForLockersMultiple() if there is at
> least one numPrepXact which would at least avoid inconsistent data.
> But I don't think what's proposed here is bad either.
>
> VirtualTransactionIdIsValidOrPreparedXact() is confusing IMO, knowing
> that VirtualTransactionIdIsPreparedXact() combined with
> LocalTransactionIdIsValid() would be enough to do the job.
>
> -       Assert(VirtualTransactionIdIsValid(vxid));
> +       Assert(VirtualTransactionIdIsValidOrPreparedXact(vxid));
> +
> +       if (VirtualTransactionIdIsPreparedXact(vxid))
> [...]
> #define VirtualTransactionIdIsPreparedXact(vxid) \
>    ((vxid).backendId == InvalidBackendId)
> This would allow the case where backendId and localTransactionId are
> both invalid.  So it would be better to also check in
> VirtualTransactionIdIsPreparedXact() that the XID is not invalid, no?
Seems valid. Removed VirtualTransactionIdIsValidOrPreparedXact() from patch.

Thanks!

Best regards, Andrey Borodin.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 21 Dec 2020, at 12:24, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> Seems valid. Removed VirtualTransactionIdIsValidOrPreparedXact() from patch.

Sorry for the noise, removal was not complete.

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 21 Dec 2020, at 10:40, Michael Paquier <michael@paquier.xyz> wrote:
>
> Hmm.  The approach taken by the patch looks to be back-patchable.
I was checking that the patch/test works in supported branches, and everything seems to be fine down to REL_10_STABLE.

But my machines (Ubuntu 18 and macOS) report several failures in isolation tests on REL9_6_20/REL9_6_STABLE, particularly eval-plan-qual-trigger, drop-index-concurrently-1, and async-notify. I do not observe problems with the regular isolation tests though.

Do I understand correctly that check-world on the buildfarm runs 'make check-prepared-txns', and the problem is somewhere in my machines? Or is something actually broken/outdated?

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Michael Paquier
Date:
On Wed, Dec 23, 2020 at 12:23:28PM +0500, Andrey Borodin wrote:
> Do I understand correctly that check-world tests on buildfarm 'make
> check-prepared-txns' and the problem is somewhere in my machines? Or
> something is actually broken\outdated?

The buildfarm code has no trace of check-prepared-txns.  And, FWIW, I
see no failures at the top of REL9_6_STABLE.  Do you mean that this
happens only with your patch?  Or do you mean that you see failures
using the stable branch of upstream?  I have not tested the former,
but the latter works fine on my end.
--
Michael

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 23 Dec 2020, at 12:52, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 23, 2020 at 12:23:28PM +0500, Andrey Borodin wrote:
>> Do I understand correctly that check-world tests on buildfarm 'make
>> check-prepared-txns' and the problem is somewhere in my machines? Or
>> something is actually broken\outdated?
>
> FWIW, I
> see no failures at the top of REL9_6_STABLE.

Thanks, Michael! The problem was indeed within my machines. maintainer-cleanup is not enough for make check-prepared-txns; a fresh real installation is necessary.

I've checked that the test works down to REL9_5_STABLE with the patch.

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:
Thanks for looking into this!

> On 17 Jan 2021, at 12:24, Noah Misch <noah@leadboat.com> wrote:
>
>> --- a/src/backend/storage/lmgr/lock.c
>> +++ b/src/backend/storage/lmgr/lock.c
>> @@ -2931,15 +2929,17 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
>>
>>     /*
>>      * Allocate memory to store results, and fill with InvalidVXID.  We only
>> -     * need enough space for MaxBackends + a terminator, since prepared xacts
>> -     * don't count. InHotStandby allocate once in TopMemoryContext.
>> +     * need enough space for MaxBackends + max_prepared_xacts + a
>> +     * terminator. InHotStandby allocate once in TopMemoryContext.
>>      */
>>     if (InHotStandby)
>>     {
>>         if (vxids == NULL)
>>             vxids = (VirtualTransactionId *)
>>                 MemoryContextAlloc(TopMemoryContext,
>> -                                   sizeof(VirtualTransactionId) * (MaxBackends + 1));
>> +                                   sizeof(VirtualTransactionId) * (MaxBackends
>> +                                   + max_prepared_xacts
>> +                                   + 1));
>
> PostgreSQL typically puts the operator before the newline.  Also, please note
> the whitespace error that "git diff --check origin/master" reports.
Fixed.
>
>>     }
>>     else
>>         vxids = (VirtualTransactionId *)
>
> This is updating only the InHotStandby branch.  Both branches need the change.
Fixed.
>
>> @@ -4461,9 +4462,21 @@ bool
>> VirtualXactLock(VirtualTransactionId vxid, bool wait)
>> {
>>     LOCKTAG        tag;
>> -    PGPROC       *proc;
>> +    PGPROC       *proc = NULL;
>>
>> -    Assert(VirtualTransactionIdIsValid(vxid));
>> +    Assert(VirtualTransactionIdIsValid(vxid) ||
>> +            VirtualTransactionIdIsPreparedXact(vxid));
>> +
>> +    if (VirtualTransactionIdIsPreparedXact(vxid))
>> +    {
>> +        LockAcquireResult lar;
>> +        /* If it's prepared xact - just wait for it */
>> +        SET_LOCKTAG_TRANSACTION(tag, vxid.localTransactionId);
>> +        lar = LockAcquire(&tag, ShareLock, false, !wait);
>> +        if (lar == LOCKACQUIRE_OK)
>
> This should instead test "!= LOCKACQUIRE_NOT_AVAIL", lest
> LOCKACQUIRE_ALREADY_HELD happen.  (It perhaps can't happen, but skipping the
> LockRelease() would be wrong if it did.)
I think the code that successfully acquired the lock should release it. Otherwise we risk releasing someone else's lock held for a reason. We only acquire the lock to release it instantly anyway.

>
>> +            LockRelease(&tag, ShareLock, false);
>> +        return lar != LOCKACQUIRE_NOT_AVAIL;
>> +    }
>>
>>     SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid);
>>
>> diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
>> index 1c3e9c1999..cedb9d6d2a 100644
>> --- a/src/include/storage/lock.h
>> +++ b/src/include/storage/lock.h
>> @@ -70,6 +70,8 @@ typedef struct
>> #define VirtualTransactionIdIsValid(vxid) \
>>     (((vxid).backendId != InvalidBackendId) && \
>>      LocalTransactionIdIsValid((vxid).localTransactionId))
>> +#define VirtualTransactionIdIsPreparedXact(vxid) \
>> +    ((vxid).backendId == InvalidBackendId)
>
> Your patch is introducing VirtualTransactionId values that represent prepared
> xacts, and it has VirtualTransactionIdIsValid() return false for those values.
> Let's make VirtualTransactionIdIsValid() return true for them, since they're
> as meaningful as any other value.  The GetLockConflicts() header comment says
> "The result array is palloc'd and is terminated with an invalid VXID."  Patch
> v4 falsifies that comment.  The array can contain many of these new "invalid"
> VXIDs, and an all-zeroes VXID terminates the array.  Rather than change the
> comment, let's change VirtualTransactionIdIsValid() to render the comment true
> again.  Most (perhaps all) VirtualTransactionIdIsValid() callers won't need to
> distinguish the prepared-transaction case.
Makes sense, fixed. I was afraid that there's something I'm not aware of. I've iterated over the VirtualTransactionIdIsValid() calls and did not find suspicious cases.

> An alternative to redefining VXID this way would be to have some new type,
> each instance of which holds either a valid VXID or a valid
> prepared-transaction XID.  That alternative feels inferior to me, though.
> What do you think?
I think we should not call valid vxids "invalid".

By the way maybe rename "check-prepared-txns" to "check-prepared-xacts" for consistency?

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 22 Jan 2021, at 10:44, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v5-0001-Wait-for-prepared-xacts-in-CREATE-INDEX-CONCURREN.patch><v5-0002-Add-test-for-CIC-with-prepared-xacts.patch>
Uh, vscode did not save the files and I've sent an incorrect version. Disregard v5.
Sorry for the noise.

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 24 Jan 2021, at 07:27, Noah Misch <noah@leadboat.com> wrote:
>
> I changed that, updated comments, and fixed pgindent damage.  I plan to push
> the attached version.

I see that patch was pushed. I'll flip status of CF entry. Many thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 30 Jan 2021, at 21:06, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
>> On 24 Jan 2021, at 07:27, Noah Misch <noah@leadboat.com> wrote:
>>
>> I changed that, updated comments, and fixed pgindent damage.  I plan to push
>> the attached version.
>
> I see that patch was pushed. I'll flip status of CF entry. Many thanks!

FWIW I have 2 new reported cases on 12.6. I've double-checked that at the moment of corruption the installation was running a version with the patch.
To date I could not reproduce the problem myself, but I'll continue working on this.

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 1 May 2021, at 17:42, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> FWIW I have 2 new reported cases on 12.6.

Sorry to say it, but $subj persists. Here's a simple reproduction.
To get a corrupted index you need 3 psql sessions: A, B, and C. By default commands are executed in A.

create extension amcheck ;
create table t1(i int);
create index on t1(i);

begin;
insert into t1 values(0);

-- session C: reindex table concurrently t1;

prepare transaction 'a';
begin;
insert into t1 values(0);
-- session B: commit prepared 'a';
prepare transaction 'b';
begin;
insert into t1 values(0);
-- session B: commit prepared 'b';
prepare transaction 'c';

begin;
insert into t1 values(0);
-- session B: commit prepared 'c';
prepare transaction 'd';
commit prepared 'd';

-- session C: postgres=# select bt_index_check('i1',true);
ERROR:  heap tuple (0,2) from table "t1" lacks matching index tuple within index "i1"
HINT:  Retrying verification using the function bt_index_parent_check() might provide a more specific error.

The problem is that WaitForLockersMultiple() gathers running vxids and 2PC xids. Then it waits, but if a vxid is converted to 2PC it fails to wait.
I could not compose an isolation test for this, because we need to do "prepare transaction 'a';" only when "reindex table concurrently t1;" has already been working for some time.

To fix it we can return locking xids along with vxids from GetLockConflicts(), like in the attached diff. But this approach seems ugly.
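
For illustration, one possible shape of such a return value (the name and layout here are made up):

    typedef struct XidListEntry
    {
        VirtualTransactionId vxid;  /* conflicting lock holder */
        TransactionId        xid;   /* InvalidTransactionId if none was assigned
                                     * when GetLockConflicts() looked */
    } XidListEntry;

A waiter could then take the transaction lock on the xid when it is known, and fall back to the vxid lock otherwise.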


Best regards, Andrey Borodin.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 18 Jul 2021, at 01:12, Noah Misch <noah@leadboat.com> wrote:
>
> Suppose some transaction has a vxid but no xid.  Shortly after
> GetLockConflicts(), it acquires an xid, modifies the table, and issues PREPARE
> TRANSACTION.  Could that cause a corrupt index even with this diff?

Firstly I've tried to stress things out. This little Go program [0] easily reproduces corruption on patched code.
Meanwhile the vxid->xid->2px program does not [1] (both patched and unpatched).

I think CIC does not care much about VXIDs at all. It's only important when the real XID started: before GetLockConflicts() or after.

Thanks!


Best regards, Andrey Borodin.

[0] https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc#file-stress-xid-2px
[1] https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc#file-stress-vxid-xid-2px


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 19 Jul 2021, at 05:30, Noah Misch <noah@leadboat.com> wrote:
>
> To fix $SUBJECT, it sounds like we need a way to identify a transaction,
> usable as early as the transaction's first catalog access and remaining valid
> until COMMIT PREPARED finishes.  We may initially see a transaction as having
> a VXID and no XID, then later need to wait for that transaction when it has
> entered prepared state, having an XID and no VXID.  How might we achieve that?

PFA a draft with the vxid->xid mapping and a subsequent wait for it. The patch, obviously, lacks a ton of comments explaining what is going on.
We write the actual VXID into the dummy proc entries of prepared xacts.
When we wait for a vxid we try to convert it to an xid through the real proc entry. If we cannot do so, we look it up in the shared 2PC state. If the vxid is not there, it means it is already gone and there's nothing to wait for.
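
Once a vxid has been resolved to an xid (via the proc entry or the shared 2PC state), the wait itself reduces to briefly taking the transaction lock, roughly as below (a sketch based on the draft; the function name is made up):

    /* Wait (or, when !wait, just test) for a prepared transaction given its XID. */
    static bool
    WaitForPreparedXact(TransactionId xid, bool wait)
    {
        LOCKTAG           tag;
        LockAcquireResult lar;

        SET_LOCKTAG_TRANSACTION(tag, xid);
        lar = LockAcquire(&tag, ShareLock, false, !wait);
        if (lar != LOCKACQUIRE_NOT_AVAIL)
            LockRelease(&tag, ShareLock, false);
        return lar != LOCKACQUIRE_NOT_AVAIL;
    }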

Thanks!

Best regards, Andrey Borodin.



Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:
> On 19 Jul 2021, at 23:10, Noah Misch <noah@leadboat.com> wrote:
>
> On Mon, Jul 19, 2021 at 12:10:52PM +0500, Andrey Borodin wrote:
>>>> On 19 Jul 2021, at 05:30, Noah Misch <noah@leadboat.com> wrote:
>>>
>>> To fix $SUBJECT, it sounds like we need a way to identify a transaction,
>>> usable as early as the transaction's first catalog access and remaining valid
>>> until COMMIT PREPARED finishes.  We may initially see a transaction as having
>>> a VXID and no XID, then later need to wait for that transaction when it has
>>> entered prepared state, having an XID and no VXID.  How might we achieve that?
>>
>> PFA a draft with the vxid->xid mapping and a subsequent wait for it. The patch, obviously, lacks a ton of comments explaining what is going on.
>> We write the actual VXID into the dummy proc entries of prepared xacts.
>> When we wait for a vxid we try to convert it to an xid through the real proc entry. If we cannot do so, we look it up in the shared 2PC state. If the vxid is not there, it means it is already gone and there's nothing to wait for.
>
> When the system reuses BackendId values, it reuses VXID values.  In the
> general case, two prepared transactions could exist simultaneously with the
> same BackendId+LocalTransactionId.  Hmm.  It could be okay to have a small
> probability that CIC waits on more transactions than necessary.  Suppose we
> have three PGPROC entries with the same VXID, two prepared transactions and
> one regular transaction.  Waiting for all three could be tolerable, though
> avoiding that would be nice.  Should we follow transactions differently to
> avoid that?

We don't have to wait for the regular Xid in this case at all, because it will be finished along with its VXID. But I think that we have to wait for all 2PCs with the same VXID.

We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
Other 2PCs with the same VXID may be older or newer than the target 2PC.
Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.

But this logic seems overly complicated to me. Or isn't it?

Best regards, Andrey Borodin.






Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 19 Jul 2021, at 23:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
> Other 2PCs with the same VXID may be older or newer than the target 2PC.
> Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.
>
> But this logic seems overly complicated to me. Or isn't it?

Here's a PoC to assess the complexity of this solution.

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 19 Jul 2021, at 23:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
> Other 2PCs with the same VXID may be older or newer than the target 2PC.
> Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.
>
> But this logic seems overly complicated to me. Or isn't it?

> Other 2PCs with the same VXID may be older or newer than the target 2PC. Older 2PCs must be with an XID in the conflicts array.
Unfortunately, this is just wrong. Older 2PCs with the same VXID don't have to be in the conflicts array. They might be some other unrelated 2PCs working with different relations.

Sorry for the noise.

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 21 Jul 2021, at 02:49, Noah Misch <noah@leadboat.com> wrote:
>
> On Wed, Jul 21, 2021 at 12:38:25AM +0500, Andrey Borodin wrote:
>>> On 19 Jul 2021, at 23:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>>> On 19 Jul 2021, at 23:10, Noah Misch <noah@leadboat.com> wrote:
>>>> Suppose we
>>>> have three PGPROC entries with the same VXID, two prepared transactions and
>>>> one regular transaction.  Waiting for all three could be tolerable, though
>>>> avoiding that would be nice.  Should we follow transactions differently to
>>>> avoid that?
>>>
>>> We don't have to wait for the regular Xid in this case at all, because it will be finished along with its VXID.
>
> I don't understand those two sentences.  Could you write more?
Suppose we have a VXID conflicting with the reindexed relation's lock, and a PGPROC with a regular Xid (not 2PC) for this VXID.
We do not need to return this xid from TwoPhaseGetXidByVXid() for an extra wait. This situation is covered by the normal vxid handling in VirtualXactLock().
+    /* Save the xid to test if transaction coverted to 2pc later */
+    xid = proc->xid;


>>> We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
>>> Other 2PCs with the same VXID may be older or newer than the target 2PC.
>>> Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.
>
>> Unfortunately, this is just wrong. Older 2PCs with the same VXID don't have to be in the conflicts array. They might be some other unrelated 2PCs working with different relations.
>
> I agree.  Would it work to do the following sequence in WaitForLockers()?
>
> 1. In GetLockConflicts(), record a list of conflicting XIDs.  Also record a
>   list of conflicting LXIDs having no XID.
> 2. Wait on all LXIDs recorded in (1).  They have either ended or converted to
>   prepared transactions.
> 3. Inner-join the present list of prepared transactions with the list of
>   LXIDs from (1).  Record the XIDs of matched entries.
> 4. Wait on all XIDs recorded in (1) and (3).
>
> While that may wait on some prepared XIDs needlessly, it can't degrade into
> persistent starvation.  We could reduce the chance of a needless XID wait by
> remembering the list of old prepared XIDs before (1) and not adding any of
> those remembered, old XIDs in (3).  That last bit probably would be
> unjustified complexity, but maybe not.
I think this protocol is equivalent to waiting on all Xids with the given VXid.
I consider this protocol safe. PFA the implementation.
Patch 0001 is an intact version of the previous patch.
There are two additions:
1. Prefer xids to vxids in GetLockConflicts()
2. Wait on all 2PCs with given VXid.

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 21 Jul 2021, at 22:55, Noah Misch <noah@leadboat.com> wrote:
>
> These drafts use reasonable concepts. Would you develop them into a
> ready-to-commit patch?  You may be able to translate your probabilistic test
> procedures from https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc
> into something like the src/bin/pgbench/t/001_pgbench_with_server.pl test of
> pg_enum_oid_index.

I'm working on it. Is it OK to use amcheck in pgbench tests? Or is it better to add the test to amcheck?

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Peter Geoghegan
Date:
On Fri, Jul 23, 2021 at 3:30 PM Noah Misch <noah@leadboat.com> wrote:
> It could be okay, but I think it's better to add the test under amcheck.  You
> could still use pgbench in the test.

+1. Writing the test that way seems more robust. If somebody writes a
patch that reintroduces a similar though distinct bug, then it seems
more likely to be caught. In other words, there is less of a danger of
the test over-specifying what incorrect means if it uses amcheck
directly.

-- 
Peter Geoghegan



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 24 Jul 2021, at 03:30, Noah Misch <noah@leadboat.com> wrote:
>
> It could be okay, but I think it's better to add the test under amcheck.  You
> could still use pgbench in the test.
Currently it's still WIP.
I've added two tests: a deterministic one with psql and a probabilistic one with pgbench.
And I really do not like the pgbench test:
1. It does not seem stable enough; it can turn the buildfarm as red as a good watermelon.
2. Names for 2PCs are chosen at random and have a probability of collision.
3. It still breaks the fix and I have no idea why.

Can you please take a look at the added TAP test? Probably I'm doing a lot of things wrong; it's the very first program in Perl written by me...
background_pgbench is a local invention. sub pgbench is a copy from a nearby test. Should I refactor it somewhere?

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 27 Jul 2021, at 17:50, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> 3. It still breaks the fix and I have no idea why.

I'm still struggling with the bug. I have a stable reproduction, but to date have been unable to identify the root of the problem.
Here's a sample trace:
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl LOG:  statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 1
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 2
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3493, vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3490, vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3492
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3492, vxid -1/3492
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 3
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 4
2021-07-29 17:14:29.000 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 5
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3512
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3512, vxid -1/3512
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3513, vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 6
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3516, vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3515, vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3517, vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  Backend is doing something else
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 0, vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: Sucessfully found xid by vxid 0
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  Phase Final
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl LOG:  statement: SELECT bt_index_check('idx',true);
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl ERROR:  heap tuple (18,74) from table "tbl" lacks matching index tuple within index "idx" xmin 3504 xmax 0
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl STATEMENT:  SELECT bt_index_check('idx',true);

The rogue tuple was committed by xid 3504, which was never returned by GetLockConflicts(). I'm attaching the patch set just for reference alongside the trace, not expecting code review now.

I've fixed an unrelated bug in the previous patchset.
-                       SET_LOCKTAG_TRANSACTION(tag, xid);
+                       SET_LOCKTAG_TRANSACTION(tag, xidlist.xid);
                        lar = LockAcquire(&tag, ShareLock, false, !wait);
                        if (lar != LOCKACQUIRE_NOT_AVAIL)
                                LockRelease(&tag, ShareLock, false);
-                       return lar != LOCKACQUIRE_NOT_AVAIL;
+                       if (lar == LOCKACQUIRE_NOT_AVAIL)
+                               return false;

But it does not help. Maybe I've broken something with my fix... Searching further.

Thanks for reading! I would be happy to hear any ideas on why an xid that was never locked by GetLockConflicts() still committed a tuple which was not indexed.

Best regards, Andrey Borodin.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 29 Jul 2021, at 17:30, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Searching further.

Currently the CIC test breaks indexes even without 2PC. How is it supposed to work if a vxid stays a vxid through the GetLockConflicts()/WaitForLockersMultiple() barrier and then suddenly converts to an xid and commits before the index is validated?

Best regards, Andrey Borodin.






Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 30 Jul 2021, at 07:25, Noah Misch <noah@leadboat.com> wrote:
> What alternative fix designs should we consider?

I observe that the provided patch fixes CIC under normal transactions, but the test with 2PC still fails similarly.
An unindexed tuple was committed somewhere at the end of Phase 3 or 4.
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl LOG:  statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 1
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 2
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6735
2021-07-30 15:35:31.807 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6736
2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 3
2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6750
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 4
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 5
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6762
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6763
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 6
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid 6/2166
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6767
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6764
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6765
2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl WARNING:  Phase Final
2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl LOG:  statement: SELECT bt_index_check('idx',true);
2021-07-30 15:35:31.813 +05 [25987] 002_cic_2pc.pl ERROR:  heap tuple (46,16) from table "tbl" lacks matching index tuple within index "idx" xmin 6751 xmax 0

Attaching the debug logging patch; amcheck is modified to return xmin. The trace is gathered by "grep -e ERROR -e REINDEX -e WARN -e SELECT tmp_check/log/002_cic_2pc_CIC_2PC_test.log".

How deep does the rabbit hole go?

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 30 Jul 2021, at 23:41, Noah Misch <noah@leadboat.com> wrote:
>
> On Fri, Jul 30, 2021 at 03:42:10PM +0500, Andrey Borodin wrote:
>>> On 30 Jul 2021, at 07:25, Noah Misch <noah@leadboat.com> wrote:
>>> What alternative fix designs should we consider?
>>
>> I observe that provided patch fixes CIC under normal transactions, but test with 2PC still fails similarly.
>> Unindexed tuple was committed somewhere at the end of Phase 3 or 4.
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl LOG:  statement: REINDEX INDEX CONCURRENTLY idx;
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 1
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 2
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6735
>> 2021-07-30 15:35:31.807 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6736
>> 2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 3
>> 2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6750
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 4
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 5
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6762
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6763
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 6
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid 6/2166
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6767
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6764
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6765
>> 2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl WARNING:  Phase Final
>> 2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl LOG:  statement: SELECT bt_index_check('idx',true);
>> 2021-07-30 15:35:31.813 +05 [25987] 002_cic_2pc.pl ERROR:  heap tuple (46,16) from table "tbl" lacks matching index tuple within index "idx" xmin 6751 xmax 0
>
> I see a failure, too.  Once again, "i:" lines are events within the INSERT
> backend, and "r:" lines are events within the REINDEX CONCURRENTLY backend:
>
> r: Phase 2 begins.
> i: INSERT.  Start PREPARE.
> r: Phase 2 commits indisready=t for idx_ccnew.
> r: Start waiting for the INSERT to finish.
> i: PREPARE finishes.
> r: Wake up and start validate_index().  This is a problem.  It needed to wait
>   for COMMIT PREPARED to finish.
I'll investigate this scenario. I've tried to sprinkle in some more WaitForLockersMultiple() calls, yet without success.

> This may have a different explanation than the failure you saw, because my
> INSERT transaction already had a permanent XID before the start of phase 3.  I
> won't have time to study this further in the next several days.  Can you find
> out where things go wrong?
I'll try. This bug is the #1 priority for me. We repack ~PB of indexes each weekend (only bloated ones, and many in fact are bloated). And it seems like they all are endangered.

>  The next thing I would study is VirtualXactLock(),
> specifically what happens if the lock holder is a normal backend (with or
> without an XID) when VirtualXactLock() starts but becomes a prepared
> transaction (w/ different PGPROC) before VirtualXactLock() ends.

PreparedXactLock() will do the trick. If we have an xid, we always take a lock on the xid. If we have a vxid, we try to convert it to an xid and look in all the PGPROCs for 2PCs. And then again, wait for the xid.
At this point I'm certain that if any transaction is reported by GetLockConflicts(), it will get awaited by VirtualXactLock().
The problem is that the rogue transaction was never reported by GetLockConflicts().

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Peter Geoghegan
Date:
On Sat, Jul 31, 2021 at 11:37 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I'll try. This bug is the #1 priority for me. We repack ~PB of indexes each weekend (only bloated ones, and many in fact are bloated). And it seems like they all are endangered.
 

I have a tip, which might make life a little easier: maybe use rr for this? See:


https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework

Having a stable recording with fixed logical times *vastly* simplifies
debugging in practice. I found this very useful when debugging the SSI
Jepsen bug, which was a complex race condition that could only be
reproduced through a complex procedure. Which is also what we have
here now, more or less. Your personal notes on the bug can sensibly
track specific pointer values, logical times, etc.

You can even "pack" a recording, making it into a self-contained thing
that doesn't rely on having the original binaries. This makes the
recording stable enough to keep for weeks or months, share with other
people, etc.

-- 
Peter Geoghegan



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 31 Jul 2021, at 13:51, Peter Geoghegan <pg@bowt.ie> wrote:
>
> maybe use rr for this?
Thanks, Peter!
I'll give it a try.

Currently I observe that during PrepareTransaction() there might be a moment when the transaction is not reported by GetLockConflicts() with the relevant lock.

I observe that PrepareTransaction() for the violating xid is somewhere between
CallXactCallbacks(XACT_EVENT_PREPARE);
and
PostPrepare_PgStat();

And GetLockConflicts() fails to detect this transaction.

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 31 Jul 2021, at 13:51, Peter Geoghegan <pg@bowt.ie> wrote:
>
> maybe use rr for this?

rr helps to explain why something happened.
But how do you find out why something did not happen?
During GetLockConflicts() the proclocks SHMQueue does not include the locks of a concurrently running xid that is within PrepareTransaction() [seems like it's within PostPrepare_Locks(), but I'm not sure].
The problem is reproducible within 1 second by the script, leading to amcheck-detected corruption.

I'd appreciate one more hint...
Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 6 Aug 2021, at 23:03, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> But how do you find out why something did not happen?
> During GetLockConflicts() the proclocks SHMQueue does not include the locks of a concurrently running xid that is within PrepareTransaction() [seems like it's within PostPrepare_Locks(), but I'm not sure].
> The problem is reproducible within 1 second by the script, leading to amcheck-detected corruption.

Ok, finally I've figured out this charade.

AtPrepare_Locks() transfers fast-path locks to normal locks pointing to the PGPROC with the xid of the transaction being converted to 2PC.
ProcArrayClearTransaction(MyProc) resets the xid in that PGPROC.
GetLockConflicts() sees an empty xid and vxid.
PostPrepare_Locks(xid) then hides things behind the new PGPROC for the 2PC.

PFA a PoC fix. But I'm not entirely sure it's OK to reorder things in PrepareTransaction() this way.
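
Schematically, the window looks like this (a comment-only sketch of the interleaving described above):

    /*
     * Preparing backend                         CIC backend
     * --------------------------------------    --------------------------------
     * AtPrepare_Locks()
     *     locks now point at the ordinary
     *     PGPROC, which still has xid/vxid
     * ProcArrayClearTransaction(MyProc)
     *     xid/vxid cleared from that PGPROC
     *                                           GetLockConflicts() runs here and
     *                                           sees lock holders with neither a
     *                                           valid xid nor a valid vxid
     * PostPrepare_Locks(xid)
     *     locks re-pointed at the 2PC dummy
     *     PGPROC
     */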

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 7 Aug 2021, at 20:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v9-0001-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch>

> <v9-0004-PoC-fix-clear-xid.patch>

> <v9-0003-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxid.patch>

> <v9-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcach.patch>

Here's v10.
Changes:
1. Added an assert in step 2 (the fix for the missed invalidation message). I wonder how deep RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... could possibly go? If the depth is unlimited we probably need a better data structure.
2. Editorialised step 3 (vxid->xid lookup), fixing typos and some small bugs.

Tomorrow I'll try to clean up step 1 (TAP tests).

Thanks!

Best regards, Andrey Borodin.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> Changes:
> 1. Added an assert in step 2 (the fix for the missed invalidation message). I wonder how deep RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... could possibly go? If the depth is unlimited we probably need a better data structure.
 

I don't know either, hence that quick data structure to delay the question.
debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").
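
A minimal shape for that bookkeeping might be the following (illustrative only; the names, the flag, and the fixed size are placeholders, and the draft's actual structure may differ):

    /* Relations whose descriptors are currently being rebuilt, so that an
     * invalidation arriving mid-build can flag the build as stale. */
    typedef struct InProgressEnt
    {
        Oid   reloid;       /* relation currently being rebuilt */
        bool  invalidated;  /* an inval for it arrived during the build */
    } InProgressEnt;

    static InProgressEnt in_progress_list[8];   /* would need to grow if builds nest deeper */
    static int           in_progress_offset = 0;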

> Tomorrow I'll try to cleanup step 1 (tap tests).

For the final commits, I would like to have two commits.  The first will fix
the non-2PC bug and add the test for that fix.  The second will fix the 2PC
bug and add the additional test.  Feel free to structure your changes into
more patch files, so long as it's straightforward to squash them into two
commits that way.

A few things I noticed earlier:

> --- a/contrib/amcheck/Makefile
> +++ b/contrib/amcheck/Makefile
> @@ -12,7 +12,7 @@ PGFILEDESC = "amcheck - function for verifying relation integrity"
>  
>  REGRESS = check check_btree check_heap
>  
> -TAP_TESTS = 1
> +TAP_TESTS = 2

TAP_TESTS is a Boolean flag, not a count.  Leave it set to 1.

> --- /dev/null
> +++ b/contrib/amcheck/t/002_cic_2pc.pl

> +$node->append_conf('postgresql.conf', 'lock_timeout = 5000');

> +my $main_timer = IPC::Run::timeout(5);

> +my $reindex_timer = IPC::Run::timeout(5);

Values this short cause spurious buildfarm failures.  Generally, use 180s for
timeouts that do not expire in a successful run.

> +# Run background pgbench with CIC. We cannot mix-in this script into single pgbench:
> +# CIC will deadlock with itself occasionally.

Consider fixing that by taking an advisory lock before running CIC.  However,
if use of separate pgbench executions works better, feel free to keep that.

> +pgbench(
> +    '--no-vacuum --client=5 --time=1',

Consider using a transaction count instead of --time.  That way, slow machines
still catch the bug, and fast machines waste less time.  For the "concurrent
OID generation" test, I tuned the transaction count so the test failed about
half the time[1] on a buggy build.

> +            \set txid random(1, 1000000000)
> +            BEGIN;
> +            INSERT INTO tbl VALUES(0);
> +            PREPARE TRANSACTION 'tx:txid';

Try "PREPARE TRANSACTION 'c:client_id'" (or c:client_id:txid) to eliminate the
chance of collision.


[1] https://postgr.es/m/20160215171129.GA347322@tornado.leadboat.com



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 8 Aug 2021, at 03:19, Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
>> Changes:
>> 1. Added an assert in step 2 (the fix for the missed invalidation message). I wonder how deep RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... could possibly go? If the depth is unlimited we probably need a better data structure.
>
> I don't know either, hence that quick data structure to delay the question.
> debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
> reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
> get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").
I've toyed around with
$node->append_conf('postgresql.conf', 'debug_invalidate_system_caches_always = 3'); in test.
I had failures only at most with
Assert(in_progress_offset < 4);
See [0] for stack trace. But I do not think that it proves that deeper calls are impossible with other DB schemas.

>> Tomorrow I'll try to cleanup step 1 (tap tests).
>
> For the final commits, I would like to have two commits.  The first will fix
> the non-2PC bug and add the test for that fix.  The second will fix the 2PC
> bug and add the additional test.  Feel free to structure your changes into
> more patch files, so long as it's straightforward to squash them into two
> commits that way.
Done so.
Step 1. Test for CIC with regular transactions.
Step 2. Fix
Step 3. Test for CIC with 2PC
Step 4. Part of the fix that I'm sure about
Step 5. Dubious part of fix...

I had to refactor subs pgbench and pgbench_background to be reused among tests. Though I did not refactor the pgbench tests to use these functions: they have two different versions of sub pgbench.

> A few things I noticed earlier:
>
>> --- a/contrib/amcheck/Makefile
>> +++ b/contrib/amcheck/Makefile
>> @@ -12,7 +12,7 @@ PGFILEDESC = "amcheck - function for verifying relation integrity"
>>
>> REGRESS = check check_btree check_heap
>>
>> -TAP_TESTS = 1
>> +TAP_TESTS = 2
>
> TAP_TESTS is a Boolean flag, not a count.  Leave it set to 1.
Fixed.

>> --- /dev/null
>> +++ b/contrib/amcheck/t/002_cic_2pc.pl
>
>> +$node->append_conf('postgresql.conf', 'lock_timeout = 5000');
>
>> +my $main_timer = IPC::Run::timeout(5);
>
>> +my $reindex_timer = IPC::Run::timeout(5);
>
> Values this short cause spurious buildfarm failures.  Generally, use 180s for
> timeouts that do not expire in a successful run.
Fixed.

>> +# Run background pgbench with CIC. We cannot mix-in this script into single pgbench:
>> +# CIC will deadlock with itself occasionally.
>
> Consider fixing that by taking an advisory lock before running CIC.  However,
> if use of separate pgbench executions works better, feel free to keep that.
I've tried this approach. The advisory lock is held by the session's vxid, thus deadlocking with CIC.

>> +pgbench(
>> +    '--no-vacuum --client=5 --time=1',
>
> Consider using a transaction count instead of --time.  That way, slow machines
> still catch the bug, and fast machines waste less time.  For the "concurrent
> OID generation" test, I tuned the transaction count so the test failed about
> half the time[1] on a buggy build.
I've tuned the tests for my laptop so that the probabilistic test passes only sometimes.

>> +            \set txid random(1, 1000000000)
>> +            BEGIN;
>> +            INSERT INTO tbl VALUES(0);
>> +            PREPARE TRANSACTION 'tx:txid';
>
> Try "PREPARE TRANSACTION 'c:client_id'" (or c:client_id:txid) to eliminate the
> chance of collision.
Fixed.

What do you think, do we have a chance to fix things before the next release on August 12th?

Thanks!

Best regards, Andrey Borodin.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Aug 08, 2021 at 04:31:07PM +0500, Andrey Borodin wrote:
> > On 8 Aug 2021, at 03:19, Noah Misch <noah@leadboat.com> wrote:
> > On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> >> Changes:
> >> 1. Added an assert in step 2 (the fix for the missed invalidation message). I wonder how deep RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... could possibly go? If the depth is unlimited we probably need a better data structure.
 
> > 
> > I don't know either, hence that quick data structure to delay the question.
> > debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
> > reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
> > get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").
> I've toyed around with
> $node->append_conf('postgresql.conf', 'debug_invalidate_system_caches_always = 3'); in test.
> I had failures only at most with
> Assert(in_progress_offset < 4);
> See [0] for stack trace. But I do not think that it proves that deeper calls are impossible with other DB schemas.

I didn't find the [0] link in your message; can you send it again?  I suspect
no rel can appear more than once in the stack, and all but one rel will be a
system catalog.  That said, dynamically growing the array is reasonable, even
if a small maximum depth theoretically exists.

> Step 1. Test for CIC with regular transactions.
> Step 2. Fix
> Step 3. Test for CIC with 2PC
> Step 4. Part of the fix that I'm sure about
> Step 5. Dubious part of fix...

> What do you think, do we have a chance to fix things before the next release on August 12th?

No.  At a minimum, we still need to convince ourselves that step 5 is correct.
Even if that were already done, commits to released branches get more cautious
in the few days before the wrap date (tomorrow).  Once it's committed, you
could contact pgsql-release@postgresql.org to propose an out-of-cycle release.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Aug 08, 2021 at 09:37:52AM -0700, Noah Misch wrote:
> On Sun, Aug 08, 2021 at 04:31:07PM +0500, Andrey Borodin wrote:
> > Step 1. Test for CIC with regular transactions.
> > Step 2. Fix
> > Step 3. Test for CIC with 2PC
> > Step 4. Part of the fix that I'm sure about
> > Step 5. Dubious part of fix...

When I applied all five to commit b33259e^ (shortly before the PostgresNode
compatibility break) and ran your tests in a loop, I got more 2PC "lacks
matching index tuple" within one minute.  Here's how I ran the tests:

  make -j20 && while make -C contrib/amcheck check REGRESS= PROVE_FLAGS=--timer PROVE_TESTS='t/002_cic.pl t/003_cic_2pc.pl' NO_TEMP_INSTALL=1; do date; done


Do you see failures with that loop?  If so, can you diagnose them?  (So far,
I've not seen a failure from the 1PC test.)

On Thu, Jul 29, 2021 at 10:25:48PM -0400, Noah Misch wrote:
> I haven't checked whether other inval message types have the same hazard.

I will look into this next.  (It probably doesn't explain 2PC failures.)



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sat, Aug 14, 2021 at 03:13:28PM -0700, Noah Misch wrote:
> On Sun, Aug 08, 2021 at 09:37:52AM -0700, Noah Misch wrote:
> > On Sun, Aug 08, 2021 at 04:31:07PM +0500, Andrey Borodin wrote:
> > > Step 1. Test for CIC with regular transactions.
> > > Step 2. Fix
> > > Step 3. Test for CIC with 2PC
> > > Step 4. Part of the fix that I'm sure about
> > > Step 5. Dubious part of fix...
> 
> When I applied all five to commit b33259e^ (shortly before the PostgresNode
> compatibility break) and ran your tests in a loop, I got more 2PC "lacks
> matching index tuple" within one minute.  Here's how I ran the tests:
> 
>   make -j20 && while make -C contrib/amcheck check REGRESS= PROVE_FLAGS=--timer PROVE_TESTS='t/002_cic.pl t/003_cic_2pc.pl' NO_TEMP_INSTALL=1; do date; done
> 
> Do you see failures with that loop?  If so, can you diagnose them?

See below.

> (So far, I've not seen a failure from the 1PC test.)

I eventually did see one.  Just one 1PC failure in six hours of 1PC test
runtime, though.

> On Thu, Jul 29, 2021 at 10:25:48PM -0400, Noah Misch wrote:
> > I haven't checked whether other inval message types have the same hazard.
> 
> I will look into this next.  (It probably doesn't explain 2PC failures.)

Now that I've looked, other inval hazards might indeed explain the remaining
2PC and 1PC failures.  I'll plug those holes and see what failures, if any,
continue to happen.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 15 авг. 2021 г., в 13:45, Noah Misch <noah@leadboat.com> написал(а):
>>
>> Do you see failures with that loop?  If so, can you diagnose them?
>
> See below.
I do not observe the failure on my laptop, though I reproduced it on a Linux server.
I've fixed one bug in TwoPhaseGetXidByVXid(). Also rebased on current master.

>
>> (So far, I've not seen a failure from the 1PC test.)
>
> I eventually did see one.  Just one 1PC failure in six hours of 1PC test
> runtime, though.
I've attached a patch that reproduces the problem in 30sec on my server.


Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andres Freund
Date:
Hi,

On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote:
> From 929736512ebf8eb9ac6ddaaf49b9e6148d72cfbb Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Fri, 30 Jul 2021 14:40:16 +0500
> Subject: [PATCH v12 2/6] PoC fix for race in RelationBuildDesc() and relcache
>  invalidation
> 
> ---
>  src/backend/utils/cache/relcache.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> index 13d9994af3..7eec7b1f41 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -997,9 +997,16 @@ equalRSDesc(RowSecurityDesc *rsdesc1, RowSecurityDesc *rsdesc2)
>   *        (suggesting we are trying to access a just-deleted relation).
>   *        Any other error is reported via elog.
>   */
> +typedef struct InProgressRels {
> +    Oid relid;
> +    bool invalidated;
> +} InProgressRels;
> +static InProgressRels inProgress[100];
> +
>  static Relation
>  RelationBuildDesc(Oid targetRelId, bool insertIt)
>  {
> +    int in_progress_offset;
>      Relation    relation;
>      Oid            relid;
>      HeapTuple    pg_class_tuple;
> @@ -1033,6 +1040,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>      }
>  #endif
>  
> +    for (in_progress_offset = 0;
> +         OidIsValid(inProgress[in_progress_offset].relid);
> +         in_progress_offset++)
> +        ;
> +    inProgress[in_progress_offset].relid = targetRelId;
> +retry:
> +    inProgress[in_progress_offset].invalidated = false;
> +
>      /*
>       * find the tuple in pg_class corresponding to the given relation id
>       */
> @@ -1213,6 +1228,12 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>       */
>      heap_freetuple(pg_class_tuple);
>  
> +    if (inProgress[in_progress_offset].invalidated)
> +        goto retry;                /* TODO free old one */
> +    /* inProgress is in fact the stack, we can safely remove it's top */
> +    inProgress[in_progress_offset].relid = InvalidOid;
> +    Assert(inProgress[in_progress_offset + 1].relid == InvalidOid);
> +
>      /*
>       * Insert newly created relation into relcache hash table, if requested.
>       *
> @@ -2805,6 +2826,13 @@ RelationCacheInvalidateEntry(Oid relationId)
>          relcacheInvalsReceived++;
>          RelationFlushRelation(relation);
>      }
> +    else
> +    {
> +        int i;
> +        for (i = 0; OidIsValid(inProgress[i].relid); i++)
> +            if (inProgress[i].relid == relationId)
> +                inProgress[i].invalidated = true;
> +    }
>  }

Desperately needs comments. Without a commit message and without
comments it's hard to review this without re-reading the entire thread -
which approximately nobody will do.


> From 7e47dae2828d88ddb2161fda0c3b08a158c6cf37 Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Sat, 7 Aug 2021 20:27:14 +0500
> Subject: [PATCH v12 5/6] PoC fix clear xid
> 
> ---
>  src/backend/access/transam/xact.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
> index 387f80419a..9b19d939eb 100644
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -2500,7 +2500,6 @@ PrepareTransaction(void)
>       * done *after* the prepared transaction has been marked valid, else
>       * someone may think it is unlocked and recyclable.
>       */
> -    ProcArrayClearTransaction(MyProc);
>  
>      /*
>       * In normal commit-processing, this is all non-critical post-transaction
> @@ -2535,6 +2534,8 @@ PrepareTransaction(void)
>      PostPrepare_MultiXact(xid);
>  
>      PostPrepare_Locks(xid);
> +
> +    ProcArrayClearTransaction(MyProc);
>      PostPrepare_PredicateLocks(xid);

The comment above ProcArrayClearTransaction would need to be moved and
updated...

> From 6db9cafd146db1a645bb6885157b0e1f032765e0 Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Mon, 19 Jul 2021 11:50:02 +0500
> Subject: [PATCH v12 4/6] Fix CREATE INDEX CONCURRENTLY in precence of vxids
>  converted to 2pc
...

> +/*
> + * TwoPhaseGetXidByVXid
> + *        Try to lookup for vxid among prepared xacts
> + */
> +XidListEntry
> +TwoPhaseGetXidByVXid(VirtualTransactionId vxid)
> +{
> +    int                i;
> +    XidListEntry    result;
> +    result.next = NULL;
> +    result.xid = InvalidTransactionId;
> +
> +    LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
> +
> +    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +    {
> +        GlobalTransaction    gxact = TwoPhaseState->prepXacts[i];
> +        PGPROC               *proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +        VirtualTransactionId proc_vxid;
> +        GET_VXID_FROM_PGPROC(proc_vxid, *proc);
> +
> +        if (VirtualTransactionIdEquals(vxid, proc_vxid))
> +        {
> +            if (result.xid != InvalidTransactionId)
> +            {
> +                /* Already has candidate - need to alloc some space */
> +                XidListEntry *copy = palloc(sizeof(XidListEntry));
> +                copy->next = result.next;
> +                copy->xid = result.xid;
> +                result.next = copy;
> +            }
> +            result.xid = gxact->xid;
> +        }
> +    }
> +
> +    LWLockRelease(TwoPhaseStateLock);
> +
> +    return result;
> +}

Dynamic memory allocations while holding a fairly central lock - one
which is now going to be more contended - doesn't seem great.

Is the memory context this is called in guaranteed to be of a proper
duration?  Including being reset in case there's an error at some point
before the memory is freed?


> +/*
> + *        WaitXact
> + *
> + *        Wait for xid completition if have xid. Otherwise try to find xid among
> + *        two-phase procarray entries.
> + */
> +static bool WaitXact(VirtualTransactionId vxid, TransactionId xid, bool wait)
> +{
> +    LockAcquireResult    lar;
> +    LOCKTAG                tag;
> +    XidListEntry        xidlist;
> +    XidListEntry       *xidlist_ptr = NULL; /* pointer to TwoPhaseGetXidByVXid()s pallocs */
> +    bool                result;
> +
> +    if (TransactionIdIsValid(xid))
> +    {
> +        /* We know exact xid - no need to search in 2PC state */
> +        xidlist.xid = xid;
> +        xidlist.next = NULL;
> +    }
> +    else
> +    {
> +        /* You cant have vxid among 2PCs if you have no 2PCs */
> +        if (max_prepared_xacts == 0)
> +            return true;
> +
> +        /*
> +         * We must search for vxids in 2pc state
> +         * XXX: O(N*N) complexity where N is number of prepared xacts
> +         */
> +        xidlist = TwoPhaseGetXidByVXid(vxid);
> +        /* Return if transaction is gone entirely */
> +        if (!TransactionIdIsValid(xidlist.xid))
> +            return true;
> +        xidlist_ptr = xidlist.next;
> +    }

Perhaps I missed this - but won't we constantly enter this path for
non-2pc transactions? E.g.

> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>       */
>      proc = BackendIdGetProc(vxid.backendId);
>      if (proc == NULL)
> -        return true;
> +        return WaitXact(vxid, InvalidTransactionId, wait);
>  
>      /*
>       * We must acquire this lock before checking the backendId and lxid
> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>          || proc->fpLocalTransactionId != vxid.localTransactionId)
>      {
>          LWLockRelease(&proc->fpInfoLock);
> -        return true;
> +        return WaitXact(vxid, InvalidTransactionId, wait);
>      }

It seems like it's going to add a substantial amount of work even when
no 2PC xacts are involved?


> diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
> index e27e1a8fe8..a5f28d3a80 100644
> --- a/src/include/access/twophase.h
> +++ b/src/include/access/twophase.h
> @@ -25,6 +25,17 @@
>   */
>  typedef struct GlobalTransactionData *GlobalTransaction;
>  
> +/* 
> + * XidListEntry is expected to be used as list very rarely. Under normal
> + * circumstances TwoPhaseGetXidByVXid() returns only one xid.
> + * But under certain conditions can return many xids or nothing.
> + */
> +typedef struct XidListEntry
> +{
> +    TransactionId xid;
> +    struct XidListEntry* next;
> +} XidListEntry;

I don't think we should invent additional ad-hoc list types.

Greetings,

Andres Freund



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Aug 15, 2021 at 04:09:37PM +0500, Andrey Borodin wrote:
> > 15 авг. 2021 г., в 13:45, Noah Misch <noah@leadboat.com> написал(а):
> >> Do you see failures with that loop?  If so, can you diagnose them?

> I do not observe the failure on my laptop, though I reproduced it on a Linux server.
> I've fixed one bug in TwoPhaseGetXidByVXid(). Also rebased on current master.

> > Just one 1PC failure in six hours of 1PC test runtime, though.
> I've attached a patch that reproduces the problem in 30sec on my server.

Having studied the broader inval situation, I found just one additional gap
that seemed potentially relevant.  It didn't stop the failures under current
tests, however.  The attached patch replaces my last patch on this thread, so
it should replace
v12-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch in your
series.  (Like its predecessor, it's an unfinished proof-of-concept.)

With v12, on my machine, the same loop took 2000s to get three failures, all
of which were in the 1PC tests.  I ran out of time to study the failure
mechanism.  Would you diagnose what happens when it fails on your server?
Also see the larger review from Andres.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:
Andres, Noah, thanks for the review.

> 16 авг. 2021 г., в 10:13, Noah Misch <noah@leadboat.com> написал(а):
>
> With v12, on my machine, the same loop took 2000s to get three failures, all
> of which were in the 1PC tests.  I ran out of time to study the failure
> mechanism.  Would you diagnose what happens when it fails on your server?
I'm still investigating. With your fix for RelationBuildDesc() invals, my reproduction is also quite slow - I get an error within 10 minutes or so.

I've observed a few times that the same xid was WaitXact()'ed twice. Is it possible that taking ShareLock on a running xid (taken from the PGPROC of the vxid) is not a good way to wait for transaction completion?
Is it possible that the xid is in PGPROC before the backend acquires its own lock on that xid?


> Also see the larger review from Andres.

Sure, I will address Andres's notes once the patches finally work without errors. I agree with the idea of not inventing a dynamic list, and with the other points raised.

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 22 авг. 2021 г., в 17:30, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> I've observed a few times that the same xid was WaitXact()'ed twice. Is it possible that taking ShareLock on a running xid (taken from the PGPROC of the vxid) is not a good way to wait for transaction completion?
> Is it possible that the xid is in PGPROC before the backend acquires its own lock on that xid?

Oh, that's it. We first publish the xid in PGPROC and only then take a lock on it in the lock manager.
Ok, I know how to fix this.


Currently, when testing the combination of all fixes, I observe things like
'error running SQL: 'psql:<stdin>:1: ERROR:  prepared transaction with identifier "a" is busy''
Looks like some kind of race condition in the tests.

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 22 авг. 2021 г., в 22:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> Currently, when testing the combination of all fixes, I observe things like
> 'error running SQL: 'psql:<stdin>:1: ERROR:  prepared transaction with identifier "a" is busy''
> Looks like some kind of race condition in the tests.

There was a race condition in deterministic 2PC test. Fixed with synchronisation points.
PFA patches that work on my machines.
I'm going to start fixing review notes if it will not break until tomorrow.

BTW, are subtransactions anything special wrt CIC? Is it worth sprinkling some SAVEPOINTs here and there, just to be
sure?

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> > 22 авг. 2021 г., в 22:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
> > Currently, when testing the combination of all fixes, I observe things like
> > 'error running SQL: 'psql:<stdin>:1: ERROR:  prepared transaction with identifier "a" is busy''
> > Looks like some kind of race condition in the tests.
> 
> There was a race condition in deterministic 2PC test. Fixed with synchronisation points.
> PFA patches that work on my machines.
> I'm going to start fixing review notes if it will not break until tomorrow.

That is great news.

> BTW, are subtransactions anything special wrt CIC? Is it worth sprinkling some SAVEPOINTs here and there, just to be
> sure?

Not especially.  The AssignTransactionId() "Ensure parent(s) have XIDs"
behavior may be important to CIC.  In the test pgbench runs for scripts
002_pgbench_concurrent_2pc and 002_pgbench_concurrent_transaction, it likely
wouldn't hurt to pass two equal-weight test scripts, one of which uses
savepoints.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Peter Geoghegan
Date:
On Mon, Aug 23, 2021 at 10:37 PM Noah Misch <noah@leadboat.com> wrote:
> > There was a race condition in deterministic 2PC test. Fixed with synchronisation points.
> > PFA patches that work on my machines.
> > I'm going to start fixing review notes if it will not break until tomorrow.
>
> That is great news.

+1. Great detective work.


-- 
Peter Geoghegan



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:
Hi!

> 15 авг. 2021 г., в 18:45, Andres Freund <andres@anarazel.de> написал(а):
>
> Hi,
>
> On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote:
>> From 929736512ebf8eb9ac6ddaaf49b9e6148d72cfbb Mon Sep 17 00:00:00 2001
>> From: Andrey Borodin <amborodin@acm.org>
>> Date: Fri, 30 Jul 2021 14:40:16 +0500
>> Subject: [PATCH v12 2/6] PoC fix for race in RelationBuildDesc() and relcache
>> invalidation
>>
>> ---
>> src/backend/utils/cache/relcache.c | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
>> index 13d9994af3..7eec7b1f41 100644
>> --- a/src/backend/utils/cache/relcache.c
>> +++ b/src/backend/utils/cache/relcache.c
>> @@ -997,9 +997,16 @@ equalRSDesc(RowSecurityDesc *rsdesc1, RowSecurityDesc *rsdesc2)
>>  *        (suggesting we are trying to access a just-deleted relation).
>>  *        Any other error is reported via elog.
>>  */
>> +typedef struct InProgressRels {
>> +    Oid relid;
>> +    bool invalidated;
>> +} InProgressRels;
>> +static InProgressRels inProgress[100];
>> +
>> static Relation
>> RelationBuildDesc(Oid targetRelId, bool insertIt)
>> {
>> +    int in_progress_offset;
>>     Relation    relation;
>>     Oid            relid;
>>     HeapTuple    pg_class_tuple;
>> @@ -1033,6 +1040,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>>     }
>> #endif
>>
>> +    for (in_progress_offset = 0;
>> +         OidIsValid(inProgress[in_progress_offset].relid);
>> +         in_progress_offset++)
>> +        ;
>> +    inProgress[in_progress_offset].relid = targetRelId;
>> +retry:
>> +    inProgress[in_progress_offset].invalidated = false;
>> +
>>     /*
>>      * find the tuple in pg_class corresponding to the given relation id
>>      */
>> @@ -1213,6 +1228,12 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>>      */
>>     heap_freetuple(pg_class_tuple);
>>
>> +    if (inProgress[in_progress_offset].invalidated)
>> +        goto retry;                /* TODO free old one */
>> +    /* inProgress is in fact the stack, we can safely remove it's top */
>> +    inProgress[in_progress_offset].relid = InvalidOid;
>> +    Assert(inProgress[in_progress_offset + 1].relid == InvalidOid);
>> +
>>     /*
>>      * Insert newly created relation into relcache hash table, if requested.
>>      *
>> @@ -2805,6 +2826,13 @@ RelationCacheInvalidateEntry(Oid relationId)
>>         relcacheInvalsReceived++;
>>         RelationFlushRelation(relation);
>>     }
>> +    else
>> +    {
>> +        int i;
>> +        for (i = 0; OidIsValid(inProgress[i].relid); i++)
>> +            if (inProgress[i].relid == relationId)
>> +                inProgress[i].invalidated = true;
>> +    }
>> }
>
> Desperately needs comments. Without a commit message and without
> comments it's hard to review this without re-reading the entire thread -
> which approximately nobody will do.
I've added some comments. But it seems we should use dynamic allocation instead of the fixed 100-entry array.
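Something like the following is what I mean by dynamic allocation - a rough sketch only, with illustrative names and growth policy, not the final code:

/* Sketch only: a repalloc-grown stack of rels whose descriptors are being built. */
typedef struct InProgressEnt
{
    Oid         reloid;         /* OID of the relation being built */
    bool        invalidated;    /* whether an inval arrived mid-build */
} InProgressEnt;

static InProgressEnt *in_progress_list;
static int  in_progress_list_len;
static int  in_progress_list_maxlen;

/* Push a rel onto the stack of in-progress builds, growing the array on demand. */
static int
push_in_progress_rel(Oid reloid)
{
    if (in_progress_list_len >= in_progress_list_maxlen)
    {
        int         newlen = Max(in_progress_list_maxlen * 2, 4);

        if (in_progress_list == NULL)
            in_progress_list = MemoryContextAlloc(CacheMemoryContext,
                                                  newlen * sizeof(InProgressEnt));
        else
            in_progress_list = repalloc(in_progress_list,
                                        newlen * sizeof(InProgressEnt));
        in_progress_list_maxlen = newlen;
    }
    in_progress_list[in_progress_list_len].reloid = reloid;
    in_progress_list[in_progress_list_len].invalidated = false;
    return in_progress_list_len++;      /* caller remembers its slot for the retry check */
}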

>
>
>> From 7e47dae2828d88ddb2161fda0c3b08a158c6cf37 Mon Sep 17 00:00:00 2001
>> From: Andrey Borodin <amborodin@acm.org>
>> Date: Sat, 7 Aug 2021 20:27:14 +0500
>> Subject: [PATCH v12 5/6] PoC fix clear xid
>>
>> ---
>> src/backend/access/transam/xact.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
>> index 387f80419a..9b19d939eb 100644
>> --- a/src/backend/access/transam/xact.c
>> +++ b/src/backend/access/transam/xact.c
>> @@ -2500,7 +2500,6 @@ PrepareTransaction(void)
>>      * done *after* the prepared transaction has been marked valid, else
>>      * someone may think it is unlocked and recyclable.
>>      */
>> -    ProcArrayClearTransaction(MyProc);
>>
>>     /*
>>      * In normal commit-processing, this is all non-critical post-transaction
>> @@ -2535,6 +2534,8 @@ PrepareTransaction(void)
>>     PostPrepare_MultiXact(xid);
>>
>>     PostPrepare_Locks(xid);
>> +
>> +    ProcArrayClearTransaction(MyProc);
>>     PostPrepare_PredicateLocks(xid);
>
> The comment above ProcArrayClearTransaction would need to be moved and
> updated...
Fixed.

>
>> From 6db9cafd146db1a645bb6885157b0e1f032765e0 Mon Sep 17 00:00:00 2001
>> From: Andrey Borodin <amborodin@acm.org>
>> Date: Mon, 19 Jul 2021 11:50:02 +0500
>> Subject: [PATCH v12 4/6] Fix CREATE INDEX CONCURRENTLY in precence of vxids
>> converted to 2pc
> ...
>
>> +/*
>> + * TwoPhaseGetXidByVXid
>> + *        Try to lookup for vxid among prepared xacts
>> + */
>> +XidListEntry
>> +TwoPhaseGetXidByVXid(VirtualTransactionId vxid)
>> +{
>> +    int                i;
>> +    XidListEntry    result;
>> +    result.next = NULL;
>> +    result.xid = InvalidTransactionId;
>> +
>> +    LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
>> +
>> +    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>> +    {
>> +        GlobalTransaction    gxact = TwoPhaseState->prepXacts[i];
>> +        PGPROC               *proc = &ProcGlobal->allProcs[gxact->pgprocno];
>> +        VirtualTransactionId proc_vxid;
>> +        GET_VXID_FROM_PGPROC(proc_vxid, *proc);
>> +
>> +        if (VirtualTransactionIdEquals(vxid, proc_vxid))
>> +        {
>> +            if (result.xid != InvalidTransactionId)
>> +            {
>> +                /* Already has candidate - need to alloc some space */
>> +                XidListEntry *copy = palloc(sizeof(XidListEntry));
>> +                copy->next = result.next;
>> +                copy->xid = result.xid;
>> +                result.next = copy;
>> +            }
>> +            result.xid = gxact->xid;
>> +        }
>> +    }
>> +
>> +    LWLockRelease(TwoPhaseStateLock);
>> +
>> +    return result;
>> +}
>
> Dynamic memory allocations while holding a fairly central lock - one
> which is now going to be more contended - doesn't seem great.
>
> Is the memory context this is called in guaranteed to be of a proper
> duration?  Including being reset in case there's an error at some point
> before the memory is freed?
I've removed the custom list and all memory allocations. If there are multiple 2PCs with the same vxid, we just wait for one, then retry.
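That is, the waiting side now does roughly the following (a sketch only: WaitForPreparedXactsOfVXid() is an illustrative name, not a function in the patch, and it assumes the reworked TwoPhaseGetXidByVXid() returns a single xid per call):

static void
WaitForPreparedXactsOfVXid(VirtualTransactionId vxid)
{
    for (;;)
    {
        /* Look up one prepared xact still carrying this vxid, if any. */
        TransactionId xid = TwoPhaseGetXidByVXid(vxid);

        if (!TransactionIdIsValid(xid))
            break;              /* no (more) prepared xacts for this vxid */

        /* Block until that prepared transaction commits or aborts, then retry. */
        XactLockTableWait(xid, NULL, NULL, XLTW_None);
    }
}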

>> +/*
>> + *        WaitXact
>> + *
>> + *        Wait for xid completition if have xid. Otherwise try to find xid among
>> + *        two-phase procarray entries.
>> + */
>> +static bool WaitXact(VirtualTransactionId vxid, TransactionId xid, bool wait)
>> +{
>> +    LockAcquireResult    lar;
>> +    LOCKTAG                tag;
>> +    XidListEntry        xidlist;
>> +    XidListEntry       *xidlist_ptr = NULL; /* pointer to TwoPhaseGetXidByVXid()s pallocs */
>> +    bool                result;
>> +
>> +    if (TransactionIdIsValid(xid))
>> +    {
>> +        /* We know exact xid - no need to search in 2PC state */
>> +        xidlist.xid = xid;
>> +        xidlist.next = NULL;
>> +    }
>> +    else
>> +    {
>> +        /* You cant have vxid among 2PCs if you have no 2PCs */
>> +        if (max_prepared_xacts == 0)
>> +            return true;
>> +
>> +        /*
>> +         * We must search for vxids in 2pc state
>> +         * XXX: O(N*N) complexity where N is number of prepared xacts
>> +         */
>> +        xidlist = TwoPhaseGetXidByVXid(vxid);
>> +        /* Return if transaction is gone entirely */
>> +        if (!TransactionIdIsValid(xidlist.xid))
>> +            return true;
>> +        xidlist_ptr = xidlist.next;
>> +    }
>
> Perhaps I missed this - but won't we constantly enter this path for
> non-2pc transactions? E.g.
I've restored the heuristic that if it's not a 2PC transaction, we just exit from WaitPreparedXact().


>
>> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>>      */
>>     proc = BackendIdGetProc(vxid.backendId);
>>     if (proc == NULL)
>> -        return true;
>> +        return WaitXact(vxid, InvalidTransactionId, wait);
>>
>>     /*
>>      * We must acquire this lock before checking the backendId and lxid
>> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>>         || proc->fpLocalTransactionId != vxid.localTransactionId)
>>     {
>>         LWLockRelease(&proc->fpInfoLock);
>> -        return true;
>> +        return WaitXact(vxid, InvalidTransactionId, wait);
>>     }
>
> It seems like it's going to add a substantial amount of work even when
> no 2PC xacts are involved?
Only if 2PCs are enabled.

>> diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
>> index e27e1a8fe8..a5f28d3a80 100644
>> --- a/src/include/access/twophase.h
>> +++ b/src/include/access/twophase.h
>> @@ -25,6 +25,17 @@
>>  */
>> typedef struct GlobalTransactionData *GlobalTransaction;
>>
>> +/*
>> + * XidListEntry is expected to be used as list very rarely. Under normal
>> + * circumstances TwoPhaseGetXidByVXid() returns only one xid.
>> + * But under certain conditions can return many xids or nothing.
>> + */
>> +typedef struct XidListEntry
>> +{
>> +    TransactionId xid;
>> +    struct XidListEntry* next;
>> +} XidListEntry;
>
> I don't think we should invent additional ad-hoc list types.
Fixed, removed list here entirely.

I'm attaching a new version. It works fine on my machines.

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andres Freund
Date:
Hi,

On 2021-08-29 12:27:31 +0500, Andrey Borodin wrote:
> > 15 авг. 2021 г., в 18:45, Andres Freund <andres@anarazel.de> написал(а):
> > On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote:
> >> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
> >>      */
> >>     proc = BackendIdGetProc(vxid.backendId);
> >>     if (proc == NULL)
> >> -        return true;
> >> +        return WaitXact(vxid, InvalidTransactionId, wait);
> >> 
> >>     /*
> >>      * We must acquire this lock before checking the backendId and lxid
> >> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
> >>         || proc->fpLocalTransactionId != vxid.localTransactionId)
> >>     {
> >>         LWLockRelease(&proc->fpInfoLock);
> >> -        return true;
> >> +        return WaitXact(vxid, InvalidTransactionId, wait);
> >>     }
> > 
> > It seems like it's going to add a substantial amount of work even when
> > no 2PC xacts are involved?
> Only if 2PCs are enabled.

I don't think that's good enough. Plenty of systems have 2PC enabled but very
few if any transactions end up as 2PC ones.

Greetings,

Andres Freund



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
>>>
>>> It seems like it's going to add a substantial amount of work even when
>>> no 2PC xacts are involved?
>> Only if 2PCs are enabled.
>
> I don't think that's good enough. Plenty of systems have 2PC enabled but very
> few if any transactions end up as 2PC ones.

I could not come up with a significantly better solution. Let's consider what we actually pay for.

GetLockConflicts() returns xids for 2PC and vxids for any other transaction. Then we wait for each of them one by one.
GetLockConflicts() cannot return xids for regular transactions: these xids may not have an xid lock yet. So we have to
work with vxids.

Each wait case can resort to a set of three possible actions: test/wait for an xid (cheap), test/wait for a vxid (relatively
cheap), search for an xid by vxid (expensive).

When we have to wait for a known 2PC, we wait on its xid (cheap).
When we have a vxid:
1. If the backend is not on the same vxid, we have to search for the xid by vxid (expensive), then wait on the obtained xid, if any.
2. Either way, wait for the vxid, then for the xid (cheap * 2).

So the really slow part is searching for xids of long-gone vxids that may not have had an xid at all.
We simply have no good mechanism to tell that a vxid did not have an xid, once the vxid is long gone.

The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
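For concreteness, a batched lookup could look roughly like this (the signature and the handling of duplicate vxids are illustrative only, not part of the patch):

/*
 * Sketch: resolve many vxids to prepared-xact xids under a single shared
 * acquisition of TwoPhaseStateLock.  xids[] must have nvxids entries;
 * unresolved entries are left as InvalidTransactionId.  A vxid matching more
 * than one prepared xact keeps only the last match here.
 */
static int
TwoPhaseGetXidsByVXids(const VirtualTransactionId *vxids, int nvxids,
                       TransactionId *xids)
{
    int         nfound = 0;

    for (int j = 0; j < nvxids; j++)
        xids[j] = InvalidTransactionId;

    LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
    for (int i = 0; i < TwoPhaseState->numPrepXacts; i++)
    {
        GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
        PGPROC     *proc = &ProcGlobal->allProcs[gxact->pgprocno];
        VirtualTransactionId proc_vxid;

        GET_VXID_FROM_PGPROC(proc_vxid, *proc);
        for (int j = 0; j < nvxids; j++)
        {
            if (VirtualTransactionIdEquals(vxids[j], proc_vxid))
            {
                if (!TransactionIdIsValid(xids[j]))
                    nfound++;
                xids[j] = gxact->xid;
                break;
            }
        }
    }
    LWLockRelease(TwoPhaseStateLock);

    return nfound;
}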

Is it worth the complexity?

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> > 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
> >>> It seems like it's going to add a substantial amount of work even when
> >>> no 2PC xacts are involved?
> >> Only if 2PCs are enabled.
> > 
> > I don't think that's good enough. Plenty of systems have 2PC enabled but very
> > few if any transactions end up as 2PC ones.

> The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
> TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
> 
> Is it worth the complexity?

https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
then I'm content with it.  I'd certainly expect some performance benefit.
Andres, what do you think?

We could start with an unlocked scan of the twophase shared memory.  If the
unlocked scan finds a potential match, acquire TwoPhaseStateLock and repeat
the scan.  Otherwise, we don't need the locked scan, because we can deduce
that the locked scan would find nothing.  I'm not fond of relying on such
reasoning without a known-strong performance need, but it's an alternative.
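As a rough illustration of that double-check (ScanTwoPhaseForVXid() here is a hypothetical helper that walks TwoPhaseState looking for a matching vxid; it is not an existing function):

static TransactionId
TwoPhaseGetXidByVXidDoubleChecked(VirtualTransactionId vxid)
{
    TransactionId xid;

    /*
     * Unlocked scan: may read inconsistent data, so it is only a filter.  If
     * it finds nothing, we deduce that a locked scan would also find nothing
     * and skip taking the lock entirely.
     */
    if (!ScanTwoPhaseForVXid(vxid, &xid))
        return InvalidTransactionId;

    /* Potential match: repeat the scan under the lock for a stable answer. */
    LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
    if (!ScanTwoPhaseForVXid(vxid, &xid))
        xid = InvalidTransactionId;
    LWLockRelease(TwoPhaseStateLock);

    return xid;
}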



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>      proc->pgprocno = gxact->pgprocno;
>      SHMQueueElemInit(&(proc->links));
>      proc->waitStatus = PROC_WAIT_STATUS_OK;
> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
> -    proc->lxid = (LocalTransactionId) xid;
> +    /* We set up the gxact's VXID as real for CIC purposes */
> +    proc->lxid = MyProc->lxid;

This breaks the case where the server restarted after PREPARE TRANSACTION.
MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
false.  I'm attaching a test case addition.  Can you repair this?

>      proc->xid = xid;
>      Assert(proc->xmin == InvalidTransactionId);
>      proc->delayChkpt = false;
>      proc->statusFlags = 0;
>      proc->pid = 0;
> -    proc->backendId = InvalidBackendId;
> +    /* May be backendId of startup process */
> +    proc->backendId = MyBackendId;

Incidentally, MyBackendId of startup process depends on other facts.  When
using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1.
Otherwise, including clean startup of a non-standby node, MyBackendId is
InvalidBackendId.  This may be harmless.  I didn't know about it.

On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> > > 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
> > >>> It seems like it's going to add a substantial amount of work even when
> > >>> no 2PC xacts are involved?
> > >> Only if 2PCs are enabled.
> > > 
> > > I don't think that's good enough. Plenty of systems have 2PC enabled but very
> > > few if any transactions end up as 2PC ones.
> 
> > The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
> > TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
> > 
> > Is it worth the complexity?
> 
> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
> then I'm content with it.  I'd certainly expect some performance benefit.
> Andres, what do you think?

A few more benefits (beyond lock contention) come to mind:

- Looking at the three VirtualXactLock() callers, waiting for final
  disposition of prepared transactions is necessary for
  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
  prepared transaction won't do further database reads.  Waiting on the
  prepared transaction there could give CIC an arbitrarily-long, needless
  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
  prepared transaction, because prepared transactions hold no locks during
  recovery.  (If a prepared transaction originally acquired
  AccessExclusiveLock, the startup process holds that lock on its behalf.)
  Coordinating the XID search at a higher layer would let us change
  WaitForLockersMultiple() without changing the others.

- v13 WaitPreparedXact() experiences starvation when a steady stream of
  prepared transactions have the same VXID.  Since VXID reuse entails
  reconnecting, starvation will be unnoticeable in systems that follow best
  practices around connection lifespan.  The 2021-08-23 patch version didn't
  have that hazard.

None of those benefits clearly justify the complexity, but they're relevant to
the decision.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а):
>
> On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
>> --- a/src/backend/access/transam/twophase.c
>> +++ b/src/backend/access/transam/twophase.c
>> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>>     proc->pgprocno = gxact->pgprocno;
>>     SHMQueueElemInit(&(proc->links));
>>     proc->waitStatus = PROC_WAIT_STATUS_OK;
>> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
>> -    proc->lxid = (LocalTransactionId) xid;
>> +    /* We set up the gxact's VXID as real for CIC purposes */
>> +    proc->lxid = MyProc->lxid;
>
> This breaks the case where the server restarted after PREPARE TRANSACTION.
> MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
> false.  I'm attaching a test case addition.  Can you repair this?
Yup. Indeed, that's a bug. The root cause is that GetLockConflicts() does not try to extract the real xid from the gxact's PGPROC while the vxid is not valid.
I see two ways to solve this:
1. Always set valid vxid, but fake 'vxid from xid' for gxact.
2. Teach GetLockConflicts() to use xid if vxid is invalid.
Both ways lead to identical GetLockConflicts() output.
PFA implementation of approach 1.
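Roughly, approach 1 amounts to something like this in MarkAsPreparingGuts() (a sketch only; the attached patch is what actually counts):

/*
 * Sketch of approach 1: advertise the preparing backend's real VXID when
 * PREPARE TRANSACTION runs in a live backend, and fall back to the old
 * "fake vxid derived from the xid" representation when the prepared xact is
 * recovered by the startup process (where MyProc->lxid is invalid).
 */
if (LocalTransactionIdIsValid(MyProc->lxid))
{
    /* Normal PREPARE in a regular backend: keep the real vxid. */
    proc->lxid = MyProc->lxid;
    proc->backendId = MyBackendId;
}
else
{
    /* Recovery of a prepared xact: fake a vxid from the xid, as before. */
    proc->lxid = (LocalTransactionId) xid;
    proc->backendId = InvalidBackendId;
}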

>>     proc->xid = xid;
>>     Assert(proc->xmin == InvalidTransactionId);
>>     proc->delayChkpt = false;
>>     proc->statusFlags = 0;
>>     proc->pid = 0;
>> -    proc->backendId = InvalidBackendId;
>> +    /* May be backendId of startup process */
>> +    proc->backendId = MyBackendId;
>
> Incidentally, MyBackendId of startup process depends on other facts.  When
> using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1.
> Otherwise, including clean startup of a non-standby node, MyBackendId is
> InvalidBackendId.  This may be harmless.  I didn't know about it.
I think we should avoid using backendId during startup. The backend itself has nothing to do with the transaction.

> On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
>> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
>>>> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
>>>>>> It seems like it's going to add a substantial amount of work even when
>>>>>> no 2PC xacts are involved?
>>>>> Only if 2PCs are enabled.
>>>>
>>>> I don't think that's good enough. Plenty of systems have 2PC enabled but very
>>>> few if any transactions end up as 2PC ones.
>>
>>> The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
>>> TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
>>>
>>> Is it worth the complexity?
>>
>> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
>> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
>> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
>> then I'm content with it.  I'd certainly expect some performance benefit.
>> Andres, what do you think?
>
> A few more benefits (beyond lock contention) come to mind:
>
> - Looking at the three VirtualXactLock() callers, waiting for final
>  disposition of prepared transactions is necessary for
>  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
>  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
>  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
>  prepared transaction won't do further database reads.  Waiting on the
>  prepared transaction there could give CIC an arbitrarily-long, needless
>  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
>  prepared transaction, because prepared transactions hold no locks during
>  recovery.  (If a prepared transaction originally acquired
>  AccessExclusiveLock, the startup process holds that lock on its behalf.)
>  Coordinating the XID search at a higher layer would let us change
>  WaitForLockersMultiple() without changing the others.
BTW, WaitForOlderSnapshots() is used in ATExecDetachPartitionFinalize(). Probably, we could indicate to
VirtualXactLock() if we want to wait on 2PC or not. Does it make sense?
>
>
> - v13 WaitPreparedXact() experiences starvation when a steady stream of
>  prepared transactions have the same VXID.  Since VXID reuse entails
>  reconnecting, starvation will be unnoticeable in systems that follow best
>  practices around connection lifespan.  The 2021-08-23 patch version didn't
>  have that hazard.
I think the probability of such a stream is abysmally small. You not only need a stream of 2PCs with the same vxid, but a stream
of overlapping 2PCs with the same vxid. And the most critical thing that can happen is CIC waiting for the stream to
become one-2PC-at-a-time for a moment.

Thanks!

Best regards, Andrey Borodin.


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sat, Sep 25, 2021 at 10:25:05PM +0500, Andrey Borodin wrote:
> > 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а):
> > On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> >> --- a/src/backend/access/transam/twophase.c
> >> +++ b/src/backend/access/transam/twophase.c
> >> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
> >>     proc->pgprocno = gxact->pgprocno;
> >>     SHMQueueElemInit(&(proc->links));
> >>     proc->waitStatus = PROC_WAIT_STATUS_OK;
> >> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
> >> -    proc->lxid = (LocalTransactionId) xid;
> >> +    /* We set up the gxact's VXID as real for CIC purposes */
> >> +    proc->lxid = MyProc->lxid;
> > 
> > This breaks the case where the server restarted after PREPARE TRANSACTION.
> > MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
> > false.  I'm attaching a test case addition.  Can you repair this?
> Yup. Indeed, that's a bug. The root cause is that GetLockConflicts() does not try to extract the real xid from the
> gxact's PGPROC while the vxid is not valid.
> I see two ways to solve this:
> 1. Always set valid vxid, but fake 'vxid from xid' for gxact.
> 2. Teach GetLockConflicts() to use xid if vxid is invalid.
> Both ways lead to identical GetLockConflicts() output.
> PFA implementation of approach 1.

That's reasonable.  I'll queue a task to review the rest of the patch.

> > On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
> >> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> >>>> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
> >>>>>> It seems like it's going to add a substantial amount of work even when
> >>>>>> no 2PC xacts are involved?
> >>>>> Only if 2PCs are enabled.
> >>>> 
> >>>> I don't think that's good enough. Plenty of systems have 2PC enabled but very
> >>>> few if any transactions end up as 2PC ones.
> >> 
> >>> The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
> >>> TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
> >>> 
> >>> Is it worth the complexity?
> >> 
> >> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
> >> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
> >> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
> >> then I'm content with it.  I'd certainly expect some performance benefit.
> >> Andres, what do you think?
> > 
> > A few more benefits (beyond lock contention) come to mind:
> > 
> > - Looking at the three VirtualXactLock() callers, waiting for final
> >  disposition of prepared transactions is necessary for
> >  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
> >  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
> >  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
> >  prepared transaction won't do further database reads.  Waiting on the
> >  prepared transaction there could give CIC an arbitrarily-long, needless
> >  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
> >  prepared transaction, because prepared transactions hold no locks during
> >  recovery.  (If a prepared transaction originally acquired
> >  AccessExclusiveLock, the startup process holds that lock on its behalf.)
> >  Coordinating the XID search at a higher layer would let us change
> >  WaitForLockersMultiple() without changing the others.
> BTW, WaitForOlderSnapshots() is used in ATExecDetachPartitionFinalize(). Probably, we could indicate to
> VirtualXactLock() if we want to wait on 2PC or not. Does it make sense?

For both CIC and ATExecDetachPartitionFinalize(), one needs unlucky timing for
the operation to needlessly wait on a prepared xact.  Specifically, the
needless wait arises if a PREPARE happens while WaitForOlderSnapshots() is
running, after its GetCurrentVirtualXIDs() and before its VirtualXactLock().
I would not choose to "indicate to VirtualXactLock() if we want to wait on 2PC
or not", but code like that wouldn't be too bad.  I probably wouldn't remove
code like that if you chose to write it.

The alternative I have in mind would work like the following pseudocode.  I'm
leaning toward thinking it's not worth doing, since none of the three benefits
are known to be important.  But maybe it is worth doing.

struct LockConflictData
{
    /* VXIDs seen to have XIDs */
    List *vxid_of_xid;
    /* VXIDs without known XIDs */
    List *vxid_no_xid;
    /*
     * XIDs.  Has one entry per entry in vxid_of_xid, and it may have up to
     * max_prepared_transactions additional entries.
     */
    List *xid;
};
void
WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
{
    struct LockConflictData holders;
    List *latest_xid = NIL;
    List *need_mapping = NIL;
    ListCell   *lc;
    int            total = 0;
    int            done = 0;

    /* Collect the transactions we need to wait on */
    foreach(lc, locktags)
    {
        LOCKTAG    *locktag = lfirst(lc);
        int            count;

        GetLockConflicts(&holders, locktag, lockmode,
                         progress ? &count : NULL);
    }

    /* wait on VXIDs known to have XIDs, and wait on known XIDs */
    foreach(lc, holders.vxid_of_xid)
        VirtualXactLock(lfirst_int(lc), true, NULL);
    foreach(lc, holders.xid)
        XactLockTableWait(lfirst_int(lc), other_params);
    /* wait on remaining VXIDs, possibly discovering more XIDs */
    foreach(lc, holders.vxid_no_xid)
    {
        VirtualTransactionId *v = lfirst(lc);
        TransactionId xid = InvalidTransactionId;
        /*
         * Under this design, VirtualXactLock() would know nothing about 2PC,
         * but it would gain the ability to return proc->xid of the waited
         * proc.  Under this design, VirtualTransactionId is always a local
         * transaction, like it was before commit 8a54e12.
         */
        VirtualXactLock(*v, true, &xid);
        if (TransactionIdIsValid(xid))
            latest_xid = lappend_int(latest_xid, xid);
        else
            need_mapping = lappend_int(need_mapping, v);
    }
    /* wait on XIDs just discovered */
    foreach(lc, latest_xid)
        XactLockTableWait(lfirst_int(lc), other_params);
    /*
     * If we never saw an XID associated with a particular VXID, check whether
     * the VXID became a prepared xact.
     */
    latest_xid = TwoPhaseGetXidByVXid(need_mapping);
    foreach(lc, latest_xid)
        XactLockTableWait(lfirst_int(lc), other_params);
}

> > - v13 WaitPreparedXact() experiences starvation when a steady stream of
> >  prepared transactions have the same VXID.  Since VXID reuse entails
> >  reconnecting, starvation will be unnoticeable in systems that follow best
> >  practices around connection lifespan.  The 2021-08-23 patch version didn't
> >  have that hazard.
> I think the probability of such a stream is abysmally small. You not only need a stream of 2PCs with the same vxid, but a
> stream of overlapping 2PCs with the same vxid. And the most critical thing that can happen is CIC waiting for the stream
> to become one-2PC-at-a-time for a moment.

You're probably right about that.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sat, Sep 25, 2021 at 01:10:12PM -0700, Noah Misch wrote:
> On Sat, Sep 25, 2021 at 10:25:05PM +0500, Andrey Borodin wrote:
> > > 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а):

> I'll queue a task to review the rest of the patch.

I think the attached version is ready for commit.  Notable differences
vs. v14:

- Made TwoPhaseGetXidByVXid() stop leaking TwoPhaseStateLock when it found a
  second match.

- Instead of moving the ProcArrayClearTransaction() call within
  PrepareTransaction(), move the PostPrepare_Locks() call.  I didn't find any
  bug in the other way, but it's a good principle to maximize similarity with
  CommitTransaction().  PostPrepare_Locks() has no counterpart in
  CommitTransaction(), so that principle is indifferent to moving it.

- inval-build-race-v0.1.patch was incompatible with debug_discard_caches.  The
  being-built relation would always get invalidated, making
  RelationBuildDesc() an infinite loop.  I've fixed this by making
  RelationCacheInvalidate() invalidate in-progress rels only when called for
  sinval overflow, not when called for debug_discard_caches.  This adds some
  function arguments that only matter in assert builds.  That's not great, but
  InvalidateSystemCaches() is expensive anyway.  I considered instead adding
  functions HoldDebugInval() and ResumeDebugInval(), which RelationBuildDesc()
  would use to suppress debug_discard_caches during any iteration after the
  first.  That didn't seem better.

- Discard the in-progress array after an ERROR during RelationBuildDesc().

- Made the relcache.c changes repalloc the list of in-progress rels as needed.

- Changed the background_pgbench args from ($dbname, $stdin, \$stdout, $timer,
  $files, $opts) to ($opts, $files, \$stdout, $timer).  $dbname was unused.
  pgbench doesn't make interesting use of its stdin, so I dropped that arg
  until we have a use case.  $opts and $files seem akin to the $dbname arg of
  background_psql, so I moved them to the front.  I'm not sure that last
  change was an improvement.

- Made 001_pgbench_with_server.pl use PostgresNode::pgbench(), rather than
  duplicate code.  Factored out a subroutine of PostgresNode::pgbench() and
  PostgresNode::background_pgbench().

- Lots of comment and naming changes.

One thing not done here is to change the tests to use CREATE INDEX
CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11
and earlier.  I may do that before pushing, or I may just omit the tests from
older branches.

> > > - v13 WaitPreparedXact() experiences starvation when a steady stream of
> > >  prepared transactions have the same VXID.  Since VXID reuse entails
> > >  reconnecting, starvation will be unnoticeable in systems that follow best
> > >  practices around connection lifespan.  The 2021-08-23 patch version didn't
> > >  have that hazard.
> > I think the probability of such a stream is abysmally small. You not only need a stream of 2PCs with the same vxid, but a
> > stream of overlapping 2PCs with the same vxid. And the most critical thing that can happen is CIC waiting for the stream
> > to become one-2PC-at-a-time for a moment.
> 
> You're probably right about that.

I didn't know of the "stateP->nextLXID = nextLocalTransactionId;" in
CleanupInvalidationState(), which indeed makes this all but impossible.

On Sat, Aug 07, 2021 at 03:19:57PM -0700, Noah Misch wrote:
> On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> > Changes:
> > 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be
> > RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we,
> > probably, need a better data structure.
> 
> I don't know either, hence that quick data structure to delay the question.
> debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
> reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
> get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").

debug_discard_caches=5 yields a depth of eight when opening a relation having
a CHECK constraint:

my_rel_having_check_constraint
pg_constraint_conrelid_contypid_conname_index
pg_index
pg_constraint
pg_constraint
pg_constraint
pg_constraint
pg_constraint

While debug_discard_caches doesn't permit higher values, I think one could
reach depths greater than eight by, for example, having a number of sessions
invalidate pg_constraint as often as possible.  Hence, I'm glad the code no
longer relies on a depth limit.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 17 окт. 2021 г., в 20:12, Noah Misch <noah@leadboat.com> написал(а):
>
> I think the attached version is ready for commit.  Notable differences
> vs. v14:
Few notes:

1. Maybe an Assert(in_progress_list_maxlen) when in_progress_list_maxlen is used?
2.
-#define VirtualTransactionIdIsPreparedXact(vxid) \
+#define VirtualTransactionIdIsRecoveredPreparedXact(vxid) \

This is a very neat transition. Yes, the function argument will always be an xid only for recovered transactions. Maybe
add a comment here that this function is expected to be used only for results of GetLockConflicts()?

> One thing not done here is to change the tests to use CREATE INDEX
> CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11
> and earlier.  I may do that before pushing, or I may just omit the tests from
> older branches.

The tests refactor PostgresNode.pm and some existing tests. Back-patching this would be quite invasive.

But swapping every "REINDEX INDEX CONCURRENTLY idx;" with
            DROP INDEX CONCURRENTLY idx;
            CREATE INDEX CONCURRENTLY idx on tbl(i);
works.

> <inval-build-race-v1.patch><prepared-transactions-cic-series202107-v15nm.patch>
I've checked that these patches work for some time on my machines. I do not observe failures.

Thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Mon, Oct 18, 2021 at 06:23:05PM +0500, Andrey Borodin wrote:
> > 17 окт. 2021 г., в 20:12, Noah Misch <noah@leadboat.com> написал(а):
> > I think the attached version is ready for commit.  Notable differences
> > vs. v14:
> Few notes:
> 
> 1. Maybe an Assert(in_progress_list_maxlen) when in_progress_list_maxlen is used?

RelationCacheInitialize() initializes both in_progress_list_maxlen and the
RelationIdCache hash table, and any ERROR there is promoted to FATAL.  Without
the hash table, nothing good can happen in relcache.  Hence, I think such an
assert is excessive.

> 2. 
> -#define VirtualTransactionIdIsPreparedXact(vxid) \
> +#define VirtualTransactionIdIsRecoveredPreparedXact(vxid) \
> 
> This is a very neat transition. Yes, the function argument will always be an xid only for recovered transactions.
> Maybe add a comment here that this function is expected to be used only for results of GetLockConflicts()?

One can use it on any VirtualTransactionId, though some sources only return
values for which this returns false.  It can return true for the result of
GET_VXID_FROM_PGPROC().  I think it can return true for the result of
GetCurrentVirtualXIDs(limitXmin = InvalidTransactionId), but core code doesn't
make such a call.

> > One thing not done here is to change the tests to use CREATE INDEX
> > CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11
> > and earlier.  I may do that before pushing, or I may just omit the tests from
> > older branches.
> 
> The tests refactor PostgresNode.pm and some existing tests. Back-patching this would be quite invasive.

That's fine with me.  Back-patching a fix without its tests is riskier than
back-patching test infrastructure changes.

> But swapping every "REINDEX INDEX CONCURRENTLY idx;" with
>             DROP INDEX CONCURRENTLY idx;
>             CREATE INDEX CONCURRENTLY idx on tbl(i);
> works.
> 
> > <inval-build-race-v1.patch><prepared-transactions-cic-series202107-v15nm.patch>
> I've checked that these patches work for some time on my machines. I do not observe failures.

Good.  Thanks for checking.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Mon, Oct 18, 2021 at 08:02:12PM -0700, Noah Misch wrote:
> On Mon, Oct 18, 2021 at 06:23:05PM +0500, Andrey Borodin wrote:
> > > 17 окт. 2021 г., в 20:12, Noah Misch <noah@leadboat.com> написал(а):
> > > I think the attached version is ready for commit.  Notable differences
> > > vs. v14:

Pushed.  Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple
"IPC::Run: timeout on timer" in the new tests.  No other animal has.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09
is an example run.  The pgbench queries finished quickly, but the
$pgbench_h->finish() apparently timed out after 180s.  I guess this would be
consistent with pgbench blocking in write(), waiting for something to empty a
pipe buffer so it can write more.  I thought finish() will drain any incoming
I/O, though.  This phenomenon has been appearing regularly via
src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to
resolve it.  A stack trace of the stuck pgbench should be informative, though.

Compared to my last post, the push included two more test changes.  I removed
sleeps from a test.  They could add significant time on a system with coarse
sleep granularity.  This did not change test sensitivity on my system.
Second, I changed background_pgbench to include stderr lines in $stdout, as it
had documented.  This becomes important during the back-patch to v11, where
server errors don't cause a nonzero pgbench exit status.  background_psql
still has the same bug, and I can fix it later.  (The background_psql version
of the bug is not affecting current usage.)

FYI, the non-2PC test is less sensitive in older branches.  It reproduces
master's bug in 25-50% of runs, but it took about six minutes on v11 and v12.

> > > One thing not done here is to change the tests to use CREATE INDEX
> > > CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11
> > > and earlier.  I may do that before pushing, or I may just omit the tests from
> > > older branches.
> > 
> > The test patch refactors PostgresNode.pm and some tests. Back-patching this would be quite invasive.
> 
> That's fine with me.  Back-patching a fix without its tests is riskier than
> back-patching test infrastructure changes.

Back-patching the tests did end up tricky, for other reasons.  Before v12
(d3c09b9), a TAP suite in a pgxs module wouldn't run during check-world.
Before v11 (7f563c0), amcheck lacks the heapallindexed feature that the tests
rely on.  Hence, for v11, v10, and v9.6, I used a plpgsql implementation of
the heapallindexed check, and I moved the tests to src/bin/pgbench.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-19%2012%3A58%3A08



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 24 окт. 2021 г., в 08:00, Noah Misch <noah@leadboat.com> написал(а):
>
> On Mon, Oct 18, 2021 at 08:02:12PM -0700, Noah Misch wrote:
>> On Mon, Oct 18, 2021 at 06:23:05PM +0500, Andrey Borodin wrote:
>>>> 17 окт. 2021 г., в 20:12, Noah Misch <noah@leadboat.com> написал(а):
>>>> I think the attached version is ready for commit.  Notable differences
>>>> vs. v14:
>
> Pushed.
Wow, that's great! Thank you!


>  Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple
> "IPC::Run: timeout on timer" in the new tests.  No other animal has.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09
> is an example run.  The pgbench queries finished quickly, but the
> $pgbench_h->finish() apparently timed out after 180s.  I guess this would be
> consistent with pgbench blocking in write(), waiting for something to empty a
> pipe buffer so it can write more.  I thought finish() will drain any incoming
> I/O, though.  This phenomenon has been appearing regularly via
> src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to
> resolve it.  A stack trace of the stuck pgbench should be informative, though.

Some thoughts:
0. I doubt that psql/pgbench is stuck in these failures.
1. All observed similar failures seem to be related to the finish() sub of the IPC::Run harness.
2. finish() must pump any pending data from the process [0], but it can hang if the process is waiting for something (see the sketch below).
3. There is a reported bug in finish() [1], but the description is slightly different.
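
For context, the harness lifecycle we rely on looks roughly like this (a minimal sketch, not the real background_pgbench; the pgbench arguments are only an example):

    use strict;
    use warnings;
    use IPC::Run qw(start timeout);

    my $stdin  = '';
    my $stdout = '';
    my $timer  = timeout(180);

    # start() forks the child with pipes for stdin/stdout wired to the
    # scalars above; the harness owns those pipes from here on.
    my $h = start [ 'pgbench', '--no-vacuum', '--client=1', '--transactions=1' ],
        '<', \$stdin, '>', \$stdout, '2>&1', $timer;

    # ... the child runs its transactions here ...

    # finish() pumps whatever still sits in the pipes, then waits for the
    # child to exit and reaps it.  If the kernel never reports the pipes as
    # ready, or the child never exits, this is where we sit until $timer
    # fires.
    $h->finish;
    my $status = $h->result(0);    # exit status of the only command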


>
> Compared to my last post, the push included two more test changes.  I removed
> sleeps from a test.  They could add significant time on a system with coarse
> sleep granularity.  This did not change test sensitivity on my system.
> Second, I changed background_pgbench to include stderr lines in $stdout, as it
> had documented.  This becomes important during the back-patch to v11, where
> server errors don't cause a nonzero pgbench exit status.  background_psql
> still has the same bug, and I can fix it later.  (The background_psql version
> of the bug is not affecting current usage.)
>
> FYI, the non-2PC test is less sensitive in older branches.  It reproduces
> master's bug in 25-50% of runs, but it took about six minutes on v11 and v12.
It seems like loading the relation descriptor into the relcache has become more expensive?

>>>> One thing not done here is to change the tests to use CREATE INDEX
>>>> CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11
>>>> and earlier.  I may do that before pushing, or I may just omit the tests from
>>>> older branches.
>>>
>>> The test patch refactors PostgresNode.pm and some tests. Back-patching this would be quite invasive.
>>
>> That's fine with me.  Back-patching a fix without its tests is riskier than
>> back-patching test infrastructure changes.
>
> Back-patching the tests did end up tricky, for other reasons.  Before v12
> (d3c09b9), a TAP suite in a pgxs module wouldn't run during check-world.
> Before v11 (7f563c0), amcheck lacks the heapallindexed feature that the tests
> rely on.  Hence, for v11, v10, and v9.6, I used a plpgsql implementation of
> the heapallindexed check, and I moved the tests to src/bin/pgbench.
Cool!

Thanks!

Best regards, Andrey Borodin.

[0] https://metacpan.org/dist/IPC-Run/source/lib/IPC/Run.pm#L3481
[1] https://github.com/toddr/IPC-Run/issues/57




conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Sun, Oct 24, 2021 at 02:45:38PM +0300, Andrey Borodin wrote:
> > 24 окт. 2021 г., в 08:00, Noah Misch <noah@leadboat.com> написал(а):
> >  Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple
> > "IPC::Run: timeout on timer" in the new tests.  No other animal has.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09
> > is an example run.  The pgbench queries finished quickly, but the
> > $pgbench_h->finish() apparently timed out after 180s.  I guess this would be
> > consistent with pgbench blocking in write(), waiting for something to empty a
> > pipe buffer so it can write more.  I thought finish() will drain any incoming
> > I/O, though.  This phenomenon has been appearing regularly via
> > src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to
> > resolve it.  A stack trace of the stuck pgbench should be informative, though.
> 
> Some thoughts:
> 0. I doubt that psql/pgbench is stuck in these failures.

Got it.  If pgbench is a zombie, the fault does lie in IPC::Run or the kernel.

> 1. All observed similar failures seem to be related to the finish() sub of the IPC::Run harness.
> 2. finish() must pump any pending data from the process [0], but it can hang if the process is waiting for something.
> 3. There is a reported bug in finish() [1], but the description is slightly different.

Since that report is about a Perl-process child on Linux, I think we can treat
it as unrelated.

These failures started on 2021-10-09, the day conchuela updated from DragonFly
v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
Since the theorized kernel bug seems not to affect
src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
the newest failure sites is that they don't write anything to the child stdin.
Does writing e.g. a single byte (that the child doesn't use) work around the
problem?  If not, does passing the script via stdin, like "pgbench -f-
<script.sql", work around the problem?

> [0] https://metacpan.org/dist/IPC-Run/source/lib/IPC/Run.pm#L3481
> [1] https://github.com/toddr/IPC-Run/issues/57



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а):
>
> These failures started on 2021-10-09, the day conchuela updated from DragonFly
> v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
> Since the theorized kernel bug seems not to affect
> src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
> from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
> the newest failure sites is that they don't write anything to the child stdin.
> Does writing e.g. a single byte (that the child doesn't use) work around the
> problem?
The following diff did not solve the problem on my VM; finish() still hangs sometimes.

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 86eb920ea1..0c550ff0dc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2112,6 +2112,7 @@ sub background_pgbench

        my $harness = IPC::Run::start \@cmd, '<', \$stdin, '>', $stdout, '2>&1',
          $timer;
+       $stdin .= "some bytes";

        return $harness;
 }




>  If not, does passing the script via stdin, like "pgbench -f-
> <script.sql", work around the problem?

I'll test it tomorrow; the refactoring does not seem trivial given that we use many simultaneous scripts.

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> Pushed.

cm@enterprisedb.com, could you post a stack trace from buildfarm member
gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
those seen today:

  https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
  https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39

Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
broke something, but I've run out of ideas, and my hpux boxes died.


https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-10-24%2012%3A01%3A10
got an interesting v9.6 failure, unrelated to the gharial SIGSEGV failures:

2021-10-24 14:25:29.263 CEST [34569:175] pgbench ERROR:  could not read two-phase state from xlog at 0/158F4E0
2021-10-24 14:25:29.263 CEST [34569:176] pgbench STATEMENT:  COMMIT PREPARED 'c1';

I don't see this thread's commits changing code paths relevant to that
failure, so I currently think this one is a new test showing an old bug.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а):
>> These failures started on 2021-10-09, the day conchuela updated from DragonFly
>> v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.

prairiedog just reported the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-10-24%2016%3A05%3A58

I'm thinking "timing problem" more than "kernel bug".  (I'd have also
believed "ancient IPC::Run bug", except that I assume conchuela is
running a pretty modern IPC::Run.)

I can poke into it tomorrow.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> 25 окт. 2021 г., в 02:53, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> I can poke into it tomorrow.

FWIW it's easy to make the issue reproduce faster with the following diff

diff --git a/contrib/amcheck/t/003_cic_2pc.pl b/contrib/amcheck/t/003_cic_2pc.pl
index f4255c1fb8..0d68c53387 100644
--- a/contrib/amcheck/t/003_cic_2pc.pl
+++ b/contrib/amcheck/t/003_cic_2pc.pl
@@ -141,7 +141,7 @@ $node->safe_psql('postgres', q(REINDEX TABLE tbl;));
 my $pgbench_out   = '';
 my $pgbench_timer = IPC::Run::timeout(180);
 my $pgbench_h     = $node->background_pgbench(
-       '--no-vacuum --client=1 --transactions=100',
+       '--no-vacuum --client=1 --transactions=1',
        {
                '002_pgbench_concurrent_cic' => q(
                        DROP INDEX CONCURRENTLY idx;

I.e. if pgbench exits before we call finish(), it always hangs. I can reproduce this on my DragonFly VM and I can
share the VM state if it's of any help.

Best regards, Andrey Borodin.


Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> FWIW it's easy to make the issue reproduce faster with the following diff
> -       '--no-vacuum --client=1 --transactions=100',
> +       '--no-vacuum --client=1 --transactions=1',

Hmm, didn't help here.  It seems that even though prairiedog managed to
fail on its first attempt, it's not terribly reproducible there; I've
seen only one failure in about 30 manual attempts.  In the one failure,
the non-background pgbench completed fine (as determined by counting
statements in the postmaster's log); but the background one had only
finished about 90 transactions before seemingly getting stuck.  No new
SQL commands had been issued after about 10 seconds.

Nonetheless, I have a theory and a proposal.  This coding pattern
seems pretty silly:

    $pgbench_h->pump_nb;
    $pgbench_h->finish();

ISTM that if you need to call pump at all, you need a loop not just
one call.  So I'm guessing that when it fails, it's for lack of
pumping.
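
That is, something closer to the usual idiom (untested):

    # Drain the child's output for as long as the harness says there is
    # anything left to pump, then reap it.
    $pgbench_h->pump while $pgbench_h->pumpable;
    $pgbench_h->finish();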

The other thing I noticed is that at least on prairiedog's host, the
number of invocations of the DROP/CREATE/bt_index_check transaction
is ridiculously out of proportion to the number of invocations of the
other transactions.  It can only get through seven or eight iterations
of the index transaction before the other transactions are all done,
which means the last 190 iterations of that transaction are a complete
waste of cycles.

What I think we should do in these two tests is nuke the use of
background_pgbench entirely; that looks like a solution in search
of a problem, and it seems unnecessary here.  Why not run
the DROP/CREATE/bt_index_check transaction as one of three script
options in the main pgbench run?  Aside from dodging this
maybe-its-a-bug-or-maybe-not behavior in IPC::Run, this would make the
test automatically scale the number of iterations of the different
transactions to appropriate values, so that we'd not be wasting cycles.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > FWIW it's easy to make the issue reproduce faster with the following diff
> > -       '--no-vacuum --client=1 --transactions=100',
> > +       '--no-vacuum --client=1 --transactions=1',
> 
> Hmm, didn't help here.  It seems that even though prairiedog managed to
> fail on its first attempt, it's not terribly reproducible there; I've
> seen only one failure in about 30 manual attempts.  In the one failure,
> the non-background pgbench completed fine (as determined by counting
> statements in the postmaster's log); but the background one had only
> finished about 90 transactions before seemingly getting stuck.  No new
> SQL commands had been issued after about 10 seconds.

Interesting.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-10-24%2016%3A05%3A58
also shows a short command count, just 131/200 completed.  However,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-25%2000%3A35%3A27
shows the full 200/200.  I'm starting to think the prairiedog failures have
only superficial similarity to the conchuela failures.

> Nonetheless, I have a theory and a proposal.  This coding pattern
> seems pretty silly:
> 
>     $pgbench_h->pump_nb;
>     $pgbench_h->finish();
> 
> ISTM that if you need to call pump at all, you need a loop not just
> one call.  So I'm guessing that when it fails, it's for lack of
> pumping.

The pump_nb() is just unnecessary.  We've not added anything destined for
stdin, and finish() takes care of pumping outputs.

> The other thing I noticed is that at least on prairiedog's host, the
> number of invocations of the DROP/CREATE/bt_index_check transaction
> is ridiculously out of proportion to the number of invocations of the
> other transactions.  It can only get through seven or eight iterations
> of the index transaction before the other transactions are all done,
> which means the last 190 iterations of that transaction are a complete
> waste of cycles.

That makes sense.

> What I think we should do in these two tests is nuke the use of
> background_pgbench entirely; that looks like a solution in search
> of a problem, and it seems unnecessary here.  Why not run
> the DROP/CREATE/bt_index_check transaction as one of three script
> options in the main pgbench run?

The author tried that and got deadlocks:
https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru


On prairiedog, the proximate trouble is pgbench getting stuck.  IPC::Run is
behaving normally given a stuck pgbench.  When pgbench stops sending queries,
does pg_stat_activity show anything at all running?  If so, are those backends
waiting on locks?  If not, what's the pgbench stack trace at that time?



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
>> What I think we should do in these two tests is nuke the use of
>> background_pgbench entirely; that looks like a solution in search
>> of a problem, and it seems unnecessary here.  Why not run
>> the DROP/CREATE/bt_index_check transaction as one of three script
>> options in the main pgbench run?

> The author tried that and got deadlocks:
> https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru

Hmm, I guess that's because two concurrent CICs can deadlock against each
other.  I wonder if we could fix that ... or maybe we could teach pgbench
that it mustn't launch more than one instance of that script?  Or more
practically, use advisory locks in that script to enforce that only one
runs at once.

> On prairiedog, the proximate trouble is pgbench getting stuck.  IPC::Run is
> behaving normally given a stuck pgbench.  When pgbench stops sending queries,
> does pg_stat_activity show anything at all running?  If so, are those backends
> waiting on locks?  If not, what's the pgbench stack trace at that time?

The connected backend is idle, waiting for client input:

#0  0x9002ec88 in kevent ()
#1  0x003e3fa8 in WaitEventSetWait (set=0x3802258, timeout=-1, occurred_events=0xbfffe098, nevents=1, wait_event_info=0) at latch.c:1601
#2  0x0028d6a0 in secure_read (port=0x2501f40, ptr=0x7ad1d0, len=8192) at be-secure.c:186
#3  0x002958a4 in pq_recvbuf () at pqcomm.c:957
#4  0x00295994 in pq_getbyte () at pqcomm.c:1000
#5  0x004196a0 in PostgresMain

The last few entries in the postmaster log are:

2021-10-26 01:19:31.122 EDT [1013] 002_cic.pl LOG:  statement: DROP INDEX CONCURRENTLY idx;
2021-10-26 01:19:31.136 EDT [1013] 002_cic.pl LOG:  statement: CREATE INDEX CONCURRENTLY idx ON tbl(i);
2021-10-26 01:19:31.167 EDT [1013] 002_cic.pl LOG:  statement: SELECT bt_index_check('idx',true);

which is consistent with pg_stat_activity:

 datid            | 13019
 datname          | postgres
 pid              | 1013
 leader_pid       | 
 usesysid         | 10
 usename          | tgl
 application_name | 002_cic.pl
 client_addr      | 
 client_hostname  | 
 client_port      | -1
 backend_start    | 2021-10-26 01:19:18.49282-04
 xact_start       | 
 query_start      | 2021-10-26 01:19:31.167508-04
 state_change     | 2021-10-26 01:19:31.190256-04
 wait_event_type  | Client
 wait_event       | ClientRead
 state            | idle
 backend_xid      | 
 backend_xmin     | 
 query_id         | 
 query            | SELECT bt_index_check('idx',true);
 backend_type     | client backend

The pgbench process is sitting here:

#0  0x9001f488 in select ()
#1  0x0000758c in wait_on_socket_set (sa=0x300bb0, usecs=0) at pgbench.c:7098
#2  0x0000d514 in threadRun (arg=0x300420) at pgbench.c:6776
#3  0x00010c44 in main (argc=-1635084506, argv=0x0) at pgbench.c:6540

which means it thinks it's waiting for input.  The PGconn object is in
this state:

  status = CONNECTION_OK,
  asyncStatus = PGASYNC_BUSY,
  xactStatus = PQTRANS_IDLE,
  ...
  inBuffer = 0x6a000 "T",
  inBufSize = 16384,
  inStart = 0,
  inCursor = 0,
  inEnd = 0,
  outBuffer = 0x6e000 "Q",
  outBufSize = 16384,
  outCount = 0,
  outMsgStart = 1,
  outMsgEnd = 34,

The inBuffer is logically empty, but its contents look like the last input
was the result of the "SELECT bt_index_check" command, consistent with
the postmaster log.  The outBuffer is likewise logically empty, but what
it contains appears to be the next DROP INDEX command:

(gdb) x/64c state->con->outBuffer
0x6e000:        81 'Q'  0 '\0'  0 '\0'  0 '\0'  33 '!'  68 'D'  82 'R'  79 'O'
0x6e008:        80 'P'  32 ' '  73 'I'  78 'N'  68 'D'  69 'E'  88 'X'  32 ' '
0x6e010:        67 'C'  79 'O'  78 'N'  67 'C'  85 'U'  82 'R'  82 'R'  69 'E'
0x6e018:        78 'N'  84 'T'  76 'L'  89 'Y'  32 ' '  105 'i' 100 'd' 120 'x'
0x6e020:        59 ';'  0 '\0'  114 'r' 117 'u' 101 'e' 41 ')'  59 ';'  0 '\0'
0x6e028:        108 'l' 40 '('  105 'i' 41 ')'  59 ';'  0 '\0'  95 '_'  110 'n'
0x6e030:        97 'a'  109 'm' 101 'e' 0 '\0'  48 '0'  48 '0'  50 '2'  95 '_'
0x6e038:        99 'c'  105 'i' 99 'c'  46 '.'  112 'p' 108 'l' 0 '\0'  0 '\0'

So what we have is that libpq thinks it's sent the next DROP INDEX,
but the backend hasn't seen it.

It's fairly hard to blame that state of affairs on the IPC::Run harness.
I'm wondering if we might be looking at some timing-dependent corner-case
bug in the new libpq pipelining code.  Pipelining isn't enabled:

  pipelineStatus = PQ_PIPELINE_OFF,

but that doesn't mean that the pipelining code hasn't been anywhere
near this command.  I can see

  cmd_queue_head = 0x300d40,
  cmd_queue_tail = 0x300d40,
  cmd_queue_recycle = 0x0,

(gdb) p *state->con->cmd_queue_head
$4 = {
  queryclass = PGQUERY_SIMPLE,
  query = 0x3004e0 "DROP INDEX CONCURRENTLY idx;",
  next = 0x0
}

The trouble with this theory, of course, is "if libpq is busted, why is
only this test case showing it?".  But AFAICS it would take some pretty
spooky action-at-a-distance for the Perl harness to have caused this.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
> >> What I think we should do in these two tests is nuke the use of
> >> background_pgbench entirely; that looks like a solution in search
> >> of a problem, and it seems unnecessary here.  Why not run
> >> the DROP/CREATE/bt_index_check transaction as one of three script
> >> options in the main pgbench run?
> 
> > The author tried that and got deadlocks:
> > https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru
> 
> Hmm, I guess that's because two concurrent CICs can deadlock against each
> other.  I wonder if we could fix that ... or maybe we could teach pgbench
> that it mustn't launch more than one instance of that script?

Both sound doable, but I don't expect either to fix prairiedog's trouble.

> Or more
> practically, use advisory locks in that script to enforce that only one
> runs at once.

The author did try that.

> So what we have is that libpq thinks it's sent the next DROP INDEX,
> but the backend hasn't seen it.

Thanks for isolating that.

> It's fairly hard to blame that state of affairs on the IPC::Run harness.
> I'm wondering if we might be looking at some timing-dependent corner-case
> bug in the new libpq pipelining code.  Pipelining isn't enabled:
> 
>   pipelineStatus = PQ_PIPELINE_OFF, 
> 
> but that doesn't mean that the pipelining code hasn't been anywhere
> near this command.  I can see
> 
>   cmd_queue_head = 0x300d40, 
>   cmd_queue_tail = 0x300d40, 
>   cmd_queue_recycle = 0x0, 
> 
> (gdb) p *state->con->cmd_queue_head
> $4 = {
>   queryclass = PGQUERY_SIMPLE, 
>   query = 0x3004e0 "DROP INDEX CONCURRENTLY idx;", 
>   next = 0x0
> }
> 
> The trouble with this theory, of course, is "if libpq is busted, why is
> only this test case showing it?".

Agreed, it's not clear how the new tests would reveal a libpq bug that
src/bin/pgbench/t/001_pgbench_with_server.pl has been unable to reveal.  Does
the problem reproduce on v13?

Grasping at straws, background_pgbench does differ by specifying stdin as a
ref to an empty scalar.  I think that makes IPC::Run open a pipe and never
write to it.  The older pgbench tests don't override stdin, so I think that
makes pgbench inherit the original stdin.  Given your pgbench stack trace,
this seems awfully unlikely to be the relevant difference.  If we run out of
ideas, you could try some runs with that difference removed:

--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2110,7 +2110,7 @@ sub background_pgbench
     # IPC::Run would otherwise append to existing contents:
     $$stdout = "" if ref($stdout);
 
-    my $harness = IPC::Run::start \@cmd, '<', \$stdin, '>', $stdout, '2>&1',
+    my $harness = IPC::Run::start \@cmd, '>', $stdout, '2>&1',
       $timer;
 
     return $harness;

> But AFAICS it would take some pretty
> spooky action-at-a-distance for the Perl harness to have caused this.

Agreed.  We'll have to consider the harness innocent for the moment.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
>> Or more
>> practically, use advisory locks in that script to enforce that only one
>> runs at once.

> The author did try that.

Hmm ... that ought to have done the trick, I'd think.  However:

> Both sound doable, but I don't expect either to fix prairiedog's trouble.

Yeah :-(.  I think this test is somehow stumbling over a pre-existing bug.

>> So what we have is that libpq thinks it's sent the next DROP INDEX,
>> but the backend hasn't seen it.

> Thanks for isolating that.

The plot thickens.  When I went back to look at that machine this morning,
I found this in the postmaster log:

2021-10-26 02:52:09.324 EDT [1013] 002_cic.pl LOG:  statement: DROP INDEX CONCURRENTLY idx;
2021-10-26 02:52:09.352 EDT [1013] 002_cic.pl LOG:  could not send data to client: Broken pipe
2021-10-26 02:52:09.352 EDT [1013] 002_cic.pl FATAL:  connection to client lost

The timestamps correspond (more or less anyway) to when I killed off the
stuck test run and went to bed.  So the DROP command *was* sent, and it
was eventually received by the backend, but it seems to have taken killing
the pgbench process to do it.

I think this probably exonerates the pgbench/libpq side of things, and
instead we have to wonder about a backend or kernel bug.  A kernel bug
could possibly explain the unexplainable connection to what's happening on
some other file descriptor.  I'd be prepared to believe that prairiedog's
ancient macOS version has some weird bug preventing kevent() from noticing
available data ... but (a) surely conchuela wouldn't share such a bug,
and (b) we've been using kevent() for a couple years now, so how come
we didn't see this before?

Still baffled.  I'm currently experimenting to see if the bug reproduces
when latch.c is made to use poll() instead of kevent().  But the failure
rate was low enough that it'll be hours before I can say confidently
that it doesn't (unless, of course, it does).

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
>> Or more
>> practically, use advisory locks in that script to enforce that only one
>> runs at once.

> The author did try that.

Ah, I see: if the other pgbench thread is waiting in pg_advisory_lock,
then it is inside a transaction, so it blocks CIC from completing.
We can get around that though, by using pg_try_advisory_lock and not
proceeding if it fails.  The attached POC does this for the 002 test;
it looks like the same thing could be done to 003.

Now the problem with this is it will only work back to v12, because
pgbench lacks the necessary features before that.  However, I think
it's worth doing it like this in versions where we can do so, because
of the load-balancing aspect: this won't waste cycles running CICs
after the inserts have stopped, nor vice versa.

Thoughts?

            regards, tom lane

diff --git a/contrib/amcheck/t/002_cic.pl b/contrib/amcheck/t/002_cic.pl
index 0b14e66270..1f258dd59d 100644
--- a/contrib/amcheck/t/002_cic.pl
+++ b/contrib/amcheck/t/002_cic.pl
@@ -9,7 +9,7 @@ use Config;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 
-use Test::More tests => 4;
+use Test::More tests => 3;
 
 my ($node, $result);
 
@@ -25,32 +25,18 @@ $node->safe_psql('postgres', q(CREATE TABLE tbl(i int)));
 $node->safe_psql('postgres', q(CREATE INDEX idx ON tbl(i)));
 
 #
-# Stress CIC with pgbench
+# Stress CIC with pgbench.
+#
+# pgbench might try to launch more than one instance of the CIC
+# transaction concurrently.  That would deadlock, so use an advisory
+# lock to ensure only one CIC runs at a time.
 #
-
-# Run background pgbench with CIC. We cannot mix-in this script into single
-# pgbench: CIC will deadlock with itself occasionally.
-my $pgbench_out   = '';
-my $pgbench_timer = IPC::Run::timeout(180);
-my $pgbench_h     = $node->background_pgbench(
-    '--no-vacuum --client=1 --transactions=200',
-    {
-        '002_pgbench_concurrent_cic' => q(
-            DROP INDEX CONCURRENTLY idx;
-            CREATE INDEX CONCURRENTLY idx ON tbl(i);
-            SELECT bt_index_check('idx',true);
-           )
-    },
-    \$pgbench_out,
-    $pgbench_timer);
-
-# Run pgbench.
 $node->pgbench(
     '--no-vacuum --client=5 --transactions=200',
     0,
     [qr{actually processed}],
     [qr{^$}],
-    'concurrent INSERTs',
+    'concurrent INSERTs and CIC',
     {
         '002_pgbench_concurrent_transaction' => q(
             BEGIN;
@@ -62,17 +48,17 @@ $node->pgbench(
             SAVEPOINT s1;
             INSERT INTO tbl VALUES(0);
             COMMIT;
+          ),
+        '002_pgbench_concurrent_cic' => q(
+            SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+            \if :gotlock
+                DROP INDEX CONCURRENTLY idx;
+                CREATE INDEX CONCURRENTLY idx ON tbl(i);
+                SELECT bt_index_check('idx',true);
+                SELECT pg_advisory_unlock(42);
+            \endif
           )
     });
 
-$pgbench_h->pump_nb;
-$pgbench_h->finish();
-$result =
-    ($Config{osname} eq "MSWin32")
-  ? ($pgbench_h->full_results)[0]
-  : $pgbench_h->result(0);
-is($result, 0, "pgbench with CIC works");
-
-# done
 $node->stop;
 done_testing();

Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Wed, Oct 27, 2021 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [...] I'd be prepared to believe that prairiedog's
> ancient macOS version has some weird bug preventing kevent() from noticing
> available data ... but (a) surely conchuela wouldn't share such a bug,
> and (b) we've been using kevent() for a couple years now, so how come
> we didn't see this before?

There was this case soon after our kqueue support landed:

https://www.postgresql.org/message-id/CA%2BhUKGLzaR5cV0EmZWoVXJDO_XwZpmpQX_sYwCBRE1qLBEcGPQ%40mail.gmail.com

There are a few discussions on the 'net about the flakiness of both
kevent() and poll() around that vintage of macOS (both were new and
shared infrastructure, separate from select()); for example, libcurl
and libevent talked about this and blocked version ranges.

I don't have any ideas about conchuela.  For the next person who
manages to reproduce this, just to sanity-check what we're passing in
to kevent(), what do *port and waitfor look like when secure_read()
blocks in WaitEventSetWait?  It's good news that Andrey could
reproduce this on a VM.  I may look into setting one of those up too.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Oct 27, 2021 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [...] I'd be prepared to believe that prairiedog's
>> ancient macOS version has some weird bug preventing kevent() from noticing
>> available data ... but (a) surely conchuela wouldn't share such a bug,
>> and (b) we've been using kevent() for a couple years now, so how come
>> we didn't see this before?

> There was this case soon after our kqueue support landed:
> https://www.postgresql.org/message-id/CA%2BhUKGLzaR5cV0EmZWoVXJDO_XwZpmpQX_sYwCBRE1qLBEcGPQ%40mail.gmail.com

Oh yeah ... that looks like the exact same thing, doesn't it?

I've just finished doing 500 cycles of amcheck's 002_cic.pl without a
failure after recompiling latch.c with -DWAIT_USE_POLL.  With the kevent
code path, it is hard to reproduce but not that hard.  So unless we can
learn something from the DragonFly case, I'm inclined to write this off
as ancient-macOS-bug and start using -DWAIT_USE_POLL on prairiedog.

> I don't have any ideas about conchuela.  For the next person who
> manages to reproduce this, just to sanity-check what we're passing in
> to kevent(), what do *port and waitfor look like when secure_read()
> blocks in WaitEventSetWait?

Not sure which variables you're referring to there, but here's a more
complete gdb dump from the stuck backend:

(gdb) bt
#0  0x9002ec88 in kevent ()
#1  0x003e3fa8 in WaitEventSetWait (set=0x3802258, timeout=-1, occurred_events=0xbfffe098, nevents=1, wait_event_info=0) at latch.c:1601
#2  0x0028d6a0 in secure_read (port=0x2501f40, ptr=0x7ad1d0, len=8192) at be-secure.c:186
#3  0x002958a4 in pq_recvbuf () at pqcomm.c:957
#4  0x00295994 in pq_getbyte () at pqcomm.c:1000
#5  0x004196a0 in PostgresMain (dbname=0x3805894 "", username=0x789020 "...") at postgres.c:353
#6  0x00363df8 in PostmasterMain (argc=8059652, argv=0x61789299) at postmaster.c:4560
#7  0x00298b6c in main (argc=4, argv=0x2500a30) at main.c:198
(gdb) f 1
#1  0x003e3fa8 in WaitEventSetWait (set=0x3802258, timeout=-1, occurred_events=0xbfffe098, nevents=1, wait_event_info=0) at latch.c:1601
1601            rc = kevent(set->kqueue_fd, NULL, 0,
(gdb) i locals
cur_event = (WaitEvent *) 0x78ed40
cur_kqueue_event = (struct kevent *) 0x0
timeout = {
  tv_sec = 13,
  tv_nsec = 0
}
rc = 4
timeout_p = (struct timespec *) 0x0
rc = 4
returned_events = 4
start_time = {
  tv_sec = -1879041144,
  tv_usec = 58742932
}
cur_time = {
  tv_sec = 59028992,
  tv_usec = 2
}
cur_timeout = -1
(gdb) p *set
$1 = {
  nevents = 3,
  nevents_space = 3,
  events = 0x380227c,
  latch = 0xc954ebc,
  latch_pos = 1,
  exit_on_postmaster_death = 0 '\0',
  kqueue_fd = 4,
  kqueue_ret_events = 0x38022ac,
  report_postmaster_not_running = 0 '\0'
}
(gdb) p set->events[0]
$2 = {
  pos = 0,
  events = 2,
  fd = 7,      # matches MyProcPort->sock
  user_data = 0x0
}
(gdb) p set->events[1]
$3 = {
  pos = 1,
  events = 1,
  fd = -1,
  user_data = 0x0
}
(gdb) p set->events[2]
$4 = {
  pos = 2,
  events = 16,
  fd = 3,
  user_data = 0x0
}
(gdb) p set->kqueue_ret_events[0]
$6 = {
  ident = 7,       # matches MyProcPort->sock
  filter = -1,     # EVFILT_READ
  flags = 1,       # EV_ADD
  fflags = 0,
  data = 40,
  udata = 0x380227c
}

So AFAICS, the data we're passing to kevent() is all sane.

It then occurred to me to replicate the case I saw this morning
(control-C'ing the test script) and ktrace all the involved processes.
The perl test script and the pgbench process both just immediately
die on SIGINT:

 25500 pgbench  RET   select -1 errno 4 Interrupted system call
 25500 pgbench  PSIG  SIGINT SIG_DFL

so they are clearly not doing anything to cause the DROP command
to get delivered.  Nonetheless, what the backend sees post-control-C is

 25502 postgres RET   kevent 1
 25502 postgres CALL  recvfrom(0x7,0x7ad1d0,0x2000,0,0,0)
 25502 postgres GIO   fd 7 wrote 34 bytes
       "Q\0\0\0!DROP INDEX CONCURRENTLY idx;\0"
 25502 postgres RET   recvfrom 34/0x22
 ... blah, blah, executing the command ...
 25502 postgres CALL  sendto(0x7,0x241d020,0x16,0,0,0)
 25502 postgres RET   sendto -1 errno 32 Broken pipe
 ... blah, blah, shutting down ...

It's really hard to look at this and not call it a kernel bug.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Tue, Oct 26, 2021 at 03:41:58PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
> >> Or more
> >> practically, use advisory locks in that script to enforce that only one
> >> runs at once.
> 
> > The author did try that.
> 
> Ah, I see: if the other pgbench thread is waiting in pg_advisory_lock,
> then it is inside a transaction, so it blocks CIC from completing.
> We can get around that though, by using pg_try_advisory_lock and not
> proceeding if it fails.

Good thought.

> The attached POC does this for the 002 test;
> it looks like the same thing could be done to 003.
> 
> Now the problem with this is it will only work back to v12, because
> pgbench lacks the necessary features before that.  However, I think
> it's worth doing it like this in versions where we can do so, because
> of the load-balancing aspect: this won't waste cycles running CICs
> after the inserts have stopped, nor vice versa.
> 
> Thoughts?

I tried it out with the fix removed (git checkout fdd965d; git checkout HEAD^
src/include src/backend; git apply CIC-test-with-one-pgbench.patch).
Sensitivity (~25%) and runtime (~900ms) were in the same ballpark.  Still,
even if it doesn't buy back notable cycles, the test code seems easier to read
and understand with your change.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Oct 26, 2021 at 03:41:58PM -0400, Tom Lane wrote:
>> Now the problem with this is it will only work back to v12, because
>> pgbench lacks the necessary features before that.  However, I think
>> it's worth doing it like this in versions where we can do so, because
>> of the load-balancing aspect: this won't waste cycles running CICs
>> after the inserts have stopped, nor vice versa.

> I tried it out with the fix removed (git checkout fdd965d; git checkout HEAD^
> src/include src/backend; git apply CIC-test-with-one-pgbench.patch).
> Sensitivity (~25%) and runtime (~900ms) were in the same ballpark.  Still,
> even if it doesn't buy back notable cycles, the test code seems easier to read
> and understand with your change.

The point wasn't to reduce the runtime: it was to get more useful test
iterations in the same runtime.  A test iteration that runs CIC while
there are no concurrent inserts isn't providing any useful coverage.

It's possible that with this patch, we could dial it back to less
than 1000 (200x5) total transactions and still get the same effective
number of test iterations as we have in HEAD.  I didn't really
look into that.  I did notice that even on a much faster machine
than prairiedog, HEAD wastes at least three-quarters of the CIC
iterations.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> 26 окт. 2021 г., в 22:41, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> We can get around that though, by using pg_try_advisory_lock and not
> proceeding if it fails.  The attached POC does this for the 002 test;
> it looks like the same thing could be done to 003.

That's a neat idea. PFA a patch with a copy of these changes for 003.
Also I've added a script to stress not only CREATE INDEX CONCURRENTLY but also REINDEX CONCURRENTLY,
because it was easy and checks slightly more code paths.
I do not observe failures on DragonFly 6.0 with the patch, but I haven't run it for long yet.

Best regards, Andrey Borodin.



Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > Pushed.
> 
> cm@enterprisedb.com, could you post a stack trace from buildfarm member
> gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> those seen today:
> 
>   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
>   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> 
> Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> broke something, but I've run out of ideas, and my hpux boxes died.

cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
concerned that the 2021-11-11 releases will contain a regression if the
gharial failures remain undiagnosed.  I don't know how to diagnose them
without your help.  (I currently can't justify reverting the suspect bug fix
on the basis of two master-branch-only failures in one buildfarm member.)



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> 26 окт. 2021 г., в 22:41, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>> We can get around that though, by using pg_try_advisory_lock and not
>> proceeding if it fails.  The attached POC does this for the 002 test;
>> it looks like the same thing could be done to 003.

> That's a neat idea. PFA a patch with a copy of these changes for 003.

Pushed.  It'll be interesting to see if conchuela's behavior changes.

            regards, tom lane



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Thomas Munro
Date:
On Fri, Oct 29, 2021 at 3:05 AM Noah Misch <noah@leadboat.com> wrote:
> > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.

I recently managed to scrounge up an account in another generous
project's operating system graveyard, and could possibly try to repro
that or put you in touch.  On first attempt just now I couldn't build
because of weird problems with ifaddr.c's need for structs in
<net/if.h> that seem to be kernel/internal-only that I unfortunately
won't have time to debug until next week (I've successfully built here
before so I must be doing something stupid).



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Fri, Oct 29, 2021 at 09:59:14AM +1300, Thomas Munro wrote:
> On Fri, Oct 29, 2021 at 3:05 AM Noah Misch <noah@leadboat.com> wrote:
> > > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > > broke something, but I've run out of ideas, and my hpux boxes died.
> 
> I recently managed to scrounge up an account in another generous
> project's operating system graveyard, and could possibly try to repro
> that or put you in touch.  On first attempt just now I couldn't build
> because of weird problems with ifaddr.c's need for structs in
> <net/if.h> that seem to be kernel/internal-only that I unfortunately
> won't have time to debug until next week (I've successfully built here
> before so I must be doing something stupid).

Thanks.  Either would help.  If you can put me in touch this week, let's try
that since there's little time left before release.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
> > 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а):
> > These failures started on 2021-10-09, the day conchuela updated from DragonFly
> > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
> > Since the theorized kernel bug seems not to affect
> > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
> > from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
> > the newest failure sites is that they don't write anything to the child stdin.

> >  If not, does passing the script via stdin, like "pgbench -f-
> > <script.sql", work around the problem?
> 
> I'll test it tomorrow; the refactoring does not seem trivial given that we use many simultaneous scripts.

Did that work?  Commit 7f580aa should make this unnecessary for v12+
contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
needs a fix in all branches.  A backup plan is just to skip affected tests on
dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
optimistic that a better workaround will present itself.
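
If it comes to that, the skip would amount to something like this near the top of each affected TAP script (a sketch; I'm going from memory on Perl's osname string for DragonFly):

    use Config;
    use Test::More;

    if ($Config{osname} eq 'dragonfly')
    {
        # IPC::Run's finish() has been timing out here since the DragonFly
        # 6.0 upgrade; a finer-grained version check could be layered on.
        plan skip_all => 'test hangs on DragonFly BSD 6+';
    }
    else
    {
        plan tests => 4;    # or whatever the script's existing plan is
    }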



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> Did that work?  Commit 7f580aa should make this unnecessary for v12+
> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
> needs a fix in all branches.  A backup plan is just to skip affected tests on
> dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
> optimistic that a better workaround will present itself.

It indeed is looking like 7f580aa made the problem go away on conchuela,
but do we understand why?  The only theory I can think of is "kernel bug",
but while that's plausible for prairiedog it seems hard to credit for a
late-model BSD kernel.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It indeed is looking like 7f580aa made the problem go away on conchuela,
> but do we understand why?  The only theory I can think of is "kernel bug",
> but while that's plausible for prairiedog it seems hard to credit for a
> late-model BSD kernel.

I have yet to even log into a DBSD system (my attempt to install the
6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
do intend to get it working at some point.  But I can offer a poorly
researched wildly speculative hypothesis: DBSD forked from FBSD in
2003.  macOS 10.3 took FBSD's kqueue code in... 2003.  So maybe a bug
was fixed later that they both inherited?  Or perhaps that makes no
sense, I dunno.  It'd be nice to try to write a repro and send them a
report, if we can.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> 29 окт. 2021 г., в 08:42, Thomas Munro <thomas.munro@gmail.com> написал(а):
>
> On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It indeed is looking like 7f580aa made the problem go away on conchuela,
>> but do we understand why?  The only theory I can think of is "kernel bug",
>> but while that's plausible for prairiedog it seems hard to credit for a
>> late-model BSD kernel.
>
> I have yet to even log into a DBSD system (my attempt to install the
> 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
> do intend to get it working at some point.

Here's an exported VM https://storage.yandexcloud.net/freebsd/dffb.ova if it's of any use.
root password is P@ssw0rd

To reproduce
cd postgres/contrib/amcheck
su x4m
/tesh.sh

Best regards, Andrey Borodin.


Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Fri, Oct 29, 2021 at 04:42:31PM +1300, Thomas Munro wrote:
> On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It indeed is looking like 7f580aa made the problem go away on conchuela,
> > but do we understand why?

I don't.

> > The only theory I can think of is "kernel bug",
> > but while that's plausible for prairiedog it seems hard to credit for a
> > late-model BSD kernel.

DragonFly BSD is a niche OS, so I'm more willing than usual to conclude that.
Could be a bug in IPC::Run or in the port of Perl to DragonFly, but those feel
less likely than the kernel.  The upgrade from DragonFly v4.4.3 to DragonFly
v6.0.0, which introduced this form of PostgreSQL test breakage, also updated
Perl from v5.20.3 to 5.32.1.

> I have yet to even log into a DBSD system (my attempt to install the
> 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
> do intend to get it working at some point.  But I can offer a poorly
> researched wildly speculative hypothesis: DBSD forked from FBSD in
> 2003.  macOS 10.3 took FBSD's kqueue code in... 2003.  So maybe a bug
> was fixed later that they both inherited?  Or perhaps that makes no
> sense, I dunno.  It'd be nice to try to write a repro and send them a
> report, if we can.

The conchuela bug and the prairiedog bug both present with a timeout in
IPC::Run::finish, but the similarity ends there.  On prairiedog, the
postmaster was stuck when it should have been reading a query from pgbench.
On conchuela, pgbench ran to completion and became a zombie, and IPC::Run got
stuck when it should have been reaping that zombie.  Good thought, however.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Fri, Oct 29, 2021 at 7:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here's an exported VM https://storage.yandexcloud.net/freebsd/dffb.ova if it's of any use.

Thanks!  I may yet need to use that because I haven't seen the problem
yet, but I finally managed to get the OS running with a reusable
Vagrant file after solving a couple of stupid problems[1].

[1] https://github.com/macdice/postgresql-dev-vagrant/blob/master/dragonfly6/Vagrantfile



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:
Hi Noah,

On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > Pushed.
>
> cm@enterprisedb.com, could you post a stack trace from buildfarm member
> gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> those seen today:
>
>   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
>   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
>
> Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> broke something, but I've run out of ideas, and my hpux boxes died.

cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
concerned that the 2021-11-11 releases will contain a regression if the
gharial failures remain undiagnosed.  I don't know how to diagnose them
without your help.  (I currently can't justify reverting the suspect bug fix
on the basis of two master-branch-only failures in one buildfarm member.)

Sorry for responding late as this email was missed somehow :-(  I checked the reports on the dashboard and there was a failure then, but the reports after 25th Oct look fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD

Please let me know if I missed something

--
Sandeep Thakkar


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> 30 окт. 2021 г., в 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed.  I don't know how to diagnose them
> without your help.  (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-(  I checked the reports on the dashboard and there was
> a failure then, but the reports after 25th Oct look fine. The back branches all look fine as well.
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something

Hi Sandeep, thank you for the response!

Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were
observed only on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this
would affect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this
runs fail it's vital to collect backtraces of run.

Many thanks!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:


On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> 30 окт. 2021 г., в 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed.  I don't know how to diagnose them
> without your help.  (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-(  I checked the reports on the dashboard and there was a failure then, but the reports after 25th Oct look fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something

Hi Sandeep, thank you for the response!

Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were observed only on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this would affect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this runs fail it's vital to collect backtraces of run.

Hi Andrew,

OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to run multiple times in a day. Unfortunately, I can't run it ~100 times because it's a legacy (slow) server and also we run another animal (anole - with a different compiler) on the same server. Depending on the time every run on HEAD takes, I'll increase the frequency. Hope this helps.

 
Many thanks!

Best regards, Andrey Borodin.


--
Sandeep Thakkar


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:


On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:


On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> 30 окт. 2021 г., в 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed.  I don't know how to diagnose them
> without your help.  (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-(  I checked the reports on the dashboard and there was a failure then but the reports after 25th Oct looks fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something

Hi Sandeep, thank you for the response!

Some failed runs on this animal indicate a SegFault in places changed by the bugfix relevant to this thread. Failures were observed only on HEAD, but that may be a result of the concurrent nature of a possible new bug. If there is a new bug, it would affect the upcoming point release, but we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run the make check phase on this animal on HEAD multiple times (preferably ~a hundred times)? If some of those runs fail, it's vital to collect backtraces from them.

Hi Andrew,

OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to run multiple times in a day. Unfortunately, I can't run it ~100 times because it's a legacy (slow) server and also we run another animal (anole - with a different compiler) on the same server. Depending on the time every run on HEAD takes, I'll increase the frequency. Hope this helps.

I've used the "--force" option so that it ignores the last run status. 
./run_build.pl --verbose --force HEAD
 
 
Many thanks!

Best regards, Andrey Borodin.


--
Sandeep Thakkar




--
Sandeep Thakkar


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sat, Oct 30, 2021 at 01:44:56PM +0530, Sandeep Thakkar wrote:
> On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
> > On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Some failed runs on this animal indicate SegFault in places changed by
> >> bugfix relevant to this thread. Fails were observed only on HEAD, but this
> >> may be a result of concurrent nature of possible new bug. If there is a new
> >> bug - this would affect upcoming point release. But we do not know for
> >> sure. Other animals show no signs of any related SegFault.
> >> Can you please run make check phase on this animal on HEAD multiple time
> >> (preferably ~hundred times)? If some of this runs fail it's vital to
> >> collect backtraces of run.
> >
> > OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to
> > run multiple times in a day. Unfortunately, I can't run it ~100 times
> > because it's a legacy (slow) server and also we run another animal (anole -
> > with a different compiler) on the same server. Depending on the time every
> > run on HEAD takes, I'll increase the frequency. Hope this helps.
>
> I've used "--force" option so that it ignores the last running status.
> ./run_build.pl --verbose --force HEAD

Thanks, but I don't think that gets us closer to having a stack trace for the
SIGSEGV.  If some run gets a SIGSEGV, the core dump will get unlinked.  I'd
disable both buildfarm member cron jobs and then run this script (derived from
gharial's buildfarm configuration):

git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git checkout 70bef49
export LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
cat >temp.conf <<EOF
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
force_parallel_mode = regress
EOF
./configure --enable-cassert --enable-debug --with-perl --without-readline \
        --with-libxml --with-libxslt --with-libs=/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64 \
        --with-includes=/opt/packages/zlib-1.2.5/inst/include:/opt/packages/libxslt-1.1.26/inst/include:/opt/packages/libxml2-2.7.8/inst/include \
        --with-pgport=5678 CFLAGS=-mlp64 CC=gcc
make
while make check TEMP_CONFIG=$PWD/temp.conf NO_LOCALE=1; do :; done
# When the loop stops, use the core file to make a stack trace.  If it runs
# for a week without stopping, give up.
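
For reference, a minimal sketch of turning such a core file into a stack trace; the paths are relative to the source tree and <pid> is a placeholder for whatever core file the failing run leaves behind:

gdb ./tmp_install/usr/local/pgsql/bin/postgres \
    ./src/test/regress/tmp_check/data/core.postgres.<pid>
(gdb) bt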



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:
Hi Noah,

On Sun, Oct 31, 2021 at 12:49 AM Noah Misch <noah@leadboat.com> wrote:
On Sat, Oct 30, 2021 at 01:44:56PM +0530, Sandeep Thakkar wrote:
> On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
> > On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Some failed runs on this animal indicate SegFault in places changed by
> >> bugfix relevant to this thread. Fails were observed only on HEAD, but this
> >> may be a result of concurrent nature of possible new bug. If there is a new
> >> bug - this would affect upcoming point release. But we do not know for
> >> sure. Other animals show no signs of any related SegFault.
> >> Can you please run make check phase on this animal on HEAD multiple time
> >> (preferably ~hundred times)? If some of this runs fail it's vital to
> >> collect backtraces of run.
> >
> > OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to
> > run multiple times in a day. Unfortunately, I can't run it ~100 times
> > because it's a legacy (slow) server and also we run another animal (anole -
> > with a different compiler) on the same server. Depending on the time every
> > run on HEAD takes, I'll increase the frequency. Hope this helps.
>
> I've used "--force" option so that it ignores the last running status.
> ./run_build.pl --verbose --force HEAD

Thanks, but I don't think that gets us closer to having a stack trace for the
SIGSEGV.  If some run gets a SIGSEGV, the core dump will get unlinked.  I'd
disable both buildfarm member cron jobs and then run this script (derived from
gharial's buildfarm configuration):

git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git checkout 70bef49
export LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
cat >temp.conf <<EOF
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
force_parallel_mode = regress
EOF
./configure --enable-cassert --enable-debug --with-perl --without-readline \
        --with-libxml --with-libxslt --with-libs=/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64 \
        --with-includes=/opt/packages/zlib-1.2.5/inst/include:/opt/packages/libxslt-1.1.26/inst/include:/opt/packages/libxml2-2.7.8/inst/include \
        --with-pgport=5678 CFLAGS=-mlp64 CC=gcc
make
while make check TEMP_CONFIG=$PWD/temp.conf NO_LOCALE=1; do :; done
# When the loop stops, use the core file to make a stack trace.  If it runs
# for a week without stopping, give up.

Yes, that makes sense. I did what you suggested (except that I used git:// instead of https:// during cloning as the latter returns with an error; I'll fix it later) and in the first run the core files got generated:
bash-4.1$ ls -l ./postgresql/src/test/regress/tmp_check/data/core.postgres.*  
-rw------- 1 pgbfarm users 6672384 Oct 30 22:04 ./postgresql/src/test/regress/tmp_check/data/core.postgres.1267
-rw------- 1 pgbfarm users 3088384 Oct 30 22:09 ./postgresql/src/test/regress/tmp_check/data/core.postgres.5422

Here is the stack trace:

bash-4.1$ gdb ./postgresql/tmp_install/usr/local/pgsql/bin/postgres ./postgresql/src/test/regress/tmp_check/data/core.postgres.5422

HP gdb 6.1 for HP Itanium (32 or 64 bit) and target HP-UX 11iv2 and 11iv3.

Copyright 1986 - 2009 Free Software Foundation, Inc.

Hewlett-Packard Wildebeest 6.1 (based on GDB) is covered by the

GNU General Public License. Type "show copying" to see the conditions to

change it and/or distribute copies. Type "show warranty" for warranty/support.

..

Core was generated by `postgres'.

Program terminated with signal 11, Segmentation fault.

SEGV_MAPERR - Address not mapped to object

#0  0x3fffffffff3fdbf0 in <unknown_procedure> ()

(gdb) bt

#0  0x3fffffffff3fdbf0 in <unknown_procedure> ()

warning: Attempting to unwind past bad PC 0x3fffffffff3fdbf0 

#1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0, 

    tupdesc2=0x60000000001fba08)

#2  0x40000000017f9660:0 in RelationClearRelation (

    relation=0x60000000001f3730, rebuild=true)

#3  0x40000000017fa730:0 in RelationFlushRelation (relation=0x60000000001f3730)

#4  0x40000000017fabb0:0 in RelationCacheInvalidateEntry (relationId=27272)

#5  0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (

    msg=0x60000000001a46b8)

#6  0x40000000017c84e0:0 in ProcessInvalidationMessages (

    group=0x60000000001a43e4, func=0x87ffffffef7b7250)

#7  0x40000000017cb420:0 in CommandEndInvalidationMessages ()

#8  0x40000000006b1c50:0 in AtCCI_LocalCache ()

#9  0x40000000006b0910:0 in CommandCounterIncrement ()

#10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730, 

    toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true, 

    OIDOldToast=0)

#11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272, 

    reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88

#12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272, 

    reloptions=0, lockmode=8) at toasting.c:62

#13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0, 

    lockmode=8, context=0x87ffffffffffc590)


 Since the results are not posted to the dashboard, I've attached the script log, regression.diffs and temp.conf for reference.

--
Sandeep Thakkar


Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 31 Oct 2021, at 12:31, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
> Here is the stack trace:
Thanks Sandeep!

So far I haven't come up with a clear understanding of how this might happen.

The only idea I have:
1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
2. Maybe relation->rd_rel (allocated just before relation->rd_att) was used incorrectly?
3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to the added sizeof(relminmxid)?
#define CLASS_TUPLE_SIZE \
     (offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))

The idea doesn't seem very strong, but that's the only two cents I have. Noah, what do you think?

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andres Freund
Date:
Hi,

On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> >
> >     tupdesc2=0x60000000001fba08)

Could you print out *tupdesc1, *tupdesc2? And best also
p tupdesc1->attrs[0]
p tupdesc1->attrs[1]
p tupdesc1->attrs[2]
p tupdesc2->attrs[0]
p tupdesc2->attrs[1]
p tupdesc2->attrs[2]


> > #2  0x40000000017f9660:0 in RelationClearRelation (
> >
> >     relation=0x60000000001f3730, rebuild=true)

Hm, too bad that we don't have working line numbers for some reason?


> > #3  0x40000000017fa730:0 in RelationFlushRelation
> > (relation=0x60000000001f3730)
> >
> > #4  0x40000000017fabb0:0 in RelationCacheInvalidateEntry
> > (relationId=27272)
> >
> > #5  0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
> >
> >     msg=0x60000000001a46b8)
> >
> > #6  0x40000000017c84e0:0 in ProcessInvalidationMessages (
> >
> >     group=0x60000000001a43e4, func=0x87ffffffef7b7250)
> >
> > #7  0x40000000017cb420:0 in CommandEndInvalidationMessages ()
> >
> > #8  0x40000000006b1c50:0 in AtCCI_LocalCache ()
> >
> > #9  0x40000000006b0910:0 in CommandCounterIncrement ()
> >
> > #10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
> >
> >     toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
> >
> >     OIDOldToast=0)
> >
> > #11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
> >
> >     reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
> >
> > #12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
> >
> >     reloptions=0, lockmode=8) at toasting.c:62
> >
> > #13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
> >
> >     lockmode=8, context=0x87ffffffffffc590)


This crash could suggest that somehow the catalogs were corrupted and
that we're not reading back valid tupledescs from them. Hopefully we'll
know more after looking at the tupledescs.


Greetings,

Andres Freund



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:


On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> >
> >     tupdesc2=0x60000000001fba08)

Could you print out *tupdesc1, *tupdesc2? And best also
p tupdesc1->attrs[0]
p tupdesc1->attrs[1]
p tupdesc1->attrs[2]
p tupdesc2->attrs[0]
p tupdesc2->attrs[1]
p tupdesc2->attrs[2]

You mean make the changes in the .c files to print these values and rerun the build? Can you please share the files where this needs to be done, or give me a patch?

> > #2  0x40000000017f9660:0 in RelationClearRelation (
> >
> >     relation=0x60000000001f3730, rebuild=true)

Hm, too bad that we don't have working line numbers for some reason?


> > #3  0x40000000017fa730:0 in RelationFlushRelation
> > (relation=0x60000000001f3730)
> >
> > #4  0x40000000017fabb0:0 in RelationCacheInvalidateEntry
> > (relationId=27272)
> >
> > #5  0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
> >
> >     msg=0x60000000001a46b8)
> >
> > #6  0x40000000017c84e0:0 in ProcessInvalidationMessages (
> >
> >     group=0x60000000001a43e4, func=0x87ffffffef7b7250)
> >
> > #7  0x40000000017cb420:0 in CommandEndInvalidationMessages ()
> >
> > #8  0x40000000006b1c50:0 in AtCCI_LocalCache ()
> >
> > #9  0x40000000006b0910:0 in CommandCounterIncrement ()
> >
> > #10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
> >
> >     toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
> >
> >     OIDOldToast=0)
> >
> > #11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
> >
> >     reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
> >
> > #12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
> >
> >     reloptions=0, lockmode=8) at toasting.c:62
> >
> > #13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
> >
> >     lockmode=8, context=0x87ffffffffffc590)


This crash could suggest that somehow the catalogs were corrupted and
that we're not reading back valid tupledescs from them. Hopefully we'll
know more after looking at the tupledescs.


Greetings,

Andres Freund


--
Sandeep Thakkar


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Andrey Borodin
Date:

> On 1 Nov 2021, at 18:10, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
>
>
> On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > > #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> > >
> > >     tupdesc2=0x60000000001fba08)
>
> Could you print out *tupdesc1, *tupdesc2? And best also
> p tupdesc1->attrs[0]
> p tupdesc1->attrs[1]
> p tupdesc1->attrs[2]
> p tupdesc2->attrs[0]
> p tupdesc2->attrs[1]
> p tupdesc2->attrs[2]
>
> you mean make the changes in the .c files to print these values and rerun the build? Can you please share the files where this needs to be done or give me a patch?

You can paste these commands into gdb. Just as you did with bt. It will print values.

Thank you!

Best regards, Andrey Borodin.


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:


On Mon, Nov 1, 2021 at 6:48 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 1 Nov 2021, at 18:10, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
>
>
> On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > > #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> > >
> > >     tupdesc2=0x60000000001fba08)
>
> Could you print out *tupdesc1, *tupdesc2? And best also
> p tupdesc1->attrs[0]
> p tupdesc1->attrs[1]
> p tupdesc1->attrs[2]
> p tupdesc2->attrs[0]
> p tupdesc2->attrs[1]
> p tupdesc2->attrs[2]
>
> you mean make the changes in the .c files to print these values and rerun the build? Can you please share the files where this needs to be done or give me a patch?

You can paste these commands into gdb. Just as you did with bt. It will print values.

Here you go (and with full bt):

(gdb) bt                  

#0  0x3fffffffff3fdbf0 in <unknown_procedure> ()

#1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0, 

    tupdesc2=0x60000000001fba08)

#2  0x40000000017f9660:0 in RelationClearRelation (

    relation=0x60000000001f3730, rebuild=true)

#3  0x40000000017fa730:0 in RelationFlushRelation (relation=0x60000000001f3730)

#4  0x40000000017fabb0:0 in RelationCacheInvalidateEntry (relationId=27272)

#5  0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (

    msg=0x60000000001a46b8)

#6  0x40000000017c84e0:0 in ProcessInvalidationMessages (

    group=0x60000000001a43e4, func=0x87ffffffef7b7250)

#7  0x40000000017cb420:0 in CommandEndInvalidationMessages ()

#8  0x40000000006b1c50:0 in AtCCI_LocalCache ()

#9  0x40000000006b0910:0 in CommandCounterIncrement ()

#10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730, 

    toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true, 

    OIDOldToast=0)

#11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272, 

    reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88

#12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272, 

    reloptions=0, lockmode=8) at toasting.c:62

#13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0, 

    lockmode=8, context=0x87ffffffffffc590)

---Type <return> to continue, or q <return> to quit---

#14 0x4000000000aa6fc0:0 in ATController (parsetree=0x60000000000e04e0, 

    rel=0x60000000001f3730, cmds=0x60000000000e0488, recurse=true, lockmode=8, 

    context=0x87ffffffffffc590)

#15 0x4000000000aa6410:0 in AlterTable (stmt=0x60000000000e04e0, lockmode=8, 

    context=0x87ffffffffffc590)

#16 0x40000000012f9a50:0 in ProcessUtilitySlow (pstate=0x6000000000117210, 

    pstmt=0x60000000000e0830, 

    queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;", 

    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 

    dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#17 0x40000000012f8890:0 in standard_ProcessUtility (pstmt=0x60000000000e0830, 

    queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;", 

    readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, 

    queryEnv=0x0, dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#18 0x40000000012f6150:0 in ProcessUtility (pstmt=0x60000000000e0830, 

    queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;", 

    readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, 

    queryEnv=0x0, dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#19 0x40000000012f24c0:0 in PortalRunUtility (portal=0x600000000015baf0, 

    pstmt=0x60000000000e0830, isTopLevel=true, setHoldSnapshot=false, 

    dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#20 0x40000000012f2c90:0 in PortalRunMulti (portal=0x600000000015baf0, 

    isTopLevel=true, setHoldSnapshot=false, dest=0x60000000000e0920, 

---Type <return> to continue, or q <return> to quit---

    altdest=0x60000000000e0920, qc=0x87ffffffffffd838)

#21 0x40000000012f08d0:0 in PortalRun (portal=0x600000000015baf0, 

    count=9223372036854775807, isTopLevel=true, run_once=true, 

    dest=0x60000000000e0920, altdest=0x60000000000e0920, qc=0x87ffffffffffd838)

#22 0x40000000012dcf70:0 in exec_simple_query (

    query_string=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;")

#23 0x40000000012eaaf0:0 in PostgresMain (

    dbname=0x60000000000aaa78 "regression", 

    username=0x60000000000a8ff8 "pgbfarm")

#24 0x40000000010751b0:0 in BackendRun (port=0x60000000001168e0)

#25 0x4000000001074040:0 in BackendStartup (port=0x60000000001168e0)

#26 0x4000000001068e60:0 in ServerLoop ()

#27 0x40000000010679b0:0 in PostmasterMain (argc=8, argv=0x87ffffffffffe610)

#28 0x4000000000d45660:0 in main (argc=8, argv=0x87ffffffffffe610)

    at main.c:146

(gdb) p tupdesc1->attrs[1]

No symbol "tupdesc1" in current context.

(gdb) p tupdesc1->attrs[2]

No symbol "tupdesc1" in current context.

(gdb) p tupdesc2->attrs[0]

No symbol "tupdesc2" in current context.

(gdb) p tupdesc2->attrs[1]

No symbol "tupdesc2" in current context.

(gdb)
 
 
Thank you!

Best regards, Andrey Borodin.


--
Sandeep Thakkar


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Robert Haas
Date:
On Mon, Nov 1, 2021 at 9:33 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> (gdb) p tupdesc1->attrs[1]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc1->attrs[2]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc2->attrs[0]
> No symbol "tupdesc2" in current context.
> (gdb) p tupdesc2->attrs[1]
> No symbol "tupdesc2" in current context.
> (gdb)

I think you need to select stack frame 1 before running these
commands. I believe just running "frame 1" before you run these print
commands should do the trick.
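
For example, a minimal sketch of the sequence (the variable names are the ones visible in frame 1 of your trace):

(gdb) frame 1
(gdb) p *tupdesc1
(gdb) p *tupdesc2
(gdb) p tupdesc1->attrs[0]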

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Sandeep Thakkar
Date:


On Mon, Nov 1, 2021 at 7:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 1, 2021 at 9:33 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> (gdb) p tupdesc1->attrs[1]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc1->attrs[2]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc2->attrs[0]
> No symbol "tupdesc2" in current context.
> (gdb) p tupdesc2->attrs[1]
> No symbol "tupdesc2" in current context.
> (gdb)

I think you need to select stack frame 1 before running these
commands. I believe just running "frame 1" before you run these print
commands should do the trick.

Thanks Robert, that worked. Here is the output:

(gdb) frame 1

#1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0, 

    tupdesc2=0x60000000001fba08)

(gdb) p tupdesc1->attrs[0]

$1 = {attrelid = 27272, attname = {

    data = "initial", '\000' <repeats 56 times>}, atttypid = 23, 

  attstattarget = -1, attlen = 4, attnum = 1, attndims = 0, attcacheoff = 0, 

  atttypmod = -1, attbyval = true, attalign = 105 'i', attstorage = 112 'p', 

  attcompression = 0 '\000', attnotnull = false, atthasdef = false, 

  atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000', 

  attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}

(gdb) p tupdesc1->attrs[1]

$2 = {attrelid = 27272, attname = {data = "a", '\000' <repeats 62 times>}, 

  atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0, 

  attcacheoff = -1, atttypmod = -1, attbyval = true, attalign = 105 'i', 

  attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false, 

  atthasdef = true, atthasmissing = true, attidentity = 0 '\000', 

  attgenerated = 0 '\000', attisdropped = false, attislocal = true, 

  attinhcount = 0, attcollation = 0}

(gdb) p tupdesc1->attrs[2]

$3 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>}, 

  atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0, 

  attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c', 

  attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false, 

  atthasdef = false, atthasmissing = false, attidentity = 0 '\000', 

  attgenerated = 0 '\000', attisdropped = false, attislocal = true, 

  attinhcount = 0, attcollation = 950}

(gdb) p tupdesc2->attrs[0]

$4 = {attrelid = 27272, attname = {

    data = "initial", '\000' <repeats 56 times>}, atttypid = 23, 

  attstattarget = -1, attlen = 4, attnum = 1, attndims = 0, attcacheoff = 0, 

  atttypmod = -1, attbyval = true, attalign = 105 'i', attstorage = 112 'p', 

  attcompression = 0 '\000', attnotnull = false, atthasdef = false, 

  atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000', 

  attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}

(gdb) p tupdesc2->attrs[1]

$5 = {attrelid = 27272, attname = {data = "a", '\000' <repeats 62 times>}, 

  atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0, 

  attcacheoff = -1, atttypmod = -1, attbyval = true, attalign = 105 'i', 

  attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false, 

  atthasdef = true, atthasmissing = true, attidentity = 0 '\000', 

  attgenerated = 0 '\000', attisdropped = false, attislocal = true, 

  attinhcount = 0, attcollation = 0}

(gdb) p tupdesc2->attrs[2]

$6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>}, 

  atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0, 

  attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c', 

  attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false, 

  atthasdef = false, atthasmissing = false, attidentity = 0 '\000', 

  attgenerated = 0 '\000', attisdropped = false, attislocal = true, 

  attinhcount = 0, attcollation = 950}

(gdb)
 
 
--
Robert Haas
EDB: http://www.enterprisedb.com


--
Sandeep Thakkar


Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Tue, Nov 02, 2021 at 06:20:42AM +0530, Sandeep Thakkar wrote:
> (gdb) frame 1
> 
> #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> 
>     tupdesc2=0x60000000001fba08)
> 
...
> (gdb) p tupdesc2->attrs[2]
> 
> $6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
>   atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
>   attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
>   attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
>   atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
>   attgenerated = 0 '\000', attisdropped = false, attislocal = true,
>   attinhcount = 0, attcollation = 950}

That looks healthy.  Since gdb isn't giving line numbers, let's single-step
from the start of the function and see if that is informative.  Please apply
the attached patch, which just adds a test file.  Then run "make -C
src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl" and
attach
src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs in a
reply to this email.

On Mon, Nov 01, 2021 at 12:01:08AM +0500, Andrey Borodin wrote:
> So far I didn't come up with a clear understanding how this might happen.

Agreed.

> The only idea I have:
> 1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
> 2. Maybe relation->rd_rel (alloceted just before relation->rd_att) was used incorrectly?
> 3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to added
sizeof(relminmxid)?
> #define CLASS_TUPLE_SIZE \
>      (offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))

See the comment at overread_tuplestruct_pg_cast for the reason why I think
that can't cause an actual malfunction.  Still, there's some possibility of
this being the explanation.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:
Hi Noah

Sandeep is on vacation, so I am looking into this.

I am facing some issues while applying the patch

bash-4.1$ git apply step-to-crash-v0.patch
fatal: git apply: bad git-diff - expected /dev/null on line 4
bash-4.1$ patch -p1 < step-to-crash-v0.patch
Hmm... I can't seem to find a patch in there anywhere.

Then I decided to manually create the src/test/subscription/t/080_step_equalTupleDescs.pl file and place the content there. Once the file was created, I ran "make -C src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl", but it didn't generate any src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs file.

Also, I get this at the end of the make command 
TAP tests not enabled. Try configuring with --enable-tap-tests

After enabling TAP tests and executing the configure command again, I get this error message:

checking for fop... no
checking for dbtoepub... no
checking for perl module IPC::Run 0.79... no
checking for perl module Test::More 0.87... no
checking for perl module Time::HiRes 1.52... ok
configure: error: Additional Perl modules are required to run TAP tests

It seems I can't enable TAP tests with the current framework.



On Wed, Nov 3, 2021 at 9:42 AM Noah Misch <noah@leadboat.com> wrote:
On Tue, Nov 02, 2021 at 06:20:42AM +0530, Sandeep Thakkar wrote:
> (gdb) frame 1
>
> #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
>
>     tupdesc2=0x60000000001fba08)
>
...
> (gdb) p tupdesc2->attrs[2]
>
> $6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
>   atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
>   attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
>   attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
>   atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
>   attgenerated = 0 '\000', attisdropped = false, attislocal = true,
>   attinhcount = 0, attcollation = 950}

That looks healthy.  Since gdb isn't giving line numbers, let's single-step
from the start of the function and see if that is informative.  Please apply
the attached patch, which just adds a test file.  Then run "make -C
src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl" and
attach
src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs in a
reply to this email.

On Mon, Nov 01, 2021 at 12:01:08AM +0500, Andrey Borodin wrote:
> So far I didn't come up with a clear understanding how this might happen.

Agreed.

> The only idea I have:
> 1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
> 2. Maybe relation->rd_rel (alloceted just before relation->rd_att) was used incorrectly?
> 3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to added sizeof(relminmxid)?
> #define CLASS_TUPLE_SIZE \
>      (offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))

See the comment at overread_tuplestruct_pg_cast for the reason why I think
that can't cause an actual malfunction.  Still, there's some possibility of
this being the explanation.


--
Thanks & Regards,
Semab

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Thomas Munro
Date:
On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> checking for fop... no
> checking for dbtoepub... no
> checking for perl module IPC::Run 0.79... no
> checking for perl module Test::More 0.87... no
> checking for perl module Time::HiRes 1.52... ok
> configure: error: Additional Perl modules are required to run TAP tests
>
> Seems like can't enable tap tests with current frameworks

You could always install that locally with CPAN, with something like:

export PERL5LIB=~/perl5/lib/perl5
cpan -T IPC::Run

Hmm, speaking of missing package management, GCC 4.6.0 is long dead
and missing many bug fixes, and in fact the whole 4.6.x line is
finished.

I remembered another phase of weird segmentation faults that was never
really explained, on that system:

https://www.postgresql.org/message-id/flat/CA%2BhUKGLukanJE9W8C%2B0n8iRsZDpbuhcWOxBMjGaUO-RNHhBGXw%40mail.gmail.com



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Thu, Nov 04, 2021 at 01:50:16PM +1300, Thomas Munro wrote:
> On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> > checking for fop... no
> > checking for dbtoepub... no
> > checking for perl module IPC::Run 0.79... no
> > checking for perl module Test::More 0.87... no
> > checking for perl module Time::HiRes 1.52... ok
> > configure: error: Additional Perl modules are required to run TAP tests
> >
> > Seems like can't enable tap tests with current frameworks
> 
> You could always install that locally with CPAN, with something like:
> 
> export PERL5LIB=~/perl5/lib/perl5
> cpan -T IPC::Run

Here's something more self-contained:

mkdir $HOME/perl-for-tap
(cd $HOME/perl-for-tap &&
 wget http://backpan.perl.org/modules/by-authors/id/M/MS/MSCHWERN/Test-Simple-0.87_01.tar.gz &&
 gzip -d <Test-Simple-0.87_01.tar.gz | tar xf - &&
 wget https://cpan.metacpan.org/authors/id/R/RS/RSOD/IPC-Run-0.79.tar.gz &&
 gzip -d <IPC-Run-0.79.tar.gz | tar xf -)
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
perl -MIPC::Run -MTest::More -e 'print $IPC::Run::VERSION, "\n", $Test::More::VERSION, "\n"'

Semab, if you run those commands and then rerun the "configure" that failed,
does that help?
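
For concreteness, a sketch of the configure rerun with TAP tests enabled, reusing the options from the script earlier in this thread (only --enable-tap-tests is new here):

export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
./configure --enable-cassert --enable-debug --enable-tap-tests --with-perl --without-readline \
        --with-libxml --with-libxslt --with-libs=/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64 \
        --with-includes=/opt/packages/zlib-1.2.5/inst/include:/opt/packages/libxslt-1.1.26/inst/include:/opt/packages/libxml2-2.7.8/inst/include \
        --with-pgport=5678 CFLAGS=-mlp64 CC=gcc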



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:


On Fri, Nov 5, 2021 at 6:35 AM Noah Misch <noah@leadboat.com> wrote:
On Thu, Nov 04, 2021 at 01:50:16PM +1300, Thomas Munro wrote:
> On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> > checking for fop... no
> > checking for dbtoepub... no
> > checking for perl module IPC::Run 0.79... no
> > checking for perl module Test::More 0.87... no
> > checking for perl module Time::HiRes 1.52... ok
> > configure: error: Additional Perl modules are required to run TAP tests
> >
> > Seems like can't enable tap tests with current frameworks
>
> You could always install that locally with CPAN, with something like:
>
> export PERL5LIB=~/perl5/lib/perl5
> cpan -T IPC::Run

Here's something more self-contained:

mkdir $HOME/perl-for-tap
(cd $HOME/perl-for-tap &&
 wget http://backpan.perl.org/modules/by-authors/id/M/MS/MSCHWERN/Test-Simple-0.87_01.tar.gz &&
 gzip -d <Test-Simple-0.87_01.tar.gz | tar xf - &&
 wget https://cpan.metacpan.org/authors/id/R/RS/RSOD/IPC-Run-0.79.tar.gz &&
 gzip -d <IPC-Run-0.79.tar.gz | tar xf -)
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
perl -MIPC::Run -MTest::More -e 'print $IPC::Run::VERSION, "\n", $Test::More::VERSION, "\n"'

Yes, configure works with these commands, thanks.

After configure, when I ran "make -C src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl", it got stuck. I am attaching the regress_log_080_step_equalTupleDescs file and the log file of the make command.
 



--
Thanks & Regards,
Semab
Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Fri, Nov 05, 2021 at 11:02:16AM +0500, Semab Tariq wrote:
> After configure when i run make -C sarc/test/subscription check
> PROVE_TESTS=t/080_step_equalTupleDescs.pl it got stucked I am attaching
> regress_log_080_step_equalTupleDescs file and log file of make command

Thanks.  Please replace the previous test case patch with this one, and run it
again.  (On your platform, gdb needs an explicit path to the executable.)

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:


On Fri, Nov 5, 2021 at 2:03 PM Noah Misch <noah@leadboat.com> wrote:
On Fri, Nov 05, 2021 at 11:02:16AM +0500, Semab Tariq wrote:
> After configure when i run make -C sarc/test/subscription check
> PROVE_TESTS=t/080_step_equalTupleDescs.pl it got stucked I am attaching
> regress_log_080_step_equalTupleDescs file and log file of make command

Thanks.  Please replace the previous test case patch with this one, and run it
again. 

I did the same with the updated patch, and this time the make command exited with an error:

Bailout called.  Further testing stopped:  command "pg_ctl -D /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/pgdata -m immediate stop" exited with value 1
make: *** [check] Error 1
make: Leaving directory `/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription'

PFA make command log and regress_log_080_step_equalTupleDescs files
 


--
Thanks & Regards,
Semab
Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
>     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)

The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
address; it may be in a program text section.  tupdesc2 is a stack address.
In the earlier stack trace from
https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
both tupdesc1 and tupdesc2 were heap addresses.

> /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8: Error in sourced command file:
> Error accessing memory address 0x40010006f968: Bad address.

Thanks.  Please try the attached test version, which avoids exiting too early
like the last version did.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> >     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> 
> The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
> address; it may be in a program text section.  tupdesc2 is a stack address.
> In the earlier stack trace from
> https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> both tupdesc1 and tupdesc2 were heap addresses.
> 
> > /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8: Error in sourced command file:
> > Error accessing memory address 0x40010006f968: Bad address.
> 
> Thanks.  Please try the attached test version, which avoids exiting too early
> like the last version did.

If you haven't run that yet, please use this version instead.  It collects
more data.  The log will probably be too big to be proper for the mailing
list, so please compress it.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:


On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:
On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> >     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
>
> The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
> address; it may be in a program text section.  tupdesc2 is a stack address.
> In the earlier stack trace from
> https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> both tupdesc1 and tupdesc2 were heap addresses.
>
> > /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8: Error in sourced command file:
> > Error accessing memory address 0x40010006f968: Bad address.
>
> Thanks.  Please try the attached test version, which avoids exiting too early
> like the last version did.

If you haven't run that yet, please use this version instead.  It collects
more data.  The log will probably be too big to be proper for the mailing
list, so please compress it.

Hi Noah 
PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch

--
Thanks & Regards,
Semab
Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Sun, Nov 07, 2021 at 08:25:09PM +0500, Semab Tariq wrote:
> On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> > > On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > > > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > > >     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> > >
> > > The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
> > > address; it may be in a program text section.  tupdesc2 is a stack address.
> > > In the earlier stack trace from
> > > https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> > > both tupdesc1 and tupdesc2 were heap addresses.

That turned out to be a false alarm.  On gharial, a breakpoint at the start of
the function doesn't see the real arguments.  After a ten-instruction
prologue, the real arguments appear, and they are heap addresses.
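
A sketch of seeing that by hand in gdb: break at the function, step a few instructions to get past the prologue (the exact count depends on the build), and only then inspect the arguments, which should show the real heap pointers.

(gdb) break equalTupleDescs
(gdb) continue
(gdb) stepi 10
(gdb) info args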

> PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch

Thanks.  That shows the crash happened sometime after strcmp(defval1->adbin,
defval2->adbin).  Please run the attached version, which collects yet more
information.

Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:


On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:
On Sun, Nov 07, 2021 at 08:25:09PM +0500, Semab Tariq wrote:
> On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> > > On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > > > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > > >     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> > >
> > > The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
> > > address; it may be in a program text section.  tupdesc2 is a stack address.
> > > In the earlier stack trace from
> > > https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> > > both tupdesc1 and tupdesc2 were heap addresses.

That turned out to be a false alarm.  On gharial, a breakpoint at the start of
the function doesn't see the real arguments.  After a ten-instruction
prologue, the real arguments appear, and they are heap addresses.

> PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch

Thanks.  That shows the crash happened sometime after strcmp(defval1->adbin,
defval2->adbin).  Please run the attached version,

PFA the new log file


--
Thanks & Regards,
Semab
Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Mon, Nov 08, 2021 at 11:22:45AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks.  That shows the crash happened sometime after strcmp(defval1->adbin,
> > defval2->adbin).  Please run the attached version,
> 
> PFA the new log file

> 0x40000000003fdc30:2 in equalTupleDescs (tupdesc1=0x60000000001fdb98, tupdesc2=0x6000000000202da8)
> 0x40000000003fdc30:2 <equalTupleDescs+0x10e2>:          br.call.sptk.many rp=0x3fffffffff3fdc30
> 0x3fffffffff3fdc30 in <unknown_procedure> ()
> 0x3fffffffff3fdc30:    Error accessing memory address 0x3fffffffff3fdc30: Bad address.

This postgres binary apparently contains an explicit branch to
0x3fffffffff3fdc30, which is not an address reasonably expected to contain
code.  (It's not a known heap, a known stack, or a CODE section from the
binary file.)  This probably confirms a toolchain bug.

Would you do "git checkout 166f943" in the source directory you've been
testing, then rerun the test and post the compressed tmp_check/log directory?
I'm guessing that will show the bad branch instruction no longer present.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:


On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
On Mon, Nov 08, 2021 at 11:22:45AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks.  That shows the crash happened sometime after strcmp(defval1->adbin,
> > defval2->adbin).  Please run the attached version,
>
> PFA the new log file

> 0x40000000003fdc30:2 in equalTupleDescs (tupdesc1=0x60000000001fdb98, tupdesc2=0x6000000000202da8)
> 0x40000000003fdc30:2 <equalTupleDescs+0x10e2>:              br.call.sptk.many rp=0x3fffffffff3fdc30
> 0x3fffffffff3fdc30 in <unknown_procedure> ()
> 0x3fffffffff3fdc30:   Error accessing memory address 0x3fffffffff3fdc30: Bad address.

This postgres binary apparently contains an explicit branch to
0x3fffffffff3fdc30, which is not an address reasonably expected to contain
code.  (It's not a known heap, a known stack, or a CODE section from the
binary file.)  This probably confirms a toolchain bug.

Would you do "git checkout 166f943" in the source directory you've been
testing, then rerun the test and post the compressed tmp_check/log directory?

So the make command fails after I applied the v0.4 patch to 166f943; it also did not create any tmp_check/log directory.
PFA the make output.

I'm guessing that will show the bad branch instruction no longer present.


--
Thanks & Regards,
Semab
Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > This postgres binary apparently contains an explicit branch to
> > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > code.  (It's not a known heap, a known stack, or a CODE section from the
> > binary file.)  This probably confirms a toolchain bug.
> >
> > Would you do "git checkout 166f943" in the source directory you've been
> > testing, then rerun the test and post the compressed
> > tmp_check/log directory?
> >
>
> So make command fails after I applied the v0.4 patch to 166f943 also it did
> not create any tmp_check/log directory

> t/080_step_equalTupleDescs....Can't locate IPC/Run.pm in @INC (@INC contains: /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /opt/perl_64/lib/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/5.8.8 /opt/perl_64/lib/site_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/site_perl/5.8.8 /opt/perl_64/lib/site_perl /opt/perl_64/lib/vendor_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/vendor_perl/5.8.8 /opt/perl_64/lib/vendor_perl .) at /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 102.

It looks like this attempt did not use a shell with the environment variables
discussed upthread.  Use these commands first to bring back the required
environment:

export LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Semab Tariq
Date:


On Mon, Nov 8, 2021 at 7:46 PM Noah Misch <noah@leadboat.com> wrote:
On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > This postgres binary apparently contains an explicit branch to
> > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > code.  (It's not a known heap, a known stack, or a CODE section from the
> > binary file.)  This probably confirms a toolchain bug.
> >
> > Would you do "git checkout 166f943" in the source directory you've been
> > testing, then rerun the test and post the compressed
> > tmp_check/log directory?
> >
>
> So make command fails after I applied the v0.4 patch to 166f943 also it did
> not create any tmp_check/log directory

> t/080_step_equalTupleDescs....Can't locate IPC/Run.pm in @INC (@INC contains: /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /opt/perl_64/lib/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/5.8.8 /opt/perl_64/lib/site_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/site_perl/5.8.8 /opt/perl_64/lib/site_perl /opt/perl_64/lib/vendor_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/vendor_perl/5.8.8 /opt/perl_64/lib/vendor_perl .) at /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 102.

It looks like this attempt did not use a shell with the environment variables
discussed upthread.  

Yes, I missed the trick. PFA tmp_check.tar.bz2.


--
Thanks & Regards,
Semab
Attachment

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Tue, Nov 09, 2021 at 11:55:57AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 7:46 PM Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> > > On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > > > This postgres binary apparently contains an explicit branch to
> > > > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > > > code.  (It's not a known heap, a known stack, or a CODE section from the
> > > > binary file.)  This probably confirms a toolchain bug.
> > > >
> > > > Would you do "git checkout 166f943" in the source directory you've been
> > > > testing, then rerun the test and post the compressed
> > > > tmp_check/log directory?

> PFA tmp_check.tar.bz2

Excellent.  No crash, and the only difference in equalTupleDescs() code
generation is the branch destination addresses.  At commit 70bef49, gharial's
toolchain generates the invalid branch destination:

$ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
--- /dev/fd/63  2021-11-09 06:11:20.927444437 -0800
+++ /dev/fd/62  2021-11-09 06:11:20.926444428 -0800
@@ -100 +100 @@
-      br.call.sptk.many rp=0x40000000003cdc20
+      br.call.sptk.many rp=0x40000000003cde20
@@ -658 +658 @@
-      br.call.sptk.many rp=0x40000000003cdc20
+      br.call.sptk.many rp=0x40000000003cde20
@@ -817 +817 @@
-             br.call.sptk.many rp=0x3fffffffff3fdc30
+             br.call.sptk.many rp=0x4000000000400c30
@@ -949 +949 @@
-             br.call.sptk.many rp=0x40000000003cdc20
+             br.call.sptk.many rp=0x40000000003cde20
@@ -970 +970 @@
-             br.call.sptk.many rp=0x40000000003cdc20
+             br.call.sptk.many rp=0x40000000003cde20

Since "git diff 70bef49 166f943" contains nothing that correlates with such a
change, I'm concluding that this is a bug in gharial's toolchain.
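
(The disasm/<commit>/equalTupleDescs files compared above can be produced with something along these lines; this is only a sketch of one way to do it, with placeholder binary names for builds of the two commits, not necessarily how the dumps in this thread were made:)

mkdir -p disasm/70bef49 disasm/166f943
echo 'disassemble equalTupleDescs' > /tmp/disasm.gdb
gdb -batch -x /tmp/disasm.gdb ./postgres-70bef49 > disasm/70bef49/equalTupleDescs
gdb -batch -x /tmp/disasm.gdb ./postgres-166f943 > disasm/166f943/equalTupleDescs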

It looks like gharial's automatic buildfarm runs have been paused for nine
days.  Feel free to unpause it.  Also, I recommend using the buildfarm client
setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'.  Months or
years pass between these events.  Here are all gharial "signal 11" failures,
likely some of which have other causes:

 sysname │      snapshot       │    branch     │                                            bfurl
─────────┼─────────────────────┼───────────────┼─────────────────────────────────────────────────────────────────────────────────────────────
 gharial │ 2018-04-10 00:32:08 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-10%2000%3A32%3A08
 gharial │ 2019-03-08 01:30:45 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2001%3A30%3A45
 gharial │ 2019-03-08 08:55:31 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2008%3A55%3A31
 gharial │ 2019-03-08 19:55:38 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2019%3A55%3A38
 gharial │ 2019-08-20 09:57:27 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-20%2009%3A57%3A27
 gharial │ 2019-08-21 08:04:58 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-21%2008%3A04%3A58
 gharial │ 2019-08-22 00:37:03 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-22%2000%3A37%3A03
 gharial │ 2019-08-22 12:42:02 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-22%2012%3A42%3A02
 gharial │ 2019-08-24 18:43:52 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-24%2018%3A43%3A52
 gharial │ 2019-08-25 11:14:36 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-25%2011%3A14%3A36
 gharial │ 2019-08-25 18:44:04 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-25%2018%3A44%3A04
 gharial │ 2019-08-26 08:47:19 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-26%2008%3A47%3A19
 gharial │ 2019-08-26 22:30:23 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-26%2022%3A30%3A23
 gharial │ 2021-04-08 03:21:42 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-08%2003%3A21%3A42
 gharial │ 2021-04-09 06:40:31 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-09%2006%3A40%3A31
 gharial │ 2021-10-24 16:19:05 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
 gharial │ 2021-10-24 20:38:39 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
(17 rows)



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Thomas Munro
Date:
On Wed, Nov 10, 2021 at 3:40 AM Noah Misch <noah@leadboat.com> wrote:
> $ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
> --- /dev/fd/63  2021-11-09 06:11:20.927444437 -0800
> +++ /dev/fd/62  2021-11-09 06:11:20.926444428 -0800
> @@ -100 +100 @@
> -      br.call.sptk.many rp=0x40000000003cdc20
> +      br.call.sptk.many rp=0x40000000003cde20
> @@ -658 +658 @@
> -      br.call.sptk.many rp=0x40000000003cdc20
> +      br.call.sptk.many rp=0x40000000003cde20
> @@ -817 +817 @@
> -             br.call.sptk.many rp=0x3fffffffff3fdc30
> +             br.call.sptk.many rp=0x4000000000400c30
> @@ -949 +949 @@
> -             br.call.sptk.many rp=0x40000000003cdc20
> +             br.call.sptk.many rp=0x40000000003cde20
> @@ -970 +970 @@
> -             br.call.sptk.many rp=0x40000000003cdc20
> +             br.call.sptk.many rp=0x40000000003cde20
>
> Since "git diff 70bef49 166f943" contains nothing that correlates with such a
> change, I'm concluding that this is a bug in gharial's toolchain.

Excellent detective work.

> It looks like gharial's automatic buildfarm runs have been paused for nine
> days.  Feel free to unpause it.  Also, I recommend using the buildfarm client
> setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'.  Months or
> years pass between these events.  Here are all gharial "signal 11" failures,
> likely some of which have other causes:

Yeah I spent time investigating some of these at the time and I know
others did too.

IMHO we should hunt down dead toolchains and gently see if we can get
them updated.   There is no point in testing PostgreSQL on every
commit on a compiler someone built from a tarball many years ago and
never updated with bug fixes.



Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From
Noah Misch
Date:
On Wed, Nov 10, 2021 at 10:55:13AM +1300, Thomas Munro wrote:
> On Wed, Nov 10, 2021 at 3:40 AM Noah Misch <noah@leadboat.com> wrote:
> > $ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
> > --- /dev/fd/63  2021-11-09 06:11:20.927444437 -0800
> > +++ /dev/fd/62  2021-11-09 06:11:20.926444428 -0800
> > @@ -100 +100 @@
> > -      br.call.sptk.many rp=0x40000000003cdc20
> > +      br.call.sptk.many rp=0x40000000003cde20
> > @@ -658 +658 @@
> > -      br.call.sptk.many rp=0x40000000003cdc20
> > +      br.call.sptk.many rp=0x40000000003cde20
> > @@ -817 +817 @@
> > -             br.call.sptk.many rp=0x3fffffffff3fdc30
> > +             br.call.sptk.many rp=0x4000000000400c30
> > @@ -949 +949 @@
> > -             br.call.sptk.many rp=0x40000000003cdc20
> > +             br.call.sptk.many rp=0x40000000003cde20
> > @@ -970 +970 @@
> > -             br.call.sptk.many rp=0x40000000003cdc20
> > +             br.call.sptk.many rp=0x40000000003cde20
> >
> > Since "git diff 70bef49 166f943" contains nothing that correlates with such a
> > change, I'm concluding that this is a bug in gharial's toolchain.
> 
> Excellent detective work.

Thanks.

> > It looks like gharial's automatic buildfarm runs have been paused for nine
> > days.  Feel free to unpause it.  Also, I recommend using the buildfarm client
> > setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'.  Months or
> > years pass between these events.  Here are all gharial "signal 11" failures,
> > likely some of which have other causes:
> 
> Yeah I spent time investigating some of these at the time and I know
> others did too.
> 
> IMHO we should hunt down dead toolchains and gently see if we can get
> them updated.   There is no point in testing PostgreSQL on every
> commit on a compiler someone built from a tarball many years ago and
> never updated with bug fixes.

+1 for "gently see if".  It's good that we have a few old-gcc animals, but
having versions of intermediate age is less important.  Some owners will
decline, and that's okay.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Thu, Oct 28, 2021 at 07:56:00PM -0700, Noah Misch wrote:
> On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
> > > On 24 Oct 2021, at 19:19, Noah Misch <noah@leadboat.com> wrote:
> > > These failures started on 2021-10-09, the day conchuela updated from DragonFly
> > > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
> > > Since the theorized kernel bug seems not to affect
> > > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
> > > from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
> > > the newest failure sites is that they don't write anything to the child stdin.
> 
> > >  If not, does passing the script via stdin, like "pgbench -f-
> > > <script.sql", work around the problem?
> > 
> > I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.
> 
> Did that work?  Commit 7f580aa should make this unnecessary for v12+
> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
> needs a fix in all branches.  A backup plan is just to skip affected tests on
> dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
> optimistic that a better workaround will present itself.

Is this still in your queue, or not?  The conchuela breakage in
src/test/recovery and v11+v10 src/bin/pgbench is a large source of buildfarm
noise right now.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
>
> On Thu, Oct 28, 2021 at 07:56:00PM -0700, Noah Misch wrote:
>> On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
>>>> On 24 Oct 2021, at 19:19, Noah Misch <noah@leadboat.com> wrote:
>>>> These failures started on 2021-10-09, the day conchuela updated from DragonFly
>>>> v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
>>>> Since the theorized kernel bug seems not to affect
>>>> src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
>>>> from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
>>>> the newest failure sites is that they don't write anything to the child stdin.
>>
>>>> If not, does passing the script via stdin, like "pgbench -f-
>>>> <script.sql", work around the problem?
>>>
>>> I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.
>>
>> Did that work?  Commit 7f580aa should make this unnecessary for v12+
>> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
>> needs a fix in all branches.  A backup plan is just to skip affected tests on
>> dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
>> optimistic that a better workaround will present itself.
>
> Is this still in your queue, or not?  The conchuela breakage in
> src/test/recovery and v11+v10 src/bin/pgbench is a large source of buildfarm
> noise right now.
Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.

Best regards, Andrey Borodin.


Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

>> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
>>
> Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.
> 

I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
But so far I didn't come up with some clever mutex idea for REL_10.

Best regards, Andrey Borodin.


Attachment

Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> >> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
> > Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next
week.
> 
> I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer
branches.
> But so far I didn't come up with some clever mutex idea for REL_10.

That's a reasonable sentiment, but removing background_pgbench() isn't going
to fix 017_shm.pl.  I'm not enthusiastic about any fix that repairs
src/bin/pgbench without repairing 017_shm.pl.  I'm okay with skipping affected
test files on dragonfly >= 6 if you decide to cease figuring out how to make
them pass like the others do.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Sun, Nov 14, 2021 at 1:17 PM Noah Misch <noah@leadboat.com> wrote:
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> > I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> > I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer
branches.
> > But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl.  I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl.  I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Hmm, so if "IPC::Run got stuck when it should have been reaping that
zombie", what's it stuck in, I guess select() or waitpid()?  Maybe
there' s a kernel bug but it seems hard to believe that a Unix system
would have bugs in such fundamental facilities and still be able to
build itself and ship a release...  Otherwise I guess Perl, or perl
scripts, would need to be confusing fds or pids or something?  But
that's hard to believe on its own, too, given the lack of problems on
other systems that are pretty similar.  If Andrey can still reproduce
this, it'd be interesting to see a gdb backtrace, and also "ps O
wchan" or perhaps kill -INFO $pid, and lsof for the process (or
according to old pages found with google, perhaps the equivalent tool
is "fstat" on that system).



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> On 14 Nov 2021, at 05:09, Noah Misch <noah@leadboat.com> wrote:
>
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
>>>> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
>>> Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next
week.
>>
>> I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
>> I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer
branches.
>> But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl.  I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl.  I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Let's skip these tests. How can this be accomplished?
Should we mute only 022_cic.pl, 023_cic_2pc.pl, and 017_shm.pl? Or everything that calls harness->finish?

I've sent some diagnostics info to Thomas offlist, but I do not understand how it can be used...

Best regards, Andrey Borodin.


Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Mon, Nov 15, 2021 at 8:20 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I've sent some diagnostics info to Thomas offlist, but I do not understand how it can be used...

Summary for the record: defunct child, and gdb showed perl blocked in
select(), and ps revealed that Dragonfly internally maps select() onto
kevent() (wchan = kqread), which is interesting, but apparently not
new.  (Hmmm, /me eyes the recent changes to events reported on
half-closed pipes[1].)

[1] https://github.com/DragonFlyBSD/DragonFlyBSD/commits/master/sys/kern/sys_generic.c



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
I managed to reproduce the 017_shm.pl hang using my Vagrant file (see
earlier).  It happens for "gmake -s check-world", but not "gmake -s -C
src/test/recovery check", which might have confused me last time.  The
symptoms are as described already, though this time I learned with
truss that it's in a retry loop waiting 1 second at a time.

I don't know how to get debug symbols for libc on this operating
system (well, I probably needed to select the debug option while
installing the OS, but I didn't install the OS; I'm using a lazy-bones
pre-rolled Vagrant image).  So in order to be able to inspect the arguments to
select, I made my own LD_PRELOAD wrapper for select (see end), and I
observed that it was calling select() with an empty fd_set:

(gdb) bt
#0  0x00000008014c1e6c in select () from /lib/libc.so.8
#1  0x00000008009f0fe6 in select () from /usr/lib/libpthread.so.0
#2  0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0, writefds=0x8017f6d50, exceptfds=0x0, timeout=0x7fffffdfd4a0) at select.c:13
#3  0x00000008007bfaf9 in Perl_pp_sselect () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#4  0x000000080076b036 in Perl_runops_standard () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#5  0x00000008006da3b7 in perl_run () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#6  0x0000000000400de4 in main ()
(gdb) f 2
#2  0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0, writefds=0x8017f6d50, exceptfds=0x0, timeout=0x7fffffdfd4a0) at select.c:13
13        return real_function(nfds, readfds, writefds, exceptfds, timeout);
(gdb) print nfds
$1 = 16
(gdb) print *readfds
$2 = {fds_bits = {0 <repeats 16 times>}}
(gdb) print *writefds
$3 = {fds_bits = {0 <repeats 16 times>}}

So it looks a lot like something on the perl side has lost track of
the pipe it's supposed to be selecting on.  If I understand correctly,
it's supposed to be waiting for one of the following pipes to appear
as readable, whichever is the one that the zombie psql process
previously held the other end of:

$ fstat -p 73032
USER     CMD          PID   FD PATH                  INUM MODE         SZ|DV R/W
vagrant  perl       73032 root /                        1 drwxr-xr-x   offset:0  r
vagrant  perl       73032   wd /home/vagrant/postgres/src/test/recovery     4434524058 drwxr-xr-x   offset:0  r
vagrant  perl       73032 text /pfs/@@-1:00001/local/bin/perl      4329919842 -rwxr-xr-x    offset:0  r
vagrant  perl       73032    0 /dev/pts/0                1335 crw--w----     pts/0:155171 rw
vagrant  perl       73032    1 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm   4477427235 -rw-r--r--    offset:9457  w
vagrant  perl       73032    2 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm   4477427235 -rw-r--r--    offset:9457  w
vagrant  perl       73032    3 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm   4477427235 -rw-r--r--    offset:9457  w
vagrant  perl       73032    4* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw
vagrant  perl       73032    5* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw
vagrant  perl       73032    6* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw
vagrant  perl       73032    7* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw
vagrant  perl       73032    8* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw
vagrant  perl       73032    9* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw
vagrant  perl       73032   13* pipe fffff800ab157560 (B<->A) ravail 0 wavail 0 rw


=== select wrapper ===

$ cat select.c
#include <dlfcn.h>
#include <sys/select.h>

int
select(int nfds,fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
    struct timeval *timeout)
{
    static int (*real_function)(int, fd_set *, fd_set *, fd_set *,
        struct timeval *);

    if (!real_function)
        real_function = dlsym(RTLD_NEXT, "select");

    return real_function(nfds, readfds, writefds, exceptfds, timeout);
}

$ cc -Wall -fPIC -shared -g -o myselect.so select.c -ldl
$ cd postgres
$ LD_PRELOAD=/home/vagrant/myselect.so gmake check-world -s



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Fri, Nov 19, 2021 at 5:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> (gdb) print *writefds
> $3 = {fds_bits = {0 <repeats 16 times>}}

Oops, that was after it had been cleared already by the OS; duh.  On
entry to my wrapper, writefds does in fact contain the bit pattern for
fd 13.  That led me to try a very simple C program which runs to
completion on Linux and FreeBSD, but hangs forever on Dragonfly.
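
A minimal sketch of what such a program could look like; the attachment
itself is not reproduced in this archive, so the shape below is an
assumption based on the description downthread of a pipe FD considered
non-writable even though writing would give EPIPE. Close the read end of
a pipe, then select() for writability on the write end; POSIX requires
the descriptor to be reported ready, since a write would fail
immediately with EPIPE rather than block.

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

int
main(void)
{
    int     fds[2];
    fd_set  writefds;

    if (pipe(fds) != 0)
    {
        perror("pipe");
        return EXIT_FAILURE;
    }
    close(fds[0]);      /* no reader left; a write would now fail with EPIPE */

    FD_ZERO(&writefds);
    FD_SET(fds[1], &writefds);

    /* Expected to return 1 immediately, with fds[1] reported writable. */
    if (select(fds[1] + 1, NULL, &writefds, NULL, NULL) < 0)
    {
        perror("select");
        return EXIT_FAILURE;
    }
    printf("write end reported ready: %d\n", FD_ISSET(fds[1], &writefds) ? 1 : 0);
    return EXIT_SUCCESS;
}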

Attachment

Re: conchuela timeouts since 2021-10-09 system upgrade

From
Noah Misch
Date:
On Fri, Nov 19, 2021 at 05:52:59PM +1300, Thomas Munro wrote:
> led me to try a very simple C program which runs to
> completion on Linux and FreeBSD, but hangs forever on Dragonfly.

> [pipe FD considered non-writable, but writing would give EPIPE]

Looks like optimal material for a kernel bug report.  Excellent discovery.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Andrey Borodin
Date:

> On 19 Nov 2021, at 11:13, Noah Misch <noah@leadboat.com> wrote:
>
> Looks like optimal material for a kernel bug report.  Excellent discovery.

Are we going to wait for the fix or disable tests partially on conchuela?

Best regards, Andrey Borodin.


Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Fri, Nov 19, 2021 at 7:13 PM Noah Misch <noah@leadboat.com> wrote:
> On Fri, Nov 19, 2021 at 05:52:59PM +1300, Thomas Munro wrote:
> > led me to try a very simple C program which runs to
> > completion on Linux and FreeBSD, but hangs forever on Dragonfly.
>
> > [pipe FD considered non-writable, but writing would give EPIPE]
>
> Looks like optimal material for a kernel bug report.  Excellent discovery.

https://bugs.dragonflybsd.org/issues/3307



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Nov 19, 2021 at 7:13 PM Noah Misch <noah@leadboat.com> wrote:
>> Looks like optimal material for a kernel bug report.  Excellent discovery.

> https://bugs.dragonflybsd.org/issues/3307

I see they're pushing back on whether this is a bug.  I failed to find a
way to quickly comment on the bug report, but I suggest you quote POSIX
select(2) at them:

    A descriptor shall be considered ready for reading when a call to an
    input function with O_NONBLOCK clear would not block, whether or not
    the function would transfer data successfully. (The function might
    return data, an end-of-file indication, or an error other than one
    indicating that it is blocked, and in each of these cases the
    descriptor shall be considered ready for reading.)

    A descriptor shall be considered ready for writing when a call to an
    output function with O_NONBLOCK clear would not block, whether or not
    the function would transfer data successfully.

I don't know whether it'd help to point out that the test program works
as expected on other BSDen.  (I'm planning to go try it on a few more
platforms in a bit, but in any case the standards-compliance question
looks pretty open-and-shut to me.)

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> Are we going to wait for the fix or disable tests partially on conchuela?

Let's wait a bit and see if a fix will be forthcoming promptly.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Oops, that was after it had been cleared already by the OS; duh.  On
> entry to my wrapper, writefds does in fact contain the bit pattern for
> fd 13.  That led me to try a very simple C program which runs to
> completion on Linux and FreeBSD, but hangs forever on Dragonfly.

For completeness, I checked this on macOS, OpenBSD, and NetBSD,
and they all give the expected results.

            regards, tom lane



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Sat, Nov 20, 2021 at 3:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > Are we going to wait for the fix or disable tests partially on conchuela?
>
> Let's wait a bit and see if a fix will be forthcoming promptly.

The fix came forth.  I'll wait for the back-patch and then ask Mikael
to upgrade conchuela.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Thomas Munro
Date:
On Sat, Nov 20, 2021 at 8:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> The fix came forth.  I'll wait for the back-patch and then ask Mikael
> to upgrade conchuela.

Done; thanks Mikael.



Re: conchuela timeouts since 2021-10-09 system upgrade

From
Mikael Kjellström
Date:
On 2021-12-29 20:53, Thomas Munro wrote:
> On Sat, Nov 20, 2021 at 8:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> The fix came forth.  I'll wait for the back-patch and then ask Mikael
>> to upgrade conchuela.
> 
> Done; thanks Mikael.

It has only had 1 pass through the buildfarm so far but all branches passed.

So looking promising.

Let's see how it looks after a few more passes.

/Mikael