Re: [HACKERS] Issues with logical replication - Mailing list pgsql-hackers

From Stas Kelvich
Subject Re: [HACKERS] Issues with logical replication
Date
Msg-id 59145A4F-CC42-47D8-B451-D4C4285BB3F0@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] Issues with logical replication  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Issues with logical replication  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> On 15 Nov 2017, at 23:09, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Ouch.  This seems like a bug that needs to be fixed, but do you think
> it's related to to Petr's proposed fix to set es_output_cid?  That fix
> looks reasonable, since we shouldn't try to lock tuples without a
> valid CommandId.
>
> Now, having said that, I understand how the lack of that fix could cause:
>
> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple
>
> But I do not understand how it could cause:
>
> #3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
>

I think that’s two separate bugs. Second one is due to race between
XactLockTableWait() and acquiring of heavyweight transaction lock in
AssignTransactionId(). XactLockTableWait assumes that transaction
already set it’s hw transaction lock, but with current machinery of
RunningXacts and snapshot building we can start such waiting before
transaction gets it hw lock.

I can see two ways out of this and both of them are quite unattractive:

1. Create a flag in pgproc indicating that hw lock is acquired and
include in RunningXacts only transaction with this flag set.
That's probably quite expensive since we need to hold one more exclusive
lwlock in AssignTransactionId.

2. Rewrite XactLockTableWait() in a way that it is aware of the fact that
valid xid can be topmost transaction but not yet have a hw lock entry.
In such situation we can just wait and try to lock again. Good stop
condition in this loop is TransactionIdIsInProgress(xid) becoming
false.

I did a sketch of first approach just to confirm that it solves the problem.
But there I hold ProcArrayLock during update of flag. Since only reader is
GetRunningTransactionData it possible to have a custom lock there. In
this case GetRunningTransactionData will hold three locks simultaneously,
since it already holds ProcArrayLock and XidGenLock =)

Any better ideas?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Next
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.