[HACKERS] Issues with logical replication

From: Konstantin Knizhnik
I have faced two issues with logical replication.
Reproducing scenario:

1. start two Postgres instances (I start both at local machine).
2. Initialize pgbench tables at both instances:
    pgbench -i -s 10 postgres
3. Create publication of pgbench_accounts table at one node:
    create publication pub for table pgbench_accounts;
4. Create subscription at another node:
    delete from pgbench_accounts;
    CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost port=5432 sslmode=disable' publication pub;
    CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
    BEGIN
    --  RETURN NEW;
    END $$ LANGUAGE plpgsql;
    CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE replication_trigger_proc();
    ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at primary node:
    pgbench -T 1000 -P 2 -c 10 postgres


Please notice the commented-out "RETURN NEW" statement. Invocation of this 
function causes an error, and in the log we see repeated messages:

2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started
2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function replication_trigger_proc()
    COPY pgbench_accounts, line 1: "1    1    0 "
2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical replication worker for subscription 17264 sync 17251 (PID 32129) exited with exit code 1
...


After a few minutes of work, the primary node (the one with the publication) crashes 
with the following stack trace:

Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f3608f92028 in __GI_abort () at abort.c:89
#2  0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30 "!(((xid) != ((TransactionId) 0)))",
    errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", lineNumber=582) at assert.c:54
#3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, 
oper=XLTW_None) at lmgr.c:582
#4  0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, 
cutoff=898498) at snapbuild.c:1400
#5  0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, 
lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6  0x000000000081c339 in SnapBuildProcessRunningXacts 
(builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
#7  0x000000000080a1c7 in DecodeStandbyOp (ctx=0x27ef870, 
buf=0x7ffd301858d0) at decode.c:314
#8  0x0000000000809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, 
record=0x27efb30) at decode.c:117
#9  0x000000000080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) 
at logical.c:458
#10 0x0000000000823968 in CreateReplicationSlot (cmd=0x27483a8) at 
walsender.c:934
#11 0x00000000008246ee in exec_replication_command (
    cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT \"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1511
#12 0x000000000088eb44 in PostgresMain (argc=1, argv=0x275b738, dbname=0x275b578 "postgres", username=0x272b800 "knizhnik")
    at postgres.c:4086
#13 0x00000000007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357
#14 0x00000000007ef0cb in BackendStartup (port=0x27532a0) at 
postmaster.c:4029
#15 0x00000000007eb68b in ServerLoop () at postmaster.c:1753
#16 0x00000000007eac77 in PostmasterMain (argc=3, argv=0x2729670) at 
postmaster.c:1361
#17 0x0000000000728552 in main (argc=3, argv=0x2729670) at main.c:228


Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
  RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform two updates at the master inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;
<hangs>

and in the replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply 
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible 
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical 
replication worker for subscription 16403 (PID 2954) exited with exit 
code 1

The error happens in trigger.c:

#3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
    lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at trigger.c:3103
#4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
    fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
#5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50, epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
    at execReplication.c:461
#6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at 
worker.c:736
#7  0x0000000000820d83 in apply_dispatch (s=0x7ffc442163b0) at worker.c:892
#8  0x000000000082121b in LogicalRepApplyLoop (last_received=2225693736) 
at worker.c:1095
#9  0x000000000082219d in ApplyWorkerMain (main_arg=0) at worker.c:1647
#10 0x00000000007dd496 in StartBackgroundWorker () at bgworker.c:835
#11 0x00000000007f0889 in do_start_bgworker (rw=0x2d75f50) at 
postmaster.c:5680
#12 0x00000000007f0bc3 in maybe_start_bgworkers () at postmaster.c:5884
#13 0x00000000007efbeb in sigusr1_handler (postgres_signal_arg=10) at 
postmaster.c:5073
#14 <signal handler called>
#15 0x00007fbe83a054a3 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:81
#16 0x00000000007eb517 in ServerLoop () at postmaster.c:1717
#17 0x00000000007eac48 in PostmasterMain (argc=3, argv=0x2d4e660) at 
postmaster.c:1361
#18 0x0000000000728523 in main (argc=3, argv=0x2d4e660) at main.c:228

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issues with logical replication

From: Petr Jelinek
On 02/10/17 17:49, Konstantin Knizhnik wrote:
> I have faced two issues with logical replication.
> Reproducing scenario:
> 
> 1. start two Postgres instances (I start both at local machine).
> 2. Initialize pgbench tables at both instances:
>     pgbench -i -s 10 postgres
> 3. Create publication of pgbench_accounts table at one node:
>     create publication pub for table pgbench_accounts;
> 4. Create subscription at another node:
>     delete from pgbench_accounts;
>     CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost
> port=5432 sslmode=disable' publication pub;
>     CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS
> TRIGGER AS $$
>     BEGIN
>     --  RETURN NEW;
>     END $$ LANGUAGE plpgsql;
>    CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE
> ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE
> replication_trigger_proc();
>    ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
> 5. Start pgbench at primary node:
>     pgbench -T 1000 -P 2 -c 10 postgres
> 
> 
> Please notice commented "return new" statement. Invocation of this
> function cause error and in log we see repeated messages:
> 
> 2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table
> synchronization worker for subscription "sub", table "pgbench_accounts"
> has started
> 2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of
> trigger procedure without RETURN
> 2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function
> replication_trigger_proc()
>     COPY pgbench_accounts, line 1: "1    1    0 "
> 2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical
> replication worker for subscription 17264 sync 17251 (PID 32129) exited
> with exit code 1
> ...
> 
> 
> After few minutes of work primary node (with publication) is crashed
> with the following stack trace:
> 
> Program terminated with signal SIGABRT, Aborted.
> #0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007f3608f92028 in __GI_abort () at abort.c:89
> #2  0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30
> "!(((xid) != ((TransactionId) 0)))",
>     errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
> lineNumber=582) at assert.c:54
> #3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
> #4  0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
> cutoff=898498) at snapbuild.c:1400
> #5  0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
> lsn=798477760, running=0x2749198) at snapbuild.c:1311
> #6  0x000000000081c339 in SnapBuildProcessRunningXacts
> (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106


Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid
transaction ids but we specifically skip those in
GetRunningTransactionData(). Can you check pg_waldump output for that
record (lsn is shown in the backtrace)?

> 
> Now fix the trigger function:
> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
> BEGIN
>   RETURN NEW;
> END $$ LANGUAGE plpgsql;
> 
> And manually perform at master two updates inside one transaction:
> 
> postgres=# begin;
> BEGIN
> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
> UPDATE 1
> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
> UPDATE 1
> postgres=# commit;
> <hangs>
> 
> and in replica log we see:
> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
> worker for subscription "sub" has started
> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
> tuple
> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
> replication worker for subscription 16403 (PID 2954) exited with exit
> code 1
> 
> Error happens in trigger.c:
> 
> #3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
> trigger.c:3103
> #4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
> #5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>     at execReplication.c:461
> #6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
> worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Issues with logical replication

From: Petr Jelinek
On 02/10/17 18:59, Petr Jelinek wrote:
>>
>> Now fix the trigger function:
>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>> BEGIN
>>   RETURN NEW;
>> END $$ LANGUAGE plpgsql;
>>
>> And manually perform at master two updates inside one transaction:
>>
>> postgres=# begin;
>> BEGIN
>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>> UPDATE 1
>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>> UPDATE 1
>> postgres=# commit;
>> <hangs>
>>
>> and in replica log we see:
>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>> worker for subscription "sub" has started
>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>> tuple
>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>> replication worker for subscription 16403 (PID 2954) exited with exit
>> code 1
>>
>> Error happens in trigger.c:
>>
>> #3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>> trigger.c:3103
>> #4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>> #5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>     at execReplication.c:461
>> #6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
>> worker.c:736
> 
> We have locked the same tuple in RelationFindReplTupleByIndex() just
> before this gets called and didn't get the same error. I guess we do
> something wrong with snapshots. Will need to investigate more.
> 

Okay, so it's not snapshots. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger uses
when locking the tuple. Attached one-liner fixes it.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Issues with logical replication

From: Masahiko Sawada
On Wed, Oct 4, 2017 at 8:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
>>>
>>> Now fix the trigger function:
>>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>>> BEGIN
>>>   RETURN NEW;
>>> END $$ LANGUAGE plpgsql;
>>>
>>> And manually perform at master two updates inside one transaction:
>>>
>>> postgres=# begin;
>>> BEGIN
>>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>>> UPDATE 1
>>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>>> UPDATE 1
>>> postgres=# commit;
>>> <hangs>
>>>
>>> and in replica log we see:
>>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>>> worker for subscription "sub" has started
>>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>>> tuple
>>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>>> replication worker for subscription 16403 (PID 2954) exited with exit
>>> code 1
>>>
>>> Error happens in trigger.c:
>>>
>>> #3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>>> trigger.c:3103
>>> #4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>>> #5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>>     at execReplication.c:461
>>> #6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
>>> worker.c:736
>>
>> We have locked the same tuple in RelationFindReplTupleByIndex() just
>> before this gets called and didn't get the same error. I guess we do
>> something wrong with snapshots. Will need to investigate more.
>>
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.
>

Thank you for the patch.
This bug can happen even without the trigger, and I confirmed that the
bug is fixed by the patch. I think the patch fixes it properly.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Issues with logical replication

From: Stas Kelvich
> On 2 Oct 2017, at 19:59, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
>>
>> Program terminated with signal SIGABRT, Aborted.
>> #0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> 56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1  0x00007f3608f92028 in __GI_abort () at abort.c:89
>> #2  0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30
>> "!(((xid) != ((TransactionId) 0)))",
>>     errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
>> lineNumber=582) at assert.c:54
>> #3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
>> oper=XLTW_None) at lmgr.c:582
>> #4  0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
>> cutoff=898498) at snapbuild.c:1400
>> #5  0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
>> lsn=798477760, running=0x2749198) at snapbuild.c:1311
>> #6  0x000000000081c339 in SnapBuildProcessRunningXacts
>> (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
>
>
> Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid
> transaction ids but we specifically skip those in
> GetRunningTransactionData(). Can you check pg_waldump output for that
> record (lsn is shown in the backtrace)?

  I investigated this case and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
does not always work as expected. XactLockTableWait() waits on LockAcquire() for the xid to be
completed, and if we finally get this lock but the transaction is still in progress, then such an xid
is assumed to be a subxid. However, the LockAcquire/LockRelease cycle can happen after the transaction
sets its xid, but before XactLockTableInsert().

Namely, the following history happened for xid 5225 and led to the crash:

[backend] LOG:  AssignTransactionId: XactTopTransactionId = 5225
   [walsender] LOG:  LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
   [walsender] LOG:  XactLockTableWait: LockAcquire 5225
   [walsender] LOG:  XactLockTableWait: LockRelease 5225
[backend] LOG:  AssignTransactionId: LockAcquire ExclusiveLock 5225
   [walsender] LOG:  TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
[backend] LOG:  CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
[backend] LOG:  CommitTransaction: ResourceOwnerRelease locks xid=5225

I’ve quickly checked other usages of XactLockTableWait() and it seems that it is mostly used with
xids from the heap, so the transaction has definitely set its lock somewhere in the past.

I'm not sure what the best approach is to handle that. Maybe write running xacts only if they have
already set their lock?

Also attaching a pgbench script that can reproduce the failure. I run it over the usual pgbench database
in 10 threads. It takes several minutes to crash.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Issues with logical replication

From: Robert Haas
On Mon, Oct 9, 2017 at 9:19 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
>   I investigated this case and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
> not always work as expected. XactLockTableWait() waits on LockAcquire() for xid to be
> completed and if we finally got this lock but transaction is still in progress then such xid
> assumed to be a subxid. However LockAcquire/LockRelease cycle can happen after transaction
> set xid, but before XactLockTableInsert().
>
> Namely following history happened for xid 5225 and lead to crash:
>
> [backend] LOG:  AssignTransactionId: XactTopTransactionId = 5225
>    [walsender] LOG:  LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
>    [walsender] LOG:  XactLockTableWait: LockAcquire 5225
>    [walsender] LOG:  XactLockTableWait: LockRelease 5225
> [backend] LOG:  AssignTransactionId: LockAcquire ExclusiveLock 5225
>    [walsender] LOG:  TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
> [backend] LOG:  CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
> [backend] LOG:  CommitTransaction: ResourceOwnerRelease locks xid=5225

Ouch.  This seems like a bug that needs to be fixed, but do you think
it's related to Petr's proposed fix to set es_output_cid?  That fix
looks reasonable, since we shouldn't try to lock tuples without a
valid CommandId.

Now, having said that, I understand how the lack of that fix could cause:

2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple

But I do not understand how it could cause:

#3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Issues with logical replication

From: Stas Kelvich
> On 15 Nov 2017, at 23:09, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Ouch.  This seems like a bug that needs to be fixed, but do you think
> it's related to to Petr's proposed fix to set es_output_cid?  That fix
> looks reasonable, since we shouldn't try to lock tuples without a
> valid CommandId.
>
> Now, having said that, I understand how the lack of that fix could cause:
>
> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple
>
> But I do not understand how it could cause:
>
> #3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
>

I think those are two separate bugs. The second one is due to a race between
XactLockTableWait() and the acquisition of the heavyweight transaction lock in
AssignTransactionId(). XactLockTableWait() assumes that the transaction
has already set its hw transaction lock, but with the current machinery of
RunningXacts and snapshot building we can start such waiting before the
transaction gets its hw lock.

I can see two ways out of this and both of them are quite unattractive:

1. Create a flag in pgproc indicating that the hw lock is acquired, and
include in RunningXacts only transactions with this flag set.
That's probably quite expensive, since we need to hold one more exclusive
lwlock in AssignTransactionId.

2. Rewrite XactLockTableWait() so that it is aware that a valid xid can be
a topmost transaction that does not yet have a hw lock entry. In such a
situation we can just wait and try to lock again. A good stop condition
for this loop is TransactionIdIsInProgress(xid) becoming false.

I did a sketch of the first approach just to confirm that it solves the problem.
But there I hold ProcArrayLock during the update of the flag. Since the only reader is
GetRunningTransactionData, it is possible to have a custom lock there. In
this case GetRunningTransactionData will hold three locks simultaneously,
since it already holds ProcArrayLock and XidGenLock =)

Any better ideas?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Issues with logical replication

From: Robert Haas
On Wed, Nov 15, 2017 at 8:20 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
> I did a sketch of first approach just to confirm that it solves the problem.
> But there I hold ProcArrayLock during update of flag. Since only reader is
> GetRunningTransactionData it possible to have a custom lock there. In
> this case GetRunningTransactionData will hold three locks simultaneously,
> since it already holds ProcArrayLock and XidGenLock =)

To me, it seems like SnapBuildWaitSnapshot() is fundamentally
misdesigned, and ideally Petr (who wrote the patch) or Andres (who
committed it) ought to get involved here and help fix this problem.
My own first inclination would be to rewrite this as a loop: if the
transaction ID precedes the oldest running XID, then continue; else if
TransactionIdDidCommit() || TransactionIdDidAbort() then conclude that
we don't need to wait; else XactLockTableWait() then loop.  That way,
if you hit the race condition, you'll just busy-wait instead of doing
the wrong thing.  Maybe insert a sleep(1) if we retry more than once.
That sucks, of course, but it seems like a better idea than trying to
redesign XactLockTableWait() or the procarray, which could affect an
awful lot of other things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Issues with logical replication

From: Andres Freund
On 2017-11-16 10:36:40 -0500, Robert Haas wrote:
> On Wed, Nov 15, 2017 at 8:20 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
> > I did a sketch of first approach just to confirm that it solves the problem.
> > But there I hold ProcArrayLock during update of flag. Since only reader is
> > GetRunningTransactionData it possible to have a custom lock there. In
> > this case GetRunningTransactionData will hold three locks simultaneously,
> > since it already holds ProcArrayLock and XidGenLock =)
> 
> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
> misdesigned

Maybe I'm confused, but why is it fundamentally misdesigned? It's not
such an absurd idea to wait for an xid in a WAL record.  I get that
there's a race condition here, which is obviously bad, but I don't
really see that as evidence for the above claim.

I actually think this code used to be safe because ProcArrayLock used to
be held while generating and logging the running snapshots record.  That
was removed when fixing some other bug, but perhaps that shouldn't have
been done...


> and ideally Petr (who wrote the patch) or Andres (who committed it)
> ought to get involved here and help fix this problem.  My own first
> inclination would be to rewrite this as a loop: if the transaction ID
> precedes the oldest running XID, then continue; else if
> TransactionIdDidCommit() || TransactionIdDidAbort() then conclude that
> we don't need to wait; else XactLockTableWait() then loop.  That way,
> if you hit the race condition, you'll just busy-wait instead of doing
> the wrong thing.  Maybe insert a sleep(1) if we retry more than once.
> That sucks, of course, but it seems like a better idea than trying to
> redesign XactLockTableWait() or the procarray, which could affect an
> awful lot of other things.

Hm. Thinking about this.

Greetings,

Andres Freund


Re: [HACKERS] Issues with logical replication

From: Robert Haas
On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:
>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>> misdesigned
>
> Maybe I'm confused, but why is it fundamentally misdesigned? It's not
> such an absurd idea to wait for an xid in a WAL record.  I get that
> there's a race condition here, which obviously bad, but I don't really
> see as evidence of the above claim.
>
> I actually think this code used to be safe because ProcArrayLock used to
> be held while generating and logging the running snapshots record.  That
> was removed when fixing some other bug, but perhaps that shouldn't have
> been done...

OK.  Well, I might be overstating the case.  My comment about
fundamental misdesign was really just based on the assumption that
XactLockTableWait() could be used to wait for an XID the instant it
was generated.  That was never gonna work and there's no obvious clean
workaround for the problem.  Getting snapshot building to work
properly seems to be Hard (TM).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Issues with logical replication

From: Michael Paquier
On Fri, Nov 17, 2017 at 5:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:
>>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>>> misdesigned
>>
>> Maybe I'm confused, but why is it fundamentally misdesigned? It's not
>> such an absurd idea to wait for an xid in a WAL record.  I get that
>> there's a race condition here, which obviously bad, but I don't really
>> see as evidence of the above claim.
>>
>> I actually think this code used to be safe because ProcArrayLock used to
>> be held while generating and logging the running snapshots record.  That
>> was removed when fixing some other bug, but perhaps that shouldn't have
>> been done...
>
> OK.  Well, I might be overstating the case.  My comment about
> fundamental misdesign was really just based on the assumption that
> XactLockTableWait() could be used to wait for an XID the instant it
> was generated.  That was never gonna work and there's no obvious clean
> workaround for the problem.  Getting snapshot building to work
> properly seems to be Hard (TM).

The patches discussed here deserve tracking, so please note that I
have added an entry in the CF app:
https://commitfest.postgresql.org/16/1381/
-- 
Michael


Re: [HACKERS] Issues with logical replication

From: Craig Ringer
On 4 October 2017 at 07:35, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
On 02/10/17 18:59, Petr Jelinek wrote:
>>
>> Now fix the trigger function:
>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>> BEGIN
>>   RETURN NEW;
>> END $$ LANGUAGE plpgsql;
>>
>> And manually perform at master two updates inside one transaction:
>>
>> postgres=# begin;
>> BEGIN
>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>> UPDATE 1
>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>> UPDATE 1
>> postgres=# commit;
>> <hangs>
>>
>> and in replica log we see:
>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>> worker for subscription "sub" has started
>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>> tuple
>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>> replication worker for subscription 16403 (PID 2954) exited with exit
>> code 1
>>
>> Error happens in trigger.c:
>>
>> #3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>> trigger.c:3103
>> #4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>> #5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>     at execReplication.c:461
>> #6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
>> worker.c:736
>
> We have locked the same tuple in RelationFindReplTupleByIndex() just
> before this gets called and didn't get the same error. I guess we do
> something wrong with snapshots. Will need to investigate more.
>

Okay, so it's not the snapshot. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger is
using when locking the tuple. The attached one-liner fixes it.

This seems like a clear-cut bug with a simple fix.

Let's get this committed, so we don't lose it. The rest of the thread is going off into the weeds a bit, into issues unrelated to the original problem.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Issues with logical replication

From
Simon Riggs
Date:
On 4 October 2017 at 10:35, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
>>>
>>> Now fix the trigger function:
>>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>>> BEGIN
>>>   RETURN NEW;
>>> END $$ LANGUAGE plpgsql;
>>>
>>> And manually perform at master two updates inside one transaction:
>>>
>>> postgres=# begin;
>>> BEGIN
>>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>>> UPDATE 1
>>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>>> UPDATE 1
>>> postgres=# commit;
>>> <hangs>
>>>
>>> and in replica log we see:
>>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>>> worker for subscription "sub" has started
>>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>>> tuple
>>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>>> replication worker for subscription 16403 (PID 2954) exited with exit
>>> code 1
>>>
>>> Error happens in trigger.c:
>>>
>>> #3  0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>>> trigger.c:3103
>>> #4  0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>>> #5  0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>>     at execReplication.c:461
>>> #6  0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
>>> worker.c:736
>>
>> We have locked the same tuple in RelationFindReplTupleByIndex() just
>> before this gets called and didn't get the same error. I guess we do
>> something wrong with snapshots. Will need to investigate more.
>>
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.

Applied, thanks

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Issues with logical replication

From
Petr Jelinek
Date:
Hi,

(sorry for not being active here, I am still catching up after being
away for some family issues)

On 16/11/17 21:12, Robert Haas wrote:
> On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:
>>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>>> misdesigned
>>
>> Maybe I'm confused, but why is it fundamentally misdesigned? It's not
>> such an absurd idea to wait for an xid in a WAL record.  I get that
>> there's a race condition here, which obviously bad, but I don't really
>> see as evidence of the above claim.
>>
>> I actually think this code used to be safe because ProcArrayLock used to
>> be held while generating and logging the running snapshots record.  That
>> was removed when fixing some other bug, but perhaps that shouldn't have
>> been done...
> 
> OK.  Well, I might be overstating the case.  My comment about
> fundamental misdesign was really just based on the assumption that
> XactLockTableWait() could be used to wait for an XID the instant it
> was generated.  That was never gonna work and there's no obvious clean
> workaround for the problem.  Getting snapshot building to work
> properly seems to be Hard (TM).
> 

No kidding, we've been at it since 9.4.

But anyway, while we do have a race condition because
XactLockTableWait() finishes too early, all that should mean is that we
call LogStandbySnapshot() too early, and as a result taking the
snapshot (and hence slot creation) will take longer, as we'll wait for
the next call of LogStandbySnapshot() from some other caller (because
the tx we care about will still be running). The whole of
SnapBuildWaitSnapshot() is just an optimization to make slot creation
take less time (and also to make it possible to write tests).

What I don't understand is how this leads to a crash (I could not
reproduce it using the pgbench file attached in this thread either) and
moreover how it leads to a 0 xid being logged. The only explanation I
can come up with is that some kind of similar race has to be in
LogStandbySnapshot(), but we explicitly check for a 0 xid value there.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Issues with logical replication

From
Stas Kelvich
Date:
> On 29 Nov 2017, at 18:46, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> What I don't understand is how it leads to crash (and I could not
> reproduce it using the pgbench file attached in this thread either) and
> moreover how it leads to 0 xid being logged. The only explanation I can
> come up with is that some kind of similar race has to be in
> LogStandbySnapshot() but we explicitly check for 0 xid value there.
>

Zero xid isn’t logged. The loop in XactLockTableWait() does the following:

    for (;;)
    {
        Assert(TransactionIdIsValid(xid));
        Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));

    <...>

        xid = SubTransGetParent(xid);
    }

So if the last statement is reached for a top-level transaction, the
next iteration will crash on the first Assert. And it will be reached
if this whole loop runs before the transaction has acquired its
heavyweight lock.

The probability of that crash can be significantly increased by adding
a sleep between xid generation and lock insertion in
AssignTransactionId().




Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Attachment

Re: [HACKERS] Issues with logical replication

From
Petr Jelinek
Date:
On 29/11/17 20:11, Stas Kelvich wrote:
> 
>> On 29 Nov 2017, at 18:46, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>
>> What I don't understand is how it leads to crash (and I could not
>> reproduce it using the pgbench file attached in this thread either) and
>> moreover how it leads to 0 xid being logged. The only explanation I can
>> come up with is that some kind of similar race has to be in
>> LogStandbySnapshot() but we explicitly check for 0 xid value there.
>>
> 
> Zero xid isn’t logged. Loop in XactLockTableWait() does following:
> 
>     for (;;)
>     {
>         Assert(TransactionIdIsValid(xid));
>         Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
> 
>     <...>
>     
>         xid = SubTransGetParent(xid);
>     }
> 
> So if last statement is reached for top transaction then next iteration
> will crash in first assert. And it will be reached if whole this loop
> happens before transaction acquired heavyweight lock.
> 
> Probability of that crash can be significantly increased by adding a sleep
> between xid generation and lock insertion in AssignTransactionId().
> 

Yes, that helps, thanks. Now that I have reproduced it I understand: I
was confused by the backtrace that said xid was 0 on input to
XactLockTableWait(), but that's not the case; it's what xid is changed
to in the inner loop.

So what happens is that we manage to do LogStandbySnapshot(), decode the
logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
between GetNewTransactionId() and XactLockTableInsert() calls in
AssignTransactionId() for that same transaction.

I guess the probability of this happening is increased by the fact that
GetRunningTransactionData() acquires XidGenLock so if there is
GetNewTransactionId() running in parallel it will wait for it to finish
and we'll log immediately after that.

Hmm that means that Robert's loop idea will not help and ProcArrayLock
will not save us either. Maybe we could either rewrite XactLockTableWait
or create another version of it with SubTransGetParent() call replaced
by SubTransGetTopmostTransaction() as that will return the same top
level xid in case the input xid wasn't a subxact. That would make it
safe to be called on transactions that didn't acquire lock on themselves
yet.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Issues with logical replication

From
Andres Freund
Date:
On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:
> Yes that helps thanks. Now that I reproduced it I understand, I was
> confused by the backtrace that said xid was 0 on the input to
> XactLockTableWait() but that's not the case, it's what xid is changed to
> in the inner loop.

> So what happens is that we manage to do LogStandbySnapshot(), decode the
> logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
> between GetNewTransactionId() and XactLockTableInsert() calls in
> AssignTransactionId() for that same transaction.
> 
> I guess the probability of this happening is increased by the fact that
> GetRunningTransactionData() acquires XidGenLock so if there is
> GetNewTransactionId() running in parallel it will wait for it to finish
> and we'll log immediately after that.
> 
> Hmm that means that Robert's loop idea will not help and ProcArrayLock
> will not save us either. Maybe we could either rewrite XactLockTableWait
> or create another version of it with SubTransGetParent() call replaced
> by SubTransGetTopmostTransaction() as that will return the same top
> level xid in case the input xid wasn't a subxact. That would make it
> safe to be called on transactions that didn't acquire lock on themselves
> yet.

I've not really looked into this deeply, but afair we can just make this
code accept that edgecase and be done with it. As the comment says:

 * Iterate through xids in record, wait for all older than the cutoff to
 * finish.  Then, if possible, log a new xl_running_xacts record.
 *
--- highlight ---
 * This isn't required for the correctness of decoding, but to:
--- highlight ---
 * a) allow isolationtester to notice that we're currently waiting for
 *      something.
 * b) log a new xl_running_xacts record where it'd be helpful, without having
 *      to write for bgwriter or checkpointer.


Greetings,

Andres Freund


Re: [HACKERS] Issues with logical replication

From
Petr Jelinek
Date:
On 30/11/17 00:40, Andres Freund wrote:
> On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:
>> Yes that helps thanks. Now that I reproduced it I understand, I was
>> confused by the backtrace that said xid was 0 on the input to
>> XactLockTableWait() but that's not the case, it's what xid is changed to
>> in the inner loop.
> 
>> So what happens is that we manage to do LogStandbySnapshot(), decode the
>> logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
>> between GetNewTransactionId() and XactLockTableInsert() calls in
>> AssignTransactionId() for that same transaction.
>>
>> I guess the probability of this happening is increased by the fact that
>> GetRunningTransactionData() acquires XidGenLock so if there is
>> GetNewTransactionId() running in parallel it will wait for it to finish
>> and we'll log immediately after that.
>>
>> Hmm that means that Robert's loop idea will not help and ProcArrayLock
>> will not save us either. Maybe we could either rewrite XactLockTableWait
>> or create another version of it with SubTransGetParent() call replaced
>> by SubTransGetTopmostTransaction() as that will return the same top
>> level xid in case the input xid wasn't a subxact. That would make it
>> safe to be called on transactions that didn't acquire lock on themselves
>> yet.
> 
> I've not really looked into this deeply, but afair we can just make this
> code accept that edgecase be done with it. As the comment says:
> 
>  * Iterate through xids in record, wait for all older than the cutoff to
>  * finish.  Then, if possible, log a new xl_running_xacts record.
>  *
> --- highlight ---
>  * This isn't required for the correctness of decoding, but to:
> --- highlight ---
>  * a) allow isolationtester to notice that we're currently waiting for
>  *      something.
>  * b) log a new xl_running_xacts record where it'd be helpful, without having
>  *      to write for bgwriter or checkpointer.
> 

I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
with it, but the problematic logic happens inside
XactLockTableWait(), and SnapBuildWaitSnapshot() has no way of
detecting the situation short of reimplementing
XactLockTableWait() instead of calling it.

The problem is that SubTransGetParent() returns InvalidTransactionId
when the race happens, and SubTransGetParent() is called because
XactLockTableWait() assumes that if the transaction lock was acquired
and the xid is still in progress, the input must have been the xid of
a subtransaction. This is why I suggested replacing that call with
SubTransGetTopmostTransaction(), which will just return the same xid
it got on input, so we'll simply retry the lock instead of crashing.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Issues with logical replication

From
Andres Freund
Date:
On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
> I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
> with it, but the problematic logic happens inside
> XactLockTableWait(), and SnapBuildWaitSnapshot() has no way of
> detecting the situation short of reimplementing
> XactLockTableWait() instead of calling it.

Right. But we can fairly trivially change that. I'm remarking on it
because other people's suggestions, not yours, aimed at making this
bulletproof. I just wanted to make clear that I don't think that's
necessary at all.

Greetings,

Andres Freund


Re: [HACKERS] Issues with logical replication

From
Petr Jelinek
Date:
On 30/11/17 00:47, Andres Freund wrote:
> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>> I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
>> with it, but the problematic logic happens inside
>> XactLockTableWait(), and SnapBuildWaitSnapshot() has no way of
>> detecting the situation short of reimplementing
>> XactLockTableWait() instead of calling it.
> 
> Right. But we fairly trivially can change that. I'm remarking on it
> because other people's, not yours, suggestions aimed at making this
> bulletproof. I just wanted to make clear that I don't think that's
> necessary at all.
> 

Okay, then I guess we are in agreement. I can confirm that the attached
fixes the issue in my tests. Using SubTransGetTopmostTransaction()
instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
cost for other callers. I don't think it's worth worrying about given we
are waiting for a heavyweight lock, but if we did, we could just inline
the code directly into SnapBuildWaitSnapshot().

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Issues with logical replication

From
Stas Kelvich
Date:
> On 30 Nov 2017, at 03:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> On 30/11/17 00:47, Andres Freund wrote:
>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>> I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
>>> with it, but the problematic logic happens inside
>>> XactLockTableWait(), and SnapBuildWaitSnapshot() has no way of
>>> detecting the situation short of reimplementing
>>> XactLockTableWait() instead of calling it.
>>
>> Right. But we fairly trivially can change that. I'm remarking on it
>> because other people's, not yours, suggestions aimed at making this
>> bulletproof. I just wanted to make clear that I don't think that's
>> necessary at all.
>>
>
> Okay, then I guess we are in agreement. I can confirm that the attached
> fixes the issue in my tests. Using SubTransGetTopmostTransaction()
> instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
> cost for other callers. I don't think it's worth worrying about given we
> are waiting for heavyweight lock, but if we did we can just inline the
> code directly into SnapBuildWaitSnapshot().
>
> --
>  Petr Jelinek                  http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> <0001-Make-XactLockTableWait-work-for-transactions-that-ar.patch>

+1

Seems that having a busy loop is the best idea out of the several discussed.

I thought about a small sleep at the bottom of that loop if we reached the
topmost transaction, but taking into account the low probability of that
event, maybe it is faster to just busy-wait.

Also, a clarifying comment in the code would be nice.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Issues with logical replication

From
Simon Riggs
Date:
On 30 November 2017 at 11:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> On 30/11/17 00:47, Andres Freund wrote:
>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>> I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
>>> with it, but the problematic logic happens inside
>>> XactLockTableWait(), and SnapBuildWaitSnapshot() has no way of
>>> detecting the situation short of reimplementing
>>> XactLockTableWait() instead of calling it.
>>
>> Right. But we fairly trivially can change that. I'm remarking on it
>> because other people's, not yours, suggestions aimed at making this
>> bulletproof. I just wanted to make clear that I don't think that's
>> necessary at all.
>>
>
> Okay, then I guess we are in agreement. I can confirm that the attached
> fixes the issue in my tests. Using SubTransGetTopmostTransaction()
> instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
> cost for other callers. I don't think it's worth worrying about given we
> are waiting for heavyweight lock, but if we did we can just inline the
> code directly into SnapBuildWaitSnapshot().

This will still fail an Assert in TransactionIdIsInProgress() when
snapshots are overflowed.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Issues with logical replication

From
Petr Jelinek
Date:
On 30/11/17 11:48, Simon Riggs wrote:
> On 30 November 2017 at 11:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>> On 30/11/17 00:47, Andres Freund wrote:
>>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>>> I don't understand. I mean, sure, SnapBuildWaitSnapshot() can live
>>>> with it, but the problematic logic happens inside
>>>> XactLockTableWait(), and SnapBuildWaitSnapshot() has no way of
>>>> detecting the situation short of reimplementing
>>>> XactLockTableWait() instead of calling it.
>>>
>>> Right. But we fairly trivially can change that. I'm remarking on it
>>> because other people's, not yours, suggestions aimed at making this
>>> bulletproof. I just wanted to make clear that I don't think that's
>>> necessary at all.
>>>
>>
>> Okay, then I guess we are in agreement. I can confirm that the attached
>> fixes the issue in my tests. Using SubTransGetTopmostTransaction()
>> instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
>> cost for other callers. I don't think it's worth worrying about given we
>> are waiting for heavyweight lock, but if we did we can just inline the
>> code directly into SnapBuildWaitSnapshot().
> 
> This will still fail an Assert in TransactionIdIsInProgress() when
> snapshots are overflowed.
> 

Hmm, which one, and why?

I see 2 Asserts there. One is:
>         Assert(nxids == 0);
which is inside the RecoveryInProgress() branch; surely on standbys
there will still be no PGXACTs with assigned xids, so that should be
fine.

The other one is:
>     Assert(TransactionIdIsValid(topxid));
which should again be fine: the toplevel xid of a toplevel xid is the
same xid, which is a valid one.

So I think we should be fine there.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Issues with logical replication

From
Alvaro Herrera
Date:
Stas Kelvich wrote:

> Seems that having a busy loop is the best idea out of the several discussed.
> 
> I thought about a small sleep at the bottom of that loop if we reached the
> topmost transaction, but taking into account the low probability of that
> event, maybe it is faster to just busy-wait.

In other locations where we do similar things we have 1ms sleeps.  I
agree with the need for a comment here.

Proposed patch attached.  I tried your reload.pgb test case in 9.4
(after changing pgoutput to test_decoding and removing the 3rd arg to a
function call) and the crash takes about 3 seconds without the patch on
my machine.  No crash with this patch.

Will push this shortly after lunch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Issues with logical replication

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Will push this shortly after lunch.

Pushed.  Will you (Konstantin, Stas, Masahiko) please verify that after
this commit all the problems reported with logical replication are
fixed?

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Issues with logical replication

From
Stas Kelvich
Date:
> On 3 Jan 2018, at 23:35, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> 
> Pushed.  Will you (Konstantin, Stas, Masahiko) please verify that after
> this commit all the problems reported with logical replication are
> fixed?

Checked that with and without the extra sleep in AssignTransactionId(). In
both cases the patch works.

Thanks!

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company