[HACKERS] Issues with logical replication - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | [HACKERS] Issues with logical replication |
Date | |
Msg-id | ab31b954-34b8-763b-686d-5d1e7960b458@postgrespro.ru Whole thread Raw |
Responses |
Re: [HACKERS] Issues with logical replication
|
List | pgsql-hackers |
I have faced two issues with logical replication. Reproducing scenario: 1. start two Postgres instances (I start both at local machine). 2. Initialize pgbench tables at both instances: pgbench -i -s 10 postgres 3. Create publication of pgbench_accounts table at one node: create publication pub for table pgbench_accounts; 4. Create subscription at another node: delete from pgbench_accounts; CREATE SUBSCRIPTION sub connection 'dbname=postgreshost=localhost port=5432 sslmode=disable' publication pub; CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ BEGIN -- RETURN NEW; END $$ LANGUAGE plpgsql; CREATE TRIGGER replication_trigger BEFORE INSERTOR UPDATE OR DELETE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE replication_trigger_proc(); ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger; 5. Start pgbench at primary node: pgbench -T 1000 -P 2 -c 10 postgres Please notice commented "return new" statement. Invocation of this function cause error and in log we see repeated messages: 2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started 2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of trigger procedure without RETURN 2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function replication_trigger_proc() COPY pgbench_accounts, line 1: "1 1 0 " 2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical replication worker for subscription 17264 sync 17251 (PID 32129) exited with exit code 1 ... After few minutes of work primary node (with publication) is crashed with the following stack trace: Program terminated with signal SIGABRT, Aborted. #0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007f3608f92028 in __GI_abort () at abort.c:89 #2 0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30 "!(((xid) != ((TransactionId) 0)))", errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", lineNumber=582) at assert.c:54 #3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582 #4 0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, cutoff=898498) at snapbuild.c:1400 #5 0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1311 #6 0x000000000081c339 in SnapBuildProcessRunningXacts (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106 #7 0x000000000080a1c7 in DecodeStandbyOp (ctx=0x27ef870, buf=0x7ffd301858d0) at decode.c:314 #8 0x0000000000809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, record=0x27efb30) at decode.c:117 #9 0x000000000080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) at logical.c:458 #10 0x0000000000823968 in CreateReplicationSlot (cmd=0x27483a8) at walsender.c:934 #11 0x00000000008246ee in exec_replication_command ( cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT \"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1511 #12 0x000000000088eb44 in PostgresMain (argc=1, argv=0x275b738, dbname=0x275b578 "postgres", username=0x272b800 "knizhnik") at postgres.c:4086 #13 0x00000000007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357 #14 0x00000000007ef0cb in BackendStartup (port=0x27532a0) at postmaster.c:4029 #15 0x00000000007eb68b in ServerLoop () at postmaster.c:1753 #16 0x00000000007eac77 in PostmasterMain (argc=3, argv=0x2729670) at postmaster.c:1361 #17 0x0000000000728552 in main (argc=3, argv=0x2729670) at main.c:228 Now fix the trigger function: CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ BEGIN RETURN NEW; END $$ LANGUAGE plpgsql; And manually perform at master two updates inside one transaction: postgres=# begin; BEGIN postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26; UPDATE 1 postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26; UPDATE 1 postgres=# commit; <hangs> and in replica log we see: 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply worker for subscription "sub" has started 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical replication worker for subscription 16403 (PID 2954) exited with exit code 1 Error happens in trigger.c: #3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at trigger.c:3103 #4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac, fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748 #5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50, epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240) at execReplication.c:461 #6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at worker.c:736 #7 0x0000000000820d83 in apply_dispatch (s=0x7ffc442163b0) at worker.c:892 #8 0x000000000082121b in LogicalRepApplyLoop (last_received=2225693736) at worker.c:1095 #9 0x000000000082219d in ApplyWorkerMain (main_arg=0) at worker.c:1647 #10 0x00000000007dd496 in StartBackgroundWorker () at bgworker.c:835 #11 0x00000000007f0889 in do_start_bgworker (rw=0x2d75f50) at postmaster.c:5680 #12 0x00000000007f0bc3 in maybe_start_bgworkers () at postmaster.c:5884 #13 0x00000000007efbeb in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5073 #14 <signal handler called> #15 0x00007fbe83a054a3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 #16 0x00000000007eb517 in ServerLoop () at postmaster.c:1717 #17 0x00000000007eac48 in PostmasterMain (argc=3, argv=0x2d4e660) at postmaster.c:1361 #18 0x0000000000728523 in main (argc=3, argv=0x2d4e660) at main.c:228 -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: