Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
Date | |
Msg-id | 929224.1600046876@sss.pgh.pa.us Whole thread Raw |
In response to | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
|
List | pgsql-hackers |
I wrote: > Probably this requires a relcache inval at the wrong time; > although we have recent passes from CLOBBER_CACHE_ALWAYS animals, > so that can't be the whole triggering condition. I wonder whether > it is relevant that all of the complaining animals are JIT-enabled. Hmmm ... I take that back. hyrax has indeed passed since this went in, but *it doesn't run any TAP tests*. So the buildfarm offers no information about whether the replication tests work under CLOBBER_CACHE_ALWAYS. Realizing that, I built an installation that way and tried to run the subscription tests. Results so far: * Running 010_truncate.pl by itself passed for me. So there's still some unexplained factor needed to trigger the buildfarm failures. (I'm wondering about concurrent autovacuum activity now...) * Starting over, it appears that 001_rep_changes.pl almost immediately gets into an infinite loop. It does not complete the third test step, rather infinitely waiting for progress to be made. The publisher log shows a repeating loop like 2020-09-13 21:16:05.734 EDT [928529] tap_sub LOG: could not send data to client: Broken pipe 2020-09-13 21:16:05.734 EDT [928529] tap_sub CONTEXT: slot "tap_sub", output plugin "pgoutput", in the commit callback,associated LSN 0/1660628 2020-09-13 21:16:05.843 EDT [928581] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state= 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:16:05.861 EDT [928582] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:16:05.929 EDT [928582] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: starting logical decoding for slot "tap_sub" 2020-09-13 21:16:05.930 EDT [928582] tap_sub DETAIL: Streaming transactions committing after 0/1652820, reading WAL from0/1651B20. 2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: logical decoding found consistent point at 0/1651B20 2020-09-13 21:16:05.930 EDT [928582] tap_sub DETAIL: There are no running transactions. 2020-09-13 21:16:21.560 EDT [928600] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state= 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:16:37.291 EDT [928610] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state= 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:16:52.959 EDT [928627] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state= 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:17:06.866 EDT [928636] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:06.934 EDT [928636] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:06.934 EDT [928636] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:06.934 EDT [928636] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 21:17:07.811 EDT [928638] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:07.880 EDT [928638] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:07.881 EDT [928638] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:07.881 EDT [928638] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 21:17:08.618 EDT [928641] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state= 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub'; 2020-09-13 21:17:08.753 EDT [928642] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:08.821 EDT [928642] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:08.821 EDT [928642] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:08.821 EDT [928642] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 21:17:09.689 EDT [928645] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 21:17:09.756 EDT [928645] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 21:17:09.757 EDT [928645] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"') 2020-09-13 21:17:09.757 EDT [928645] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582 2020-09-13 21:17:09.841 EDT [928582] tap_sub LOG: could not send data to client: Broken pipe 2020-09-13 21:17:09.841 EDT [928582] tap_sub CONTEXT: slot "tap_sub", output plugin "pgoutput", in the commit callback,associated LSN 0/1660628 while the subscriber is repeating 2020-09-13 21:15:01.598 EDT [928528] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-13 21:16:02.178 EDT [928528] ERROR: terminating logical replication worker due to timeout 2020-09-13 21:16:02.179 EDT [920797] LOG: background worker "logical replication worker" (PID 928528) exited with exit code1 2020-09-13 21:16:02.606 EDT [928571] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-13 21:16:03.117 EDT [928571] ERROR: could not start WAL streaming: ERROR: replication slot "tap_sub" is activefor PID 928529 2020-09-13 21:16:03.118 EDT [920797] LOG: background worker "logical replication worker" (PID 928571) exited with exit code1 2020-09-13 21:16:03.544 EDT [928574] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-13 21:16:04.053 EDT [928574] ERROR: could not start WAL streaming: ERROR: replication slot "tap_sub" is activefor PID 928529 2020-09-13 21:16:04.054 EDT [920797] LOG: background worker "logical replication worker" (PID 928574) exited with exit code1 2020-09-13 21:16:04.479 EDT [928576] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-13 21:16:04.990 EDT [928576] ERROR: could not start WAL streaming: ERROR: replication slot "tap_sub" is activefor PID 928529 2020-09-13 21:16:04.990 EDT [920797] LOG: background worker "logical replication worker" (PID 928576) exited with exit code1 2020-09-13 21:16:05.415 EDT [928579] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-13 21:17:05.994 EDT [928579] ERROR: terminating logical replication worker due to timeout I'm out of patience to investigate this for tonight, but there is something extremely broken here; maybe more than one something. regards, tom lane
pgsql-hackers by date: