Re: Replication slot stats misgivings - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Replication slot stats misgivings |
Date | |
Msg-id | CAA4eK1K6-8pyt6prAP44tjnw4m4KP=1_EUMOH97nfP0TqUOJkg@mail.gmail.com Whole thread Raw |
In response to | Re: Replication slot stats misgivings (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Replication slot stats misgivings
|
List | pgsql-hackers |
On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I have pushed this patch and seeing one buildfarm failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-27%2009%3A23%3A14 > > starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2 > s2_alter_tbl1_char s1_commit s2_get_changes > + isolationtester: canceling step s1_init after 314 seconds > step s1_init: SELECT 'init' FROM > pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); > ?column? > > I am analyzing this. > After checking below logs corresponding to this test, it seems test has been executed and create_slot was successful: 2021-04-27 11:06:43.770 UTC [17694956:52] isolation/concurrent_ddl_dml STATEMENT: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); 2021-04-27 11:07:11.748 UTC [5243096:9] LOG: checkpoint starting: time 2021-04-27 11:09:24.332 UTC [5243096:10] LOG: checkpoint complete: wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.716 s, sync=0.001 s, total=132.584 s; sync files=0, longest=0.000 s, average=0.000 s; distance=198 kB, estimate=406 kB 2021-04-27 11:09:40.116 UTC [6226046:1] [unknown] LOG: connection received: host=[local] 2021-04-27 11:09:40.117 UTC [17694956:53] isolation/concurrent_ddl_dml LOG: statement: BEGIN; 2021-04-27 11:09:40.117 UTC [17694956:54] isolation/concurrent_ddl_dml LOG: statement: INSERT INTO tbl1 (val1, val2) VALUES (1, 1); 2021-04-27 11:09:40.118 UTC [17694956:55] isolation/concurrent_ddl_dml LOG: statement: INSERT INTO tbl2 (val1, val2) VALUES (1, 1); 2021-04-27 11:09:40.119 UTC [10944636:49] isolation/concurrent_ddl_dml LOG: statement: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE character varying; I am not sure but there is some possibility that even though create slot is successful, the isolation tester got successful in canceling it, maybe because create_slot is just finished at the same time. As we can see from logs, during this test checkpoint also happened which could also lead to the slowness of this particular command. Also, I see a lot of messages like below which indicate stats collector is also quite slow: 2021-04-27 10:57:59.385 UTC [18743536:1] LOG: using stale statistics instead of current ones because stats collector is not responding I am not sure if the timeout happened because the machine is slow or is it in any way related to code. I am seeing some previous failures due to timeout on this machine [1][2]. In those failures, I see the "using stale stats...." message. Also, I am not able to see why it can fail due to this patch? [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-02-23%2004%3A23%3A56 [2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-12-24%2005%3A31%3A43 -- With Regards, Amit Kapila.
pgsql-hackers by date: