More info on pg_stat_activity Wait Event Name when is DataFileRead - Mailing list pgsql-hackers
From: PegoraroF10
Subject: More info on pg_stat_activity Wait Event Name when is DataFileRead
Msg-id: 1618596859756-0.post@n3.nabble.com
Responses: Re: More info on pg_stat_activity Wait Event Name when is DataFileRead
List: pgsql-hackers
I have a Publication/Subscription setup. Ten days ago my replication just stopped, with no interesting messages in the log on either side. A select from pg_stat_replication returned nothing, and some time later, an hour or two, pg_stat_replication came back showing a record as it should. So, is it replicating? No: not a single new record arrived on the replica, and some time later pg_stat_replication was empty again.

The only log entries were on the primary. Initially wal_sender_timeout was 15 minutes. I tried changing it to 1 hour, thinking maybe it was a huge transaction, but no, the problem remained the same. When I set the timeout to 0 on both sender and receiver, I saw that on the replica pg_stat_activity showed DataFileRead in the wait_event column and IO in the wait_event_type column, and that status remained the same for hours.

The documentation says DataFileRead means "Waiting for a read from a relation data file", so it does not specify which file is being waited on. So there is a problem with some table on my replica, but which file is it? Well, it would take a lot of time, but let's do a VACUUM FULL. It took one or two days, but it found two corrupted tables. I couldn't select from or dump those tables, so I just dropped them, and then replication finally came back.

This long explanation was only to show that, at least for me, it would be desirable to have additional information when Postgres is waiting for a file. What if DataFileRead showed the relfilenode it is waiting for? (A rough sketch of the idea is at the end of this message, after the logs.)

These logs are from the primary; on the replica I didn't find any. And strangely, those errors did not occur at every wal_sender_timeout interval; even when it was configured with 15 minutes they occurred only 2 or 3 times a day.

log_time;user_name;session_start_time;error_severity;message
2021-04-06 16:21:23.385;replicate;2021-04-06 16:21:23.000;LOG;starting logical decoding for slot sub_google_bkp
2021-04-06 16:21:23.386;replicate;2021-04-06 16:21:23.000;LOG;logical decoding found consistent point at 5C1/5C7A3D48
2021-04-06 16:36:24.175;replicate;2021-04-06 16:21:23.000;LOG;terminating walsender process due to replication timeout
2021-04-06 17:47:15.744;replicate;2021-04-06 17:47:15.000;LOG;starting logical decoding for slot sub_google_bkp
2021-04-06 17:47:15.745;replicate;2021-04-06 17:47:15.000;LOG;logical decoding found consistent point at 5C1/5CBFBF38
2021-04-06 18:02:15.757;replicate;2021-04-06 17:47:15.000;LOG;terminating walsender process due to replication timeout
2021-04-07 11:59:27.831;replicate;2021-04-07 11:59:27.000;LOG;starting logical decoding for slot sub_google_bkp
2021-04-07 11:59:27.832;replicate;2021-04-07 11:59:27.000;LOG;logical decoding found consistent point at 5C1/5CBFBF38
2021-04-07 12:14:27.867;replicate;2021-04-07 11:59:27.000;LOG;terminating walsender process due to replication timeout
2021-04-07 21:45:22.230;replicate;2021-04-07 21:45:22.000;LOG;starting logical decoding for slot sub_google_bkp
2021-04-07 21:45:22.231;replicate;2021-04-07 21:45:22.000;LOG;logical decoding found consistent point at 5C1/5CBFD438
2021-04-07 22:45:22.586;replicate;2021-04-07 21:45:22.000;LOG;terminating walsender process due to replication timeout
2021-04-08 00:06:26.253;replicate;2021-04-08 00:06:25.000;LOG;starting logical decoding for slot sub_google_bkp
2021-04-08 00:06:26.255;replicate;2021-04-08 00:06:25.000;LOG;logical decoding found consistent point at 5C1/5CCF7E20
2021-04-08 02:15:10.342;replicate;2021-04-08 00:06:25.000;LOG;terminating walsender process due to replication timeout

-- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
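To illustrate the idea, here is a rough sketch under today's tooling: the wait itself can already be seen in pg_stat_activity, and if the relfilenode being read were exposed somewhere (or obtained by other means), pg_filenode_relation() could map it back to a relation name. The filenode 16490 below is just a hypothetical value, not from my system.

    -- sessions currently waiting on a relation data file read
    SELECT pid, wait_event_type, wait_event, state, query
    FROM pg_stat_activity
    WHERE wait_event_type = 'IO' AND wait_event = 'DataFileRead';

    -- resolving a known relfilenode to a relation name;
    -- 0 means the database's default tablespace, 16490 is a made-up filenode
    SELECT pg_filenode_relation(0, 16490);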