Hello Postgres Hackers -

We are having a recurring issue on 2 of our replicas where replication stops due to this message:

"incorrect resource manager data checksum in record at ..."

This has been occurring on average once every 1 to 2 weeks during large data imports (100s of GBs being written) on one of two replicas.

Fixing the issue has been relatively straightforward: shut down the replica, remove the bad WAL file, restart the replica, and the good WAL file is retrieved from the master. We are doing streaming replication using replication slots. However, twice now the master had already removed the WAL file, so the file had to be retrieved from the WAL archive.

The WAL log directories on the master and the replicas are on ZFS file systems.

All servers are running:
RHEL 7.7 (Maipo)
PostgreSQL 10.11
ZFS v0.7.13-1

The issue seems similar to https://www.postgresql.org/message-id/CANQ55Tsoa6%3Dvk2YkeVUN7qO-2YdqJf_AMVQxqsVTYJm0qqQQuw%40mail.gmail.com and to https://github.com/timescale/timescaledb/issues/1443

One quirk in our ZFS setup is that ZFS is not handling our RAID array, so ZFS sees our array as a single device.

....<snip>
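
Before removing a suspect WAL file, it may help to confirm where the replica's copy actually differs from the copy in the WAL archive. The following is only a minimal sketch and not part of our tooling: the function names are hypothetical, and the 8192-byte block size is an assumption matching the default WAL block size. Note that a segment the replica was still receiving may legitimately differ from the archived copy past the point where replication stopped.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_differing_block(path_a, path_b, block_size=8192):
    """Return the byte offset of the first block that differs, or None.

    block_size defaults to 8192 (assumed default WAL block size); a length
    mismatch is reported at the offset of the first block where the
    shorter file ends.
    """
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(block_size)
            block_b = file_b.read(block_size)
            if block_a != block_b:
                return offset
            if not block_a:  # both files ended at the same point
                return None
            offset += block_size
```

Calling first_differing_block() with the replica's segment and the archived segment (both paths supplied by the operator) gives an offset that can be compared against the location reported in the "incorrect resource manager data checksum" message.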