On Sun, 2025-09-14 at 10:30 +0200, Pawel Kudzia wrote:
> I've hit a silent data corruption for TOAST data - leading to some
> infinite loop when accessing bytea column for very particular row. I
> did not suffer data loss - data from streaming replica was fine, I've
> used it to rebuild the main server.
>
> I'm wondering if there's any proactive way of detecting that type of
> an issue rather than discovering pile-up of SELECT queries leading to
> CPU starvation or finding hanged backup jobs.
>
> Thanks in advance for your suggestions!
>
> I was originally running PostgreSQL 17.2 installed from project's deb
> packages, under Debian 12 on amd64. Environment is - without any
> recent crashes, with ECC memory and server-grade hardware. I run
> pg_dumpall every 24h, on a certain day it hanged. On re-run it hanged
> again, each time leaving single PosgreSQL process using 100% of single
> CPU core.
>
> [...]
>
> PostgreSQL's log does not have any error messages indicating data corruption.
>
> This server does not have checksums enabled, but - as I understand -
> such checksums cannot be checked online anyway. So - how can I detect
> similar corruption?
A proactive way of detecting TOAST corruption... how about
pg_dump -f /dev/null yourdatabase
If there is TOAST corruption, that should give you an error.
If your standby does not have the same problem, that increases the
likelihood that the cause is a hardware problem. Of course, it could
still be a software bug.
Yours,
Laurenz Albe