On Sun, Apr 9, 2023 at 9:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384
> The fact that the error is happening in a parallel worker seems
> interesting ...
That's because it's running with debug_parallel_query=regress. I've
been trying to repro that but no luck... A different kind of failure
also showed up, where it counted the wrong number of tuples:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2023-04-08%2015%3A52%3A03
A paranoid explanation would be that this system is failing to provide
basic I/O coherency: we're writing pages out and not reading them back
in. Or of course there is a dumb bug... but why only here? It could of
course be timing-sensitive, and it's interesting that crake suffers
from the "no unpinned buffers available" thing (which should now be
gone) with higher frequency; I'm keen to see if the dodgy-read problem
continues with a similar frequency now.