On 2023-04-08 Sa 17:23, Andres Freund wrote:
Hi,
On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
Now crake is doing this:
2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1
2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384
2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT: select count(*) from t1
2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 004_io_direct.pl ERROR: invalid page in block 56 of relation base/5/16384
2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 004_io_direct.pl STATEMENT: select count(*) from t1
2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG: background worker "parallel worker" (PID 3257646) exited with exit code 1
The fact that the error is happening in a parallel worker seems
interesting ...
There were a few prior instances of that error. One that I hadn't seen before
is this:
[11:35:07.190](0.001s) # Failed test 'read back from shared'
# at /home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl line 43.
[11:35:07.190](0.000s) # got: '10000'
# expected: '10098'
For one, it points to the arguments to is() being switched around, but that's a
sideshow.
It's also odd that it's just crake having the issue. It's just a Linux host,
afaics. Andrew, is there any chance you can run that test in isolation and see
whether it reproduces? If so, does the problem vanish if you comment out the
io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or
whether it's an independent issue exposed by the new test.
I wonder if we should make the test use data checksums - if we continue to see
the wrong query results, the corruption is more likely to be in memory.
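For anyone wanting to look at the relation file directly, here is a minimal Python sketch of a header sanity check along the lines of the one that can raise "invalid page in block N". It is not PostgreSQL's code. The field layout, the 8192-byte block size, little-endian byte order, 8-byte alignment and the names page_header_is_sane and find_suspect_blocks are all assumptions for illustration, and it does not verify data checksums.

```python
import struct

BLCKSZ = 8192                # assumed default block size
PD_VALID_FLAG_BITS = 0x0007  # assumed set of defined pd_flags bits
MAXIMUM_ALIGNOF = 8          # assumed platform alignment

# Assumed header prefix: pd_lsn (8 bytes, ignored here), pd_checksum,
# pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version.
HEADER = struct.Struct("<QHHHHHH")  # little-endian assumed


def page_header_is_sane(page: bytes) -> bool:
    """Return True if the page header looks plausible or the page is all zeros."""
    if len(page) != BLCKSZ:
        return False  # short read or wrong block size
    _lsn, _checksum, flags, lower, upper, special, _version = HEADER.unpack_from(page)
    header_ok = (
        (flags & ~PD_VALID_FLAG_BITS) == 0
        and lower <= upper <= special <= BLCKSZ
        and special % MAXIMUM_ALIGNOF == 0
    )
    if header_ok:
        return True
    # An all-zero page is treated as a valid, never-initialized page.
    return not any(page)


def find_suspect_blocks(path: str) -> list[int]:
    """Scan a relation segment file and list block numbers that fail the check."""
    suspect = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if not page:
                break
            if not page_header_is_sane(page):
                suspect.append(blkno)
            blkno += 1
    return suspect
```

Running find_suspect_blocks on a copy of base/5/16384 taken after the failure would show whether block 56 also looks bad when read through ordinary buffered I/O. That would only be a hint about whether the damage is on disk, not proof.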
I can run the test in isolation, and it gets an error reliably.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com