Indeed, I can't reproduce this with (our) checksums on. I also can't
reproduce it with O_DIRECT off. I also can't reproduce it if I use
"mkdir pgdata && chattr +C pgdata && initdb -D pgdata" to have a
pgdata directory with copy-on-write and (their) checksums disabled.
But it reproduces quite easily with COW on (default behaviour) with
io_direct=data, debug_parallel_query=on, create table as ...;
update ...; select count(*) ...; from that test.

Unfortunately my mental model of btrfs is extremely limited, basically
just "something a bit like ZFS". FWIW I've been casually following
along with OpenZFS's ongoing O_DIRECT project, and I know that the
plan there is to make a temporary stable copy if checksums and other
features are on (a bit like PostgreSQL does for the same reason, as
you reminded us). Time will tell how that works out but it *seems*
like all available modes would therefore work correctly for us, with
different tradeoffs (i.e. if you want the fastest zero-copy I/O, don't
use checksums, compression, etc.).
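
To make the stable-copy idea concrete, here is a toy Python sketch. It is
not btrfs, OpenZFS or PostgreSQL code: the function names are made up for
illustration, and zlib.crc32 merely stands in for whatever checksum the
filesystem really uses. It simulates a hint-bit update landing after the
checksum has been computed but before the data is transferred.

```python
import zlib

PAGE_SIZE = 8192


def write_zero_copy(live_page, mutate):
    """Checksum the caller's buffer, then transfer that same buffer.

    `mutate` simulates a concurrent change (e.g. a hint bit being set)
    that lands after the checksum is computed but before the transfer.
    """
    csum = zlib.crc32(live_page)
    mutate(live_page)
    return bytes(live_page), csum


def write_stable_copy(live_page, mutate):
    """Snapshot the page first, then checksum and transfer the snapshot."""
    snapshot = bytes(live_page)
    csum = zlib.crc32(snapshot)
    mutate(live_page)  # the live buffer changes, the snapshot does not
    return snapshot, csum


def set_hint_bit(page):
    # Hypothetical stand-in for a hint-bit update somewhere in the page.
    page[100] |= 0x01


def demo():
    data, csum = write_zero_copy(bytearray(PAGE_SIZE), set_hint_bit)
    zero_copy_ok = zlib.crc32(data) == csum
    data, csum = write_stable_copy(bytearray(PAGE_SIZE), set_hint_bit)
    stable_copy_ok = zlib.crc32(data) == csum
    return zero_copy_ok, stable_copy_ok
```

In this toy model the zero-copy path stores a checksum that no longer
matches the data, while the stable-copy path pays for one extra page copy
and stays consistent.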

Here, btrfs seems to be taking a different path that I can't quite
make out... I see no warning/error about a checksum failure like [1],
and we apparently managed to read something other than a mix of the
old and new page contents (which, based on your hypothesis, should
just leave it indeterminate whether the hint bit changes were captured
or not, and the rest of the page should be stable, right?). It's like
the page time-travelled or got scrambled in some other way, but it
didn't tell us? I'll try to dig further...
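
A rough sketch of how one could test the "mix of old and new" hypothesis
against a captured page, in Python. The function name and return labels
are hypothetical, and the check is deliberately permissive: it accepts a
mix at byte granularity, which is finer than real I/O tearing is likely
to be.

```python
def classify_page(old, new, observed):
    """Classify a page read back from disk against two known versions."""
    if not (len(old) == len(new) == len(observed)):
        raise ValueError("pages must have the same length")
    if observed == old:
        return "old"
    if observed == new:
        return "new"
    # Every byte must match the old or the new version at that offset.
    if all(o in (a, b) for a, b, o in zip(old, new, observed)):
        return "mix"
    return "other"
```

A result of "other" would correspond to the surprising case described
above, where the page matches neither version nor any blend of the two.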

[1] https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Gotchas.html#Direct_IO_and_CRCs