Re: Non-reproducible AIO failure - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Non-reproducible AIO failure |
Date | |
Msg-id | 5637b54e-fc7e-4b58-a803-b39d56d71750@gmail.com |
In response to | Re: Non-reproducible AIO failure (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hello hackers,
27.05.2025 16:35, Andres Freund wrote:
> On 2025-05-25 20:05:49 -0400, Tom Lane wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> Could you guys please share your exact repro steps?
>> I've just been running 027_stream_regress.pl over and over. It's not a
>> recommendable answer though because the failure probability is tiny,
>> under 1%. It sounded like Alexander had a better way.
> Just FYI, I've been trying to reproduce this as well, without a single
> failure so far. Despite running all tests for a few hundred times (~2
> days) and 027_stream_regress.pl many hundreds of times (~1 day).
> This is on a m4 mac mini. I'm wondering if there's some hardware specific
> memory ordering issue or disk speed based timing issue that I'm just not
> hitting.
I'm sorry, but I need several more days to present a working reproducer.
I was lucky enough to catch the assert on my first attempt, without much
effort, but then something changed on that MacBook (it's not mine, I
connect to it remotely when it's available) and I cannot reproduce it
anymore.
Just today, I discovered that 027_stream_regress is running very slowly
there just because of shared_preload_libraries:
# after the 027_stream_regress test run
echo "shared_preload_libraries = 'pg_stat_statements'" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config NO_TEMP_INSTALL=1 /usr/bin/time make -s check
1061,29 real 56,09 user 27,69 sys
vs
NO_TEMP_INSTALL=1 /usr/bin/time make -s check
36,42 real 27,11 user 13,98 sys
This is probably an effect of the antivirus (I see wdavdaemon_unprivileged
eating CPU time); I had uninstalled it before, but now it's installed again
(maybe by some policy). So I definitely need more time to figure out the
exact recipe for triggering the assert.
As for the configure options, when I tried to reproduce the issue on other
(non-macOS) machines, I used the options from sifaka:
-DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
but then I added -DREAD_STREAM_DISABLE_FAST_PATH to stress read_stream,
and then I just copied that command and ran it on the MacBook...
So I think the complete compilation command was (I can see it in the
history):
CFLAGS="-DREAD_STREAM_DISABLE_FAST_PATH -DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS" ./configure --enable-injection-points --enable-cassert --enable-debug --enable-tap-tests --prefix=/tmp/pg -q && make -s -j8 && make -s install && make -s check
... then running 5 027_stream_regress tests in parallel ...
I had also applied a patch to repeat the "test: brin" line, but I'm not
sure it matters.
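For reference, the "5 tests in parallel" step could be scripted roughly as
follows. This is only a sketch, not the exact command used here: the helper
name run_parallel, the COPY variable, and the per-copy directories
src/test/recovery_1 ... src/test/recovery_5 (separate copies of
src/test/recovery, so that concurrent runs don't share one tmp_check
directory) are all assumptions made for illustration.

```shell
# Hypothetical helper (not from the original message): start N copies of a
# command concurrently, wait for all of them, and print how many failed.
# Each copy sees its own index (1..N) in the COPY environment variable.
run_parallel() {
    n=$1
    shift
    pids=""
    i=1
    while [ "$i" -le "$n" ]; do
        COPY=$i "$@" &          # launch one copy in the background
        pids="$pids $!"         # remember its PID so it can be waited for
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that particular background job
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}

# Assumed usage, given five pre-made copies of src/test/recovery
# (recovery_1 .. recovery_5 are hypothetical directory names):
#
#   NO_TEMP_INSTALL=1 run_parallel 5 sh -c \
#       'make -s check -C "src/test/recovery_$COPY" PROVE_TESTS="t/027_stream_regress.pl"'
```

A non-zero count printed by the helper means at least one of the concurrent
runs failed; the per-run logs would still have to be inspected to see whether
it was the assert in question.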
Sorry for the lack of useful information again.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)