Re: GNU/Hurd portability patches - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: GNU/Hurd portability patches |
Date | |
Msg-id | b87a0112-6235-4d87-886b-d1c79c0e0543@gmail.com |
In response to | Re: GNU/Hurd portability patches (Michael Banck <mbanck@gmx.net>) |
Responses | Re: GNU/Hurd portability patches |
List | pgsql-hackers |
Hi Michael,

12.10.2025 11:31, Michael Banck wrote:
>
> Any way to easily reproduce this? It happened only once on fruitcrow so
> far.

I'd say it happens pretty often when `make check` doesn't hang (so it takes an hour or two for me to reproduce).

Though now that you've mentioned MAX_CONNECTIONS => '3', I also tried:
EXTRA_REGRESS_OPTS="--max-connections=3" make -s check

and it passed 6 iterations for me. Iteration 7 failed with:
not ok 213 + partition_aggregate 1027 ms

--- /home/demo/postgresql/src/test/regress/expected/partition_aggregate.out 2025-10-11 10:04:36.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/partition_aggregate.out 2025-10-12 13:02:05.000000000 +0100
@@ -1476,14 +1476,14 @@
 (15 rows)

 SELECT x, sum(y), avg(y), sum(x+y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
- x  | sum  |        avg         |  sum  | count
-----+------+--------------------+-------+-------
-  0 | 5000 | 5.0000000000000000 |  5000 |  1000
-  1 | 6000 | 6.0000000000000000 |  7000 |  1000
- 10 | 5000 | 5.0000000000000000 | 15000 |  1000
- 11 | 6000 | 6.0000000000000000 | 17000 |  1000
- 20 | 5000 | 5.0000000000000000 | 25000 |  1000
- 21 | 6000 | 6.0000000000000000 | 27000 |  1000
+ x  | sum  |            avg             |  sum  | count
+----+------+----------------------------+-------+-------
+  0 | 5000 |         5.0000000000000000 |  5000 |  1000
+  1 | 6000 |         6.0000000000000000 |  7000 |  1000
+ 10 | 5000 | 0.000000052757140846001326 | 15000 |  1000
+ 11 | 6000 |         6.0000000000000000 | 17000 |  1000
+ 20 | 5000 |         5.0000000000000000 | 25000 |  1000
+ 21 | 6000 |         6.0000000000000000 | 27000 |  1000
 (6 rows)

Then another 6 iterations passed, and the seventh one hung. Then 10 iterations passed.
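
The iteration counts above come from rerunning the same command until something breaks. A minimal sketch of automating that, assuming a hypothetical shell helper named repeat_until_fail (it is not part of PostgreSQL or pg_regress):

```shell
# Hypothetical helper (not part of PostgreSQL): run a command up to N times
# and stop at the first iteration that exits with a non-zero status.
# Usage: repeat_until_fail N command [args...]
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "iteration $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max iterations passed"
    return 0
}
```

It could be invoked as `repeat_until_fail 20 env EXTRA_REGRESS_OPTS="--max-connections=3" make -s check`. A hung run would still block the loop; wrapping the command in coreutils `timeout` (for example `timeout 2h make -s check`) is one possible way to turn a hang into a failure, though I have not tried that here.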

With
EXTRA_REGRESS_OPTS="--max-connections=10" make -s check
I got:
2025-10-12 13:52:58.559 BST client backend[15475] pg_regress/constraints STATEMENT: ALTER TABLE notnull_tbl2 ALTER a DROP NOT NULL;
!!!wrapper_handler[15479]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 28481392, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 15476
postgres(ExceptionalCondition+0x5a) [0x1006af78a]
postgres(+0x70f59a) [0x10070f59a]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]

on iteration 5.

So we can conclude that the issue with signals reproduces more readily with higher concurrency.

28481392 (0x1b29770) is pretty close to 28476608 (0x1b284c0), which I showed before, so the numbers are apparently not random.

> I had to reboot fruitcrow last night because it had crashed, but that
> was the first time in literally weeks. I tend to reboot it once a week,
> but otherwise it ran pretty stable.

Today I also tried to test my machine with stress-ng:
stress-ng -v --class os --sequential 20 --timeout 120s

It hung/crashed at tests access, brk, close, enosys and never reached the end... Some tests might pass after restart, some fail consistently...
For example:
Fatal glibc error: ../sysdeps/mach/hurd/mig-reply.c:73 (__mig_dealloc_reply_port): assertion failed: port == arg
stress-ng: info: [9395] stressor terminated with unexpected signal 6 'SIGABRT'
backtrace:
stress-ng-enosys [run](+0xace81) [0x1000ace81]
stress-ng-enosys [run](+0x927b6c) [0x100927b6c]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x1029c8fee]
/lib/x86_64-gnu/libc.so.0.3(+0x21aec) [0x1029b0aec]

> It took me a while to get there though before I applied for it to be a
> buildfarm animal, here is what I did:
>
> 1) (buildfarm client specific): removed "HEAD => ['debug_parallel_query =
> regress']," and set "MAX_CONNECTIONS => '3'," in build-farm.conf, to
> reduce concurrency.

Thank you for the info! I didn't specify debug_parallel_query for `make check`, but num_connections really makes the difference.

> 2. Gave it 4G of memory to the VM via KVM. Also set -M q35, but I guess
> you are already doing that as it does not boot properly otherwise IME.

Mine has 4GB too.

> 3. Removed swap (this is already the case for the x86-64 2025 Debian
> image, but it was not the case for the earlier 2023 i386 image when I
> started this project). Paging to disk has been problematic and prone to
> issues (critical parts getting paged out accidentally), but this has been
> fixed over the summer so in principle running a current gnumach/hurd
> package combination from unstable should be fine again.

Yes, I have no swap enabled.

> 4. Removed tmpfs translators (so that the default-pager is not used
> anywhere, in conjunction with not setting swap, see above), by setting
> RAMLOCK=no and RAMTMP=no in /etc/default/tmpfs, as well as commenting
> out 'mount_run mount_noupdate'/'mount_tmp mount_noupdate' in
> /etc/init.d/mountall.sh and 'mount_run "$MNTMODE"' in
> /etc/init.d/mountkernfs.sh (maybe there is a more minimal change, but
> that is what I have right now).

I have RAMLOCK=no and RAMTMP=no in my /etc/default/tmpfs and can't see any tmpfs mounts.

Thank you for your help!
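
For reference, the two out-of-range postgres_signal_arg values quoted earlier in this message (28481392 and 28476608) can be compared with a quick shell check. This is only a sketch; hex_distance is a hypothetical helper name, and it says nothing about where the values come from:

```shell
# Hypothetical helper: print two decimal values in hex,
# followed by their absolute difference in decimal and hex.
hex_distance() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        d=$((a - b))
    else
        d=$((b - a))
    fi
    printf '0x%x 0x%x diff=%d (0x%x)\n' "$a" "$b" "$d" "$d"
}
```

Running `hex_distance 28481392 28476608` prints `0x1b29770 0x1b284c0 diff=4784 (0x12b0)`, so the two values differ by 4784.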

Best regards,
Alexander