Re: GNU/Hurd portability patches - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: GNU/Hurd portability patches
Date
Msg-id b87a0112-6235-4d87-886b-d1c79c0e0543@gmail.com
In response to Re: GNU/Hurd portability patches  (Michael Banck <mbanck@gmx.net>)
Responses Re: GNU/Hurd portability patches
List pgsql-hackers
Hi Michael,

12.10.2025 11:31, Michael Banck wrote:
 >
 > Any way to easily reproduce this? It happened only once on fruitcrow so
 > far.

I'd say it happens pretty often when `make check` doesn't hang (so it
takes an hour or two for me to reproduce).

Though now that you've mentioned MAX_CONNECTIONS => '3', I also tried:
EXTRA_REGRESS_OPTS="--max-connections=3" make -s check
and it passed 6 iterations for me. Iteration 7 failed with:
not ok 213   + partition_aggregate                      1027 ms

--- /home/demo/postgresql/src/test/regress/expected/partition_aggregate.out 2025-10-11 10:04:36.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/partition_aggregate.out 2025-10-12 13:02:05.000000000 +0100
@@ -1476,14 +1476,14 @@
  (15 rows)

  SELECT x, sum(y), avg(y), sum(x+y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
- x  | sum  |        avg         |  sum  | count
-----+------+--------------------+-------+-------
-  0 | 5000 | 5.0000000000000000 |  5000 |  1000
-  1 | 6000 | 6.0000000000000000 |  7000 |  1000
- 10 | 5000 | 5.0000000000000000 | 15000 |  1000
- 11 | 6000 | 6.0000000000000000 | 17000 |  1000
- 20 | 5000 | 5.0000000000000000 | 25000 |  1000
- 21 | 6000 | 6.0000000000000000 | 27000 |  1000
+ x  | sum  |            avg             |  sum  | count
+----+------+----------------------------+-------+-------
+  0 | 5000 |         5.0000000000000000 |  5000 |  1000
+  1 | 6000 |         6.0000000000000000 |  7000 |  1000
+ 10 | 5000 | 0.000000052757140846001326 | 15000 |  1000
+ 11 | 6000 |         6.0000000000000000 | 17000 |  1000
+ 20 | 5000 |         5.0000000000000000 | 25000 |  1000
+ 21 | 6000 |         6.0000000000000000 | 27000 |  1000
  (6 rows)

Then another 6 iterations passed, and the seventh one hung. Then 10 iterations
passed.

With EXTRA_REGRESS_OPTS="--max-connections=10" make -s check, I got:
2025-10-12 13:52:58.559 BST client backend[15475] pg_regress/constraints STATEMENT:  ALTER TABLE notnull_tbl2 ALTER a DROP NOT NULL;
!!!wrapper_handler[15479]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 28481392, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 15476
postgres(ExceptionalCondition+0x5a) [0x1006af78a]
postgres(+0x70f59a) [0x10070f59a]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]

on iteration 5.

So we can conclude that the signal issue is reproduced more readily with
higher concurrency.

28481392 (0x1b29770) is pretty close to 28476608 (0x1b284c0), which I
showed before (the difference is only 0x12b0), so these numbers are
apparently not random.
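
For reference, the "!!!wrapper_handler[...]" lines above come from ad-hoc
tracing added next to the Assert in wrapper_handler() in src/port/pqsignal.c.
A rough sketch of that kind of instrumentation (the exact fprintf format and
placement are illustrative only; fprintf()/getpid() need <stdio.h> and
<unistd.h>):

static void
wrapper_handler(SIGNAL_ARGS)
{
    /*
     * Dump the raw signal number the handler was invoked with.  fprintf()
     * is not async-signal-safe, but that's acceptable for throwaway
     * debugging.
     */
    fprintf(stderr, "!!!wrapper_handler[%d]| postgres_signal_arg: %d, PG_NSIG: %d\n",
            (int) getpid(), postgres_signal_arg, PG_NSIG);

    /*
     * A garbage value such as 28481392 would index far outside
     * pqsignal_handlers[], so trap it before dereferencing.
     */
    Assert(postgres_signal_arg < PG_NSIG);

    (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
}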

 > I had to reboot fruitcrow last night because it had crashed, but that
 > was the first time in literally weeks. I tend to reboot it once a week,
 > but otherwise it ran pretty stable.

Today I also tried to test my machine with stress-ng:
stress-ng -v --class os --sequential 20 --timeout 120s

It hung or crashed at the access, brk, close, and enosys tests and never
reached the end... Some tests may pass after a restart, some fail consistently...
For example:
Fatal glibc error: ../sysdeps/mach/hurd/mig-reply.c:73 (__mig_dealloc_reply_port): assertion failed: port == arg
stress-ng: info:  [9395] stressor terminated with unexpected signal 6 'SIGABRT'
backtrace:
   stress-ng-enosys [run](+0xace81) [0x1000ace81]
   stress-ng-enosys [run](+0x927b6c) [0x100927b6c]
   /lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x1029c8fee]
   /lib/x86_64-gnu/libc.so.0.3(+0x21aec) [0x1029b0aec]

 > It took me a while to get there though before I applied for it to be a
 > buildfarm animal, here is what I did:
 >
 > 1) (buildfarm client specific): removed "HEAD => ['debug_parallel_query =
 > regress']," and set "MAX_CONNECTIONS => '3'," in build-farm.conf, to
 > reduce concurrency.

Thank you for the info! I didn't specify debug_parallel_query for
`make check`, but the number of connections really makes a difference.

 > 2. Gave the VM 4G of memory via KVM. Also set -M q35, but I guess
 > you are already doing that as it does not boot properly otherwise IME.

Mine has 4GB too.

 > 3. Removed swap (this is already the case for the x86-64 2025 Debian
 > image, but it was not the case for the earlier 2023 i386 image when I
 > started this project). Paging to disk has been problematic and prone to
 > issues (critical parts getting paged out accidentally), but this has been
 > fixed over the summer so in principle running a current gnumach/hurd
 > package combination from unstable should be fine again.

Yes, I have no swap enabled.

 > 4. Removed tmpfs translators (so that the default-pager is not used
 > anywhere, in conjunction with not setting swap, see above), by setting
 > RAMLOCK=no and RAMTMP=no in /etc/default/tmpfs, as well as commenting
 > out 'mount_run mount_noupdate'/'mount_tmp mount_noupdate' in
 > /etc/init.d/mountall.sh and 'mount_run "$MNTMODE"' in
 > /etc/init.d/mountkernfs.sh (maybe there is a more minimal change, but
 > that is what I have right now).

I have RAMLOCK=no and RAMTMP=no in my /etc/default/tmpfs and can't see any
tmpfs mounts.

Thank you for your help!

Best regards,
Alexander


