Re: GNU/Hurd portability patches - Mailing list pgsql-hackers
| From | Alexander Lakhin |
|---|---|
| Subject | Re: GNU/Hurd portability patches |
| Date | |
| Msg-id | 2f4f4487-d4e1-461c-b34b-a22ed686eea2@gmail.com |
| In response to | Re: GNU/Hurd portability patches (Thomas Munro <thomas.munro@gmail.com>) |
| List | pgsql-hackers |
10.11.2025 22:03, Thomas Munro wrote:
> On Tue, Nov 11, 2025 at 8:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> With this modification:
>> @@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
>>
>>  #if !(defined(WIN32) && defined(FRONTEND))
>>      act.sa_handler = func;
>> -    sigemptyset(&act.sa_mask);
>> +    sigfillset(&act.sa_mask);
>>      act.sa_flags = SA_RESTART;
>>
>> I got 100 iterations passed (12 of them hanged) without that Assert
>> triggered.
> Interesting. Perhaps a minimal program that installs a handler
> assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
> another program loops calling kill(the_other_one, rand() % 2 == 0 ?
> SIGUSR1 : SIGUSR2), to support a bug report?

Yeah, thank you for the idea! I will try it in the coming days.

>> [lots of weird errors in a wide range of code]
> I can't make much sense of these failures, but are you saying that
> these only happen without that sigfillset(&act.sa_mask) change, that
> is, when the signal implementation is misbehaving? If so, I wonder if
> the same bug in their signal handling might just be corrupting the
> user stack sometimes even when the signal number assertion doesn't
> trip.

No, I think those failures are unrelated, I hit them just because I
executed `make check` many times and some of them definitely occurred with
the unmodified code. Now that I have a script that handles OS hangs and
restores VM's disk automatically, I can run tests for hours and look for
one failure or another if it can be helpful.

>> On the assumption that this isn't a general bug, but just a timing issue
>> (planning 'SELECT 1' isn't complicated), I see two possibilities:
>>
>> 1. Ignore the plan times, and replace SELECT 1 with SELECT
>> pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
>> coverage so likely not be great?
>>
>> 2. Make the query a bit more complicated so that the plan time is likely
>> to be non-negligable. I actually had to go quite a way to make it pretty
>> failsafe, the attached made it fail less than 5 times out of 50000
>> iterations, not sure whether that is acceptable or still considered
>> flaky?
> Wait, we have tests that fail if the clock doesn't advance? Isn't
> that just bogus?

Yeah, we have, this was discussed (and one test was hardened) upthread.

>> What concerns me is that there is also subscription.sql and maybe could
>> be other test(s) that expect at least 1000ns (far from infinite) timer
>> resolution. Probably it would make sense to define which timer resolution
>> we consider acceptable for tests and then to check if Hurd can provide it.
> Ah, I see, so that one is checking if the last reset time advanced to
> check that something happened. That also has the theoretical problem
> that CLOCK_REALTIME can go backwards sometimes, due to ntpd
> adjustments or whatever. In the absence of a "reset_counter" column,
> perhaps we could consider a kludge like x->reset_time =
> Max(x->reset_time + 1ns, now), just to make sure the value always goes
> up on reset, without having any noticeable effect on normal systems...

AFAICS, those test cases use pg_clock_gettime_ns() with CLOCK_MONOTONIC (if
defined, and it's really defined on Hurd), so it should not matter in this
concrete case.

Best regards,
Alexander