Re: Anti-critical-section assertion failure in mcxt.c reached by walsender - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
Date
Msg-id 161637.1620440294@sss.pgh.pa.us
In response to Re: Anti-critical-section assertion failure in mcxt.c reached by walsender  (Andres Freund <andres@anarazel.de>)
Responses Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-07 17:14:18 -0700, Noah Misch wrote:
>> Having a flaky buildfarm member is bad news.  I'll LD_PRELOAD the attached to
>> prevent fsync from reaching the kernel.  Hopefully, that will make the
>> hardware-or-kernel trouble unreachable.  (Changing 008_fsm_truncation.pl
>> wouldn't avoid this, because fsync=off doesn't affect syncs outside the
>> backend.)

> Not sure how reliable that is - there's other paths that could return an
> error, I think. If the root cause is the disk responding weirdly to
> write cache flushes, you could tell the kernel that the disk has no
> write cache (e.g. echo write through > /sys/block/sda/queue/write_cache).

I seriously doubt Noah has root on that machine.

More to the point, the admin told me it's a VM (or LDOM, whatever that is)
under a Solaris host, so there's no direct hardware access going on
anyway.  He didn't say in so many words, but I suspect the reason he's
suspecting kernel bugs is that there's nothing going wrong so far as the
host OS is concerned.

            regards, tom lane


