Re: Anti-critical-section assertion failure in mcxt.c reached by walsender - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
Msg-id 20210507194947.etrgj7mpcv73mxef@alap3.anarazel.de
In response to Re: Anti-critical-section assertion failure in mcxt.c reached by walsender  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2021-05-07 10:29:58 -0400, Tom Lane wrote:
> I wrote:
> > 1. No wonder we could not reproduce it anywhere else.  I've warned
> > the cfarm admins that their machine may be having hardware issues.
> 
> I heard back from the machine's admin.  The time of the crash I observed
> matches exactly to these events in the kernel log:
> 
> May 07 03:31:39 gcc202 kernel: dm-0: writeback error on inode 2148294407, offset 0, sector 159239256
> May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
> May 07 03:31:39 gcc202 kernel: blk_update_request: I/O error, dev vdiskc, sector 157618896 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio class 0
> 
> So it's not a mirage.  The admin seems to think it might be a kernel
> bug though.

Isn't this a good reason to have at least some tests run with fsync=on?

It makes a ton of sense for buildfarm animals to disable fsync to
achieve acceptable performance. Having something in there that
nevertheless does some light exercise of the fsync code doesn't seem
bad?
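
To illustrate why fsync matters here: a write error like the one in the
kernel log above is typically only reported back to the application when
it calls fsync. The following is a minimal sketch of that pattern, not
PostgreSQL or buildfarm code; the function name write_and_fsync and its
arguments are made up for this example.

```python
import os
import tempfile


def write_and_fsync(directory: str, payload: bytes) -> str:
    """Write payload to a new file in directory and force it to disk.

    Hypothetical helper for illustration only. Returns the path of the
    file that was written.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()  # push Python's userspace buffer to the kernel
        # Ask the kernel to flush the data to stable storage. If the
        # kernel hit a writeback error, this raises OSError (e.g. EIO).
        # Without this call the failure can go unnoticed by the writer.
        os.fsync(f.fileno())
    return path
```

With fsync disabled, the equivalent of the os.fsync() call is skipped, so
a test run never exercises the code that would react to such an error.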

Greetings,

Andres Freund


