Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible - Mailing list pgsql-admin

From Peter
Subject Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible
Date
Msg-id 20190308012012.GA49481@gate.oper.dinoex.org
In response to Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible  (Peter <pmc@citylink.dinoex.sub.org>)
Responses Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-admin
Hi Tom, Andrew,

many thanks for the replies! Alright, let's fill in some concrete
data:

> I'm assuming from the CC that this is on FreeBSD, but on what
> architecture?

While out on my evening errands I realized that I should have mentioned
this. FreeBSD is correct; it is cross-built on an amd64 host for i386,
and runs on i386.

Version: 
  FreeBSD 11.2-RELEASE-p9 #0 r343946M#C51:82
Build-Options:
  OPTIONS_FILE_UNSET+=DEBUG
  OPTIONS_FILE_UNSET+=DOCS
  OPTIONS_FILE_UNSET+=DTRACE
  OPTIONS_FILE_SET+=GSSAPI
  OPTIONS_FILE_SET+=INTDATE
  OPTIONS_FILE_UNSET+=LDAP
  OPTIONS_FILE_SET+=NLS
  OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
  OPTIONS_FILE_UNSET+=PAM
  OPTIONS_FILE_SET+=SSL
  OPTIONS_FILE_SET+=TZDATA
  OPTIONS_FILE_SET+=XML
Extra Compiler-Options:
  -march=pentium3
Init-Options:
  --data-checksums --encoding=utf-8 --lc-collate=de_DE.UTF-8
  --lc-ctype=de_DE.UTF-8 --lc-messages=en_US.UTF-8
  --lc-monetary=en_US.UTF-8 --lc-numeric=en_US.UTF-8
  --lc-time=en_US.UTF-8
Run-Options:
  -w -m fast -o --config_file=/usr/local/etc/postgresql/postgresql.conf

Furthermore, the FreeBSD port changed with release 10.6: it now forces
the use of gcc on i386 (gcc-8 in this case). Earlier versions were built
with the system compiler, Clang. The commit log says this about the matter:

! r484807 | girgen | 2018-11-12 16:54:19 +0100 (Mon, 12 Nov 2018) | 5 lines
!
! Fix build problems on i386
!
! Use GCC seems to be proper way to do it. SSE2 would not be available
! for all CPU:s.

> Did it drop a core file (look in the data dir for postgres.core) and if
> so can you get a backtrace?

Looking... yes, there is a core. Let's grab a first-fault core as well,
since the existing one is obviously from the failed recovery:

! (gdb) core postgres.core.1st
! Core was generated by `postgres: bgworker: parallel worker for PID 68755 '.
! Program terminated with signal 10, Bus error.
! Reading symbols from <etc etc>
! #0  0x0838bdf2 in pg_checksum_page ()
! (gdb) bt
! #0  0x0838bdf2 in pg_checksum_page ()
! #1  0x0838a2b8 in PageIsVerified ()
! #2  0x5a824500 in ?? ()
! #3  0x00000000 in ?? ()

The second one looks like this:

! (gdb) core postgres.core 
! Core was generated by `postgres: startup process recovering 000000010000002C000000C6'.
! Program terminated with signal 10, Bus error.
! Reading symbols from <lots of files>
! #0  0x0838bdf2 in pg_checksum_page ()
! (gdb) bt
! #0  0x0838bdf2 in pg_checksum_page ()
! #1  0x0838a2b8 in PageIsVerified ()
! #2  0x59e14500 in ?? ()
! #3  0x00000000 in ?? ()

Anything more I can do here? (Advice on how to build with debugging
support is appreciated.)

> You can check whether your CPU supports SSE2 by looking at the Features=
> line in /var/run/dmesg.boot. It seems unlikely that it does not, because
> SSE2 was introduced in 2000 with the Pentium 4.

No need to check; I am absolutely certain that it does NOT.
https://www.asus.com/supportonly/CUV4X-DLS/HelpDesk_CPU/
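
For anyone who does want to run the check described in the quoted
advice, here is a minimal Python sketch. It assumes the usual FreeBSD
boot-log layout, where the CPU flags appear on a line of the form
Features=0x...<FLAG,FLAG,...>. The function name cpu_has_feature and
the sample text are made up for illustration; the sample is NOT taken
from the machine discussed here.

```python
import re

# Matches the primary "Features=" line only, not "Features2=" or
# "AMD Features=" (assumed layout of /var/run/dmesg.boot).
_FEATURES_RE = re.compile(r"^\s*Features=0x[0-9a-fA-F]+<([^>]*)>", re.MULTILINE)


def cpu_has_feature(dmesg_text, flag):
    """Return True if `flag` is listed on the Features= line of a boot log."""
    match = _FEATURES_RE.search(dmesg_text)
    if match is None:
        raise ValueError("no Features= line found")
    # Split on commas so that "SSE" does not accidentally match "SSE2".
    return flag in match.group(1).split(",")


# Hypothetical sample with a Pentium III style flag list (illustration only).
SAMPLE = """\
CPU: Hypothetical CPU (999.99-MHz 686-class CPU)
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
"""

if __name__ == "__main__":
    print("SSE :", cpu_has_feature(SAMPLE, "SSE"))
    print("SSE2:", cpu_has_feature(SAMPLE, "SSE2"))
```

On a real system one would pass the contents of /var/run/dmesg.boot
instead of SAMPLE.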

But your explanation does not seem to answer the fundamental question:
is the database at 10.6 still supposed to be able to run without SSE2?

> It seems pretty unlikely that that'd have anything to do with a
> bus-error failure, anyway.  But this report contains far too little
> information to let anyone do anything but speculate.

Whatever information you would like to have, just ask and I will gladly
do my best to obtain it as I get around to it. (This is a reproducible
crash in a very well maintained piece of software - this is rather fun.)


Some more experiments & observations:

The crash happens at a specific query: I get parse and bind timings
logged, but no execute timing.
Furthermore, when I set

! max_parallel_workers_per_gather = 0     

then the query goes through and delivers proper results. But then, after
a few minutes, I get this one:

! postgres[71256]: [8-1] :[] LOG: 00000: checkpointer process (PID 71258) 
! was terminated by signal 10: Bus error
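
To keep track of which processes die while repeating such experiments,
a small Python sketch like the following could pull the facts out of
log entries of this shape. The regular expression is an assumption
based only on the message shown above, and the names parse_termination
and tally are hypothetical helpers, not part of PostgreSQL.

```python
import re
from collections import Counter

# Pattern inferred from the log line above (an assumption, not taken from
# the PostgreSQL sources): "<process> (PID <n>) was terminated by signal <n>: <name>"
_TERMINATED_RE = re.compile(
    r"(?P<proc>[\w ]+?) \(PID (?P<pid>\d+)\)\s+was terminated by signal "
    r"(?P<signo>\d+)(?:: (?P<signame>.+))?"
)


def parse_termination(entry):
    """Return (process, pid, signal_number, signal_name), or None if no match."""
    flat = " ".join(entry.split())  # undo line wrapping inside one entry
    m = _TERMINATED_RE.search(flat)
    if m is None:
        return None
    return (m.group("proc").strip(), int(m.group("pid")),
            int(m.group("signo")), m.group("signame"))


def tally(entries):
    """Count terminations per (process, signal_number) over many entries."""
    counts = Counter()
    for entry in entries:
        parsed = parse_termination(entry)
        if parsed is not None:
            counts[(parsed[0], parsed[2])] += 1
    return counts


# The entry quoted above, including its line wrap.
ENTRY = ("postgres[71256]: [8-1] :[] LOG: 00000: checkpointer process (PID 71258) \n"
         "was terminated by signal 10: Bus error")

if __name__ == "__main__":
    print(parse_termination(ENTRY))
    print(tally([ENTRY, "LOG: 00000: database system is ready"]))
```

Each entry is expected to be passed in separately; joining a whole log
file into one string would confuse the trailing signal-name match.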


Different approach, same result:

! dynamic_shared_memory_type = posix   -> crash immediate
! dynamic_shared_memory_type = sysv    -> crash immediate
! dynamic_shared_memory_type = mmap    -> crash immediate
! dynamic_shared_memory_type = none    -> crash later in checkpointer


regards,
PMc

