Thread: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
From
Dmitry Koterov
Date:
Hello.
I have a database dump file (unfortunately containing proprietary information) whose restoration produces the following errors in the server log. The failure is reliably reproducible, even after a fresh initdb: it always occurs on the same large table, at the same point in the restore:
LOG: server process (PID 18705) was terminated by signal 7: Bus error
DETAIL: Failed process was running: COPY br_agent_log (id, agent_id, stamp, trace, message) FROM stdin;
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
...
and then, after recovery:
...
redo done at 0/12DDB7A8
...
LOG: database system is ready to accept connections
ERROR: could not read block 1 in file "base/57390/11783": read only 4448 of 8192 bytes at character 39
ERROR: could not read block 1 in file "base/57390/11783": read only 4448 of 8192 bytes at character 39
Could this be memory corruption in PG? BTW, 9.1.8 does not have this problem: the same restoration completes fine there.
Can I help investigate this crash? What is the best way to go about it? Is there a tutorial article that describes the preferred format for error reports?
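A note on the "could not read block 1 ... read only 4448 of 8192 bytes" error in the log above: if the short read happened because the file simply ends there (an assumption, the thread does not confirm it), the numbers imply that the relation file ends partway through its second page. A minimal sketch of that arithmetic follows; the function name is hypothetical and the 8192-byte block size is taken from the error message itself.

```python
BLOCK_SIZE = 8192  # block size as reported in the error message


def implied_file_size(block_number: int, bytes_read: int,
                      block_size: int = BLOCK_SIZE) -> int:
    """Return the file size implied by a short read that hit end-of-file.

    Assumes the read of `block_number` started at block_number * block_size
    and returned fewer bytes only because the file ended there.
    """
    if not 0 <= bytes_read < block_size:
        raise ValueError("a short read must return fewer bytes than one block")
    return block_number * block_size + bytes_read


size = implied_file_size(block_number=1, bytes_read=4448)
print(size)               # 12640
print(size % BLOCK_SIZE)  # 4448 bytes of a partial trailing page
```

Under that assumption the file would be 12640 bytes long, which is not a multiple of the block size, consistent with a page that was only partly written when the backend was killed.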
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
From
Kevin Grittner
Date:
Dmitry Koterov <dmitry@koterov.ru> wrote:
> LOG: server process (PID 18705) was terminated by signal 7: Bus error
So far I have only heard of this sort of error when PostgreSQL is running in a virtual machine and the VM software is buggy. If you are not running in a VM, my next two suspects would be hardware/BIOS configuration issues, or an antivirus product.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
From
Merlin Moncure
Date:
On Tue, Mar 5, 2013 at 3:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Dmitry Koterov <dmitry@koterov.ru> wrote:
>
>> LOG: server process (PID 18705) was terminated by signal 7: Bus error
>
> So far I have only heard of this sort of error when PostgreSQL is
> running in a virtual machine and the VM software is buggy. If you
> are not running in a VM, my next two suspects would be
> hardware/BIOS configuration issues, or an antivirus product.
for posterity, what's the hardware platform? software bus errors are more likely on non x86 hardware.
merlin
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
From
Dmitry Koterov
Date:
x86_64. PostgreSQL 9.2 is running inside an OpenVZ container, and that is where it gets the SIGBUS.
PostgreSQL 9.1 has no such problem.
(OpenVZ is Linux kernel-level virtualization that adds namespaces for processes, networking, quotas, etc. It does not work like e.g. Xen or VMware, because all containers share the same kernel.)
On Wed, Mar 6, 2013 at 7:51 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Mar 5, 2013 at 3:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Dmitry Koterov <dmitry@koterov.ru> wrote:
>>
>>> LOG: server process (PID 18705) was terminated by signal 7: Bus error
>>
>> So far I have only heard of this sort of error when PostgreSQL is
>> running in a virtual machine and the VM software is buggy. If you
>> are not running in a VM, my next two suspects would be
>> hardware/BIOS configuration issues, or an antivirus product.
> for posterity, what's the hardware platform? software bus errors are
> more likely on non x86 hardware.
> merlin
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
From
Craig Ringer
Date:
On 03/11/2013 09:20 PM, Dmitry Koterov wrote:
> x86_64, PostgreSQL 9.2. is run within an OpenVZ container and generates SIGBUS.
Related to SHM vs mmapped files? Seems unlikely, but I guess it could affect low-enough level work like kernel TLB usage.
> PostgreSQL 9.1 has no such problem.
> (OpenVZ is a linux kernel-level virtualization which adds namespaces for processes, networking, quotas etc. It works not like e.g. Xen or VMWare, because all containers share the same kernel.)
At what point in Pg's execution does the SIGBUS occur? Is it always at the same place, or at one of a few places, in the code? It would be helpful if you could enable core file writing and get backtraces from the core files or (since it's reproducible) by attaching a debugger directly to a Pg backend. See http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
If you restore just the first half of the table or just the last half does the crash still happen? If it still happens in one part but not in another, can you do a binary search* to isolate the smallest chunk of the input file that still reliably causes the crash?
* (ie: split the file roughly in half at a record boundary and test each half. Discard the half that doesn't crash, keep the half that crashes. Repeat the process using the kept half as input until you find the smallest chunk that still crashes, or get down to a single record that causes the problem.)
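The halving procedure described above can be sketched roughly as follows. This is only an illustration, not a tested tool: the function name and the `crashes` callback are hypothetical, and the callback stands in for "restore this chunk into a scratch database and report whether the backend died".

```python
from typing import Callable, List, Sequence


def bisect_crashing_chunk(
    records: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Shrink `records` by repeated halving while the crash still reproduces.

    `crashes` is a hypothetical callback supplied by the caller; it must
    return True when restoring the given records triggers the crash.
    """
    chunk = list(records)
    if not crashes(chunk):
        raise ValueError("the full input does not reproduce the crash")

    while len(chunk) > 1:
        mid = len(chunk) // 2  # split at a record boundary
        first, second = chunk[:mid], chunk[mid:]
        if crashes(first):
            chunk = first       # keep the half that crashes
        elif crashes(second):
            chunk = second
        else:
            # Neither half crashes on its own, so the trigger needs
            # records from both halves; halving cannot shrink it further.
            break
    return chunk


if __name__ == "__main__":
    # Stand-in predicate for demonstration only: pretend one record is bad.
    rows = [f"row{i}" for i in range(10)]
    rows[6] = "BAD"
    print(bisect_crashing_chunk(rows, lambda chunk: "BAD" in chunk))  # ['BAD']
```

In a real run, `crashes` would presumably write the chunk out as COPY input for br_agent_log, restore it into a freshly created database, and check the server log for "terminated by signal 7". Each test should start from a clean state so that an earlier crash does not affect the next one.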
Does the same data cause a crash when restored in another container on the same OpenVZ host? What about when restored to another machine with the same OS and Pg version outside OpenVZ?
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services