17.06.2011, 00:28, "Kevin Grittner" <Kevin.Grittner@wicourts.gov>:
> ***** ********** <zlobnynigga@yandex.ru> wrote:
>
>> [4-1] 2011-06-16 17:40:27 UTC LOG: startup process (PID 15292)
>> was terminated by signal 7: Bus error
>> Signal 7 means hardware problems. But all 10 replicas crashed
>> within 10 minutes, say from 13:35 to 13:45.
>> One important thing - all replicas and master are running on
>> openvz
>
> Were the PostgreSQL clusters sharing any hardware?
>
>> there is no way to reject virtualization (it is a long story =))
>>
>> Please, I do not want to discuss my decision to set buffers to
>> 12Gb and postgresql optimization at all. I just want to understand
>> why I'm getting such errors.
>
> On the face of it, the most likely cause would seem to be hardware
> or the virtual environment. Without knowing more about the exact
> messages on the replicas and how they compared to each other and the
> master it's hard to know whether any of the replica failures were
> from passing corrupted data from the master to the replicas, versus
> having a common hardware/vm flaw.
>
> -Kevin
I noticed that the crash takes place when shared buffers are almost full, i.e. SELECT SUM(size) FROM adm.buffercache()
returns 11670 about one minute before the crash. Furthermore, last night I set buffers to 11Gb, and it is working: no
crash, and all buffers are used (11120).
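To put those two figures side by side, here is a rough sketch of the arithmetic. It assumes that adm.buffercache() reports sizes in MB and that "12Gb" and "11Gb" mean 12 and 11 GiB; neither is confirmed anywhere in this thread, and the helper name fill_ratio is made up for illustration.

```python
# Assumption: adm.buffercache() sizes are in MB and the settings are in GiB.
MB_PER_GB = 1024


def fill_ratio(used_mb: int, shared_buffers_gb: int) -> float:
    """Fraction of shared_buffers in use (usage in MB, setting in GB)."""
    return used_mb / (shared_buffers_gb * MB_PER_GB)


# Figures quoted in this message.
observations = [
    ("12Gb, about a minute before the crash", 11670, 12),
    ("11Gb, running without a crash", 11120, 11),
]

for label, used_mb, setting_gb in observations:
    ratio = fill_ratio(used_mb, setting_gb)
    total_mb = setting_gb * MB_PER_GB
    print(f"{label}: {used_mb} of {total_mb} MB ({ratio:.1%})")
```

Under these assumptions the 11Gb setting is proportionally fuller (about 98.7% versus about 95.0%) and still does not crash, so the absolute amount of memory in use may matter more than how full the buffers are.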
I still do not believe that this is a hardware problem. Each replica and the master run on a dedicated server; no hardware is
shared. There is only postgresql on each server, no other software (just crond, zabbix, atop).
Actually, openvz is used only for portability (to easily add new replicas or migrate one of them to a new server).
Messages on the replicas are all the same: "could not read block", then "signal 7". I copy-pasted the error log as is; that is
all I know.
The master did not crash, I think because it processes fewer SELECT queries, so its buffers do not reach the limit.