Thread: Segmentation fault on startup

Segmentation fault on startup

From

Helmut Bender

Date:

01 February 2021, 17:53:54

Hi,

I'm running a Nextcloud server in a Docker container on a RasPi 4 (only
SSD, no SD), which uses PostgreSQL 10 as its database server.
The containers are started via docker compose. The PostgreSQL service
looks like this:

services:
  db:
    image: postgres:10-alpine
    restart: always
    volumes:
      - nextcloud_db:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=xxx
      - POSTGRES_DB=nextcloud
      - POSTGRES_USER=nextcloud

Everything had worked smoothly for some months.

Today I had to restart the RasPi. Now the container fails to start with 
a segmentation violation.
I tried to start the server manually and set
    log_min_messages = info
    log_min_error_statement = info
    log_error_verbosity = verbose

But all I get is
waiting for server to start....1970-05-04 03:17:36.010 UTC [31] LOG:  00000: listening on IPv4 address "0.0.0.0", port 5432
1970-05-04 03:17:36.010 UTC [31] LOCATION:  StreamServerPort, pqcomm.c:590
1970-05-04 03:17:36.010 UTC [31] LOG:  00000: listening on IPv6 address "::", port 5432
1970-05-04 03:17:36.010 UTC [31] LOCATION:  StreamServerPort, pqcomm.c:590
1970-05-04 03:17:36.010 UTC [31] LOG:  00000: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
1970-05-04 03:17:36.010 UTC [31] LOCATION:  StreamServerPort, pqcomm.c:585
.1970-05-04 03:17:36.010 UTC [31] LOG:  00000: startup process (PID 32) was terminated by signal 11: Segmentation fault
1970-05-04 03:17:36.010 UTC [31] LOCATION:  LogChildExit, postmaster.c:3639
1970-05-04 03:17:36.010 UTC [31] LOG:  00000: aborting startup due to startup process failure
1970-05-04 03:17:36.010 UTC [31] LOCATION:  reaper, postmaster.c:2893
1970-05-04 03:17:36.010 UTC [31] LOG:  00000: database system is shut down
1970-05-04 03:17:36.010 UTC [31] LOCATION:  UnlinkLockFiles, miscinit.c:764
  stopped waiting
pg_ctl: could not start server
Examine the log output.


Now I'm stuck. I have no idea how to find out what went wrong, let
alone how to repair my database.
I have a dump of the data, but without a running server I can't do
anything with it either...

Can someone tell me what I could do?

Thank you.

-- 
Regards, Helmut

Re: Segmentation fault on startup

From

Tom Lane

Date:

01 February 2021, 20:40:27

Helmut Bender <pgsql@helmut-bender.de> writes:
> I'm running a Nextcloud server in a Docker container on a RasPi 4 (only
> SSD, no SD), which uses PostgreSQL 10 as its database server.

10.what?  We're already up to 15 patch releases for that branch.

> Today I had to restart the RasPi. Now the container fails to start with 
> a segmentation violation.

Not good --- sounds like you have data corruption.  After an OS crash
this is something that's quite possible if you haven't taken the time
to qualify the storage subsystem's honoring of fsync.

It is barely possible that it's a PG bug that we've fixed, so if you
are not on 10.15 then an update would be worth trying.  But I don't
have a lot of hope for that.

> I have a dump of the data, but without a running server I can't do
> anything with it either...

If it's a reasonably recent dump, you might end up just having to
re-initdb and restore the dump.

If the missing data is very valuable to you, there are people around
who specialize in trying to recover data from corrupted databases
(see "Professional Services" on our website).  But it's expensive
and there's no guarantee how much can be recovered.

As with all else computer-related, there's no substitute for a
good backup plan :-(

            regards, tom lane

Re: Segmentation fault on startup

From

Helmut Bender

Date:

02 February 2021, 18:22:41

(again to the list...)

On 01.02.21 at 21:40, Tom Lane wrote:
> Helmut Bender <pgsql@helmut-bender.de> writes:
>> I'm running a Nextcloud server in a Docker container on a RasPi 4 (only
>> SSD, no SD), which uses PostgreSQL 10 as its database server.
> 
> 10.what?  We're already up to 15 patch releases for that branch.

As I use the docker image, it seems to be at 10.15.
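
One way to confirm the exact minor release behind an image tag is to ask the server binary itself. The following is only a sketch, assuming Docker is available on the host. The helper names `pg_image_version` and `pg_version_number` are made up for illustration, and bypassing the image's entrypoint is a choice made here so that nothing gets initialized and no data volume is touched.

```shell
#!/bin/sh
# Hypothetical helper: print the version string of the postgres binary
# inside an image. --entrypoint skips the image's init script, so no
# database is created and no volume is mounted.
pg_image_version() {
    docker run --rm --entrypoint postgres "$1" --version
}

# Hypothetical helper: extract the version number from a line such as
# "postgres (PostgreSQL) 10.15" read on standard input.
pg_version_number() {
    sed -n 's/^postgres (PostgreSQL) \([0-9][0-9.]*\).*/\1/p'
}

# Usage (requires Docker):
#   pg_image_version postgres:10-alpine | pg_version_number
```

Note that the `PG_VERSION` file in the data directory records only the major version, so it cannot answer the "10.what?" question.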

>> Today I had to restart the RasPi. Now the container fails to start with
>> a segmentation violation.
> 
> Not good --- sounds like you have data corruption.  After an OS crash
> this is something that's quite possible if you haven't taken the time
> to qualify the storage subsystem's honoring of fsync.

Well, it was a regular reboot... I don't know what happened.

> If it's a reasonably recent dump, you might end up just having to
> re-initdb and restore the dump.

OK, so there's no way to repair it? Well, I make a daily backup, so that is
not a problem.

> As with all else computer-related, there's no substitute for a
> good backup plan :-(

Oh yes.
And when you do, be sure to back up everything you need.
I managed to get the backup into my container (which I updated to
PostgreSQL 11, by the way). BUT it complained about missing roles.

So don't forget to run
    pg_dumpall --roles-only
when you pg_dump!
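
A backup script along these lines would capture both pieces. This is a minimal sketch, not the setup actually used in this thread: the database and user names are taken from the compose file above, while `OUT_DIR`, the `DRY_RUN` switch, and the function names are assumptions made for illustration. The commands have to run where the client tools can reach the server, for example inside the db container.

```shell
#!/bin/sh
# Sketch: dump the roles together with the database so that a restore
# into a fresh cluster does not fail on missing roles.
# DB_NAME and DB_USER match the compose file; OUT_DIR is an assumed path.
# Set DRY_RUN=1 to print the commands instead of running them.
DB_NAME="${DB_NAME:-nextcloud}"
DB_USER="${DB_USER:-nextcloud}"
OUT_DIR="${OUT_DIR:-./backup}"

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_db() {
    run mkdir -p "$OUT_DIR" || return 1
    # Roles are cluster-wide and are not part of a plain pg_dump.
    run pg_dumpall --roles-only -U "$DB_USER" -f "$OUT_DIR/roles.sql" || return 1
    run pg_dump -U "$DB_USER" -d "$DB_NAME" -f "$OUT_DIR/$DB_NAME.sql"
}

# Usage:
#   DRY_RUN=1 backup_db   # review the commands first
#   backup_db             # actually write the dumps
```

On restore, load the roles file with psql first and the database dump second, so that object ownership and grants can be applied.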

Thank you for your tips, it's running again. :-D

-- 
Regards, Helmut

Re: Segmentation fault on startup

From

Helmut Bender

Date:

13 February 2021, 09:16:30

Hi,

A little follow-up to this case...

Since Redis wasn't working correctly either, I looked around for a
solution for that as well.

It seems that the Alpine 3.13 image for armv7 is broken at the moment;
see the answer here:
https://stackoverflow.com/questions/66091978/corrupt-date-with-redis6-alpine-on-raspi

And since I used the :10-alpine image for postgres, it apparently was 
also affected by this bug.

So it was not the reboot that crashed Postgres, but Alpine.
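
The 1970 timestamps in the startup log at the top of this thread may well be a symptom of the same problem. A quick check of whether a container sees a plausible clock on a given host could look like the sketch below. It assumes Docker is available; the helper names and the cut-off year are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given year is a number and
# not earlier than 2021 (arbitrary cut-off, the year of this thread).
year_is_sane() {
    [ "$1" -ge 2021 ] 2>/dev/null
}

# Hypothetical helper: print the current year as seen inside a
# container started from the given image.
container_year() {
    docker run --rm --entrypoint date "$1" +%Y
}

# Usage (requires Docker):
#   if year_is_sane "$(container_year postgres:10-alpine)"; then
#       echo "clock looks fine inside the container"
#   else
#       echo "clock looks broken inside the container"
#   fi
```

If the check fails for an Alpine-based tag but passes for the Debian-based tag of the same image on the same host, that points at the base image and not at the database files.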

On 02.02.21 at 19:22, Helmut Bender wrote:
>>> Today I had to restart the RasPi. Now the container fails to start with
>>> a segmentation violation.
>>
>> Not good --- sounds like you have data corruption.  After an OS crash
>> this is something that's quite possible if you haven't taken the time
>> to qualify the storage subsystem's honoring of fsync.
> 
> Well, it was a regular reboot... I don't know what happened.
> 
> Thank you for your tips, it's running again. :-D
> 

-- 
Regards, Helmut