Thread: Segmentation fault on startup
Hi,

I'm running a Nextcloud server in a Docker container on a RasPi 4 (SSD only, no SD card), which uses PostgreSQL 10 as its database server. The containers are started via docker compose. The PostgreSQL service looks like this:

services:
  db:
    image: postgres:10-alpine
    restart: always
    volumes:
      - nextcloud_db:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=xxx
      - POSTGRES_DB=nextcloud
      - POSTGRES_USER=nextcloud

Everything has worked smoothly for some months now. Today I had to restart the RasPi. Now the container fails to start with a segmentation violation.

I tried to start the server manually, with

log_min_messages = info
log_min_error_statement = info
log_error_verbosity = verbose

But all I get is:

waiting for server to start....1970-05-04 03:17:36.010 UTC [31] LOG: 00000: listening on IPv4 address "0.0.0.0", port 5432
1970-05-04 03:17:36.010 UTC [31] LOCATION: StreamServerPort, pqcomm.c:590
1970-05-04 03:17:36.010 UTC [31] LOG: 00000: listening on IPv6 address "::", port 5432
1970-05-04 03:17:36.010 UTC [31] LOCATION: StreamServerPort, pqcomm.c:590
1970-05-04 03:17:36.010 UTC [31] LOG: 00000: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
1970-05-04 03:17:36.010 UTC [31] LOCATION: StreamServerPort, pqcomm.c:585
.1970-05-04 03:17:36.010 UTC [31] LOG: 00000: startup process (PID 32) was terminated by signal 11: Segmentation fault
1970-05-04 03:17:36.010 UTC [31] LOCATION: LogChildExit, postmaster.c:3639
1970-05-04 03:17:36.010 UTC [31] LOG: 00000: aborting startup due to startup process failure
1970-05-04 03:17:36.010 UTC [31] LOCATION: reaper, postmaster.c:2893
1970-05-04 03:17:36.010 UTC [31] LOG: 00000: database system is shut down
1970-05-04 03:17:36.010 UTC [31] LOCATION: UnlinkLockFiles, miscinit.c:764
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Well. I got stuck. I have no idea how to find out what went wrong, let alone how to repair my database. I have a dump of the data, but without a running server I can't do anything with it either...
Can someone tell me what I could do?

Thank you.

--
Regards,
Helmut
Helmut Bender <pgsql@helmut-bender.de> writes:
> I'm running a nextcloud server in a docker container on an RasPi 4 (only
> SSD, no SD), which uses PostgreSQL 10 as server.

10.what? We're already up to 15 patch releases for that branch.

> Today I had to restart the RasPi. Now the container fails to start with
> a segmentation violation.

Not good --- sounds like you have data corruption. After an OS crash this is something that's quite possible if you haven't taken the time to qualify the storage subsystem's honoring of fsync.

It is barely possible that it's a PG bug that we've fixed, so if you are not on 10.15 then an update would be worth trying. But I don't have a lot of hope for that.

> I have a dump of the data, but without running server I can't do
> anything with it either...

If it's a reasonably recent dump, you might end up just having to re-initdb and restore the dump. If the missing data is very valuable to you, there are people around who specialize in trying to recover data from corrupted databases (see "Professional Services" on our website). But it's expensive and there's no guarantee how much can be recovered.

As with all else computer-related, there's no substitute for a good backup plan :-(

			regards, tom lane
(again to the list...)

On 01.02.21 at 21:40, Tom Lane wrote:
> Helmut Bender <pgsql@helmut-bender.de> writes:
>> I'm running a nextcloud server in a docker container on an RasPi 4 (only
>> SSD, no SD), which uses PostgreSQL 10 as server.
>
> 10.what? We're already up to 15 patch releases for that branch.

As I use the Docker image, it seems to be at 10.15.

>> Today I had to restart the RasPi. Now the container fails to start with
>> a segmentation violation.
>
> Not good --- sounds like you have data corruption. After an OS crash
> this is something that's quite possible if you haven't taken the time
> to qualify the storage subsystem's honoring of fsync.

Well, it was a regular reboot... I don't know what happened.

> If it's a reasonably recent dump, you might end up just having to
> re-initdb and restore the dump.

OK, so there's no way to repair it? Well, I make a daily backup, so that is not the problem.

> As with all else computer-related, there's no substitute for a
> good backup plan :-(

Oh yes. And when you do, be sure to back up everything you need. I managed to get the backup into my container (which I updated to PostgreSQL 11, btw). BUT: it complained about missing roles. So don't forget to run pg_dumpall --roles-only when you pg_dump!

Thank you for your tips, it's running again. :-D

--
Regards,
Helmut
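A minimal sketch of the backup routine described in this thread: dump the cluster-wide roles with pg_dumpall --roles-only (pg_dump alone does not include them), dump the database with pg_dump, and restore both, roles first, into a freshly initialized container. It assumes the compose service is named "db" and that POSTGRES_USER and POSTGRES_DB are both "nextcloud", as in the compose file at the start of the thread; the function names backup_db/restore_db and the backup directory are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the thread): service "db",
# superuser "nextcloud", database "nextcloud"; function names are hypothetical.
DB_SERVICE=db
DB_USER=nextcloud
DB_NAME=nextcloud

# Back up roles first, then the database itself (plain SQL format).
# -T disables TTY allocation so the output can be redirected to a file.
backup_db() {
    dir=$1
    mkdir -p "$dir"
    docker compose exec -T "$DB_SERVICE" \
        pg_dumpall -U "$DB_USER" --roles-only > "$dir/roles.sql"
    docker compose exec -T "$DB_SERVICE" \
        pg_dump -U "$DB_USER" "$DB_NAME" > "$dir/$DB_NAME.sql"
}

# Restore into a re-initialized container: roles first, so that object
# ownership in the data dump can be applied, then the data.
# An "already exists" error for the role created via POSTGRES_USER is expected.
restore_db() {
    dir=$1
    docker compose exec -T "$DB_SERVICE" \
        psql -U "$DB_USER" -d postgres < "$dir/roles.sql"
    docker compose exec -T "$DB_SERVICE" \
        psql -U "$DB_USER" -d "$DB_NAME" < "$dir/$DB_NAME.sql"
}
```

For example, backup_db /srv/pg-backup could be run from a daily cron job (the path is hypothetical); after wiping and re-initializing the data volume, restore_db /srv/pg-backup replays the roles and then the data.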
Hi,

a little follow-up to this case... since redis didn't work correctly either, I looked around for a solution for that, too. It seems that the Alpine 3.13 image for arm7 is broken at the moment... see the answer here:
https://stackoverflow.com/questions/66091978/corrupt-date-with-redis6-alpine-on-raspi

And since I used the :10-alpine image for postgres, it was apparently affected by the same bug. So it wasn't the reboot that crashed postgres, but Alpine.

On 02.02.21 at 19:22, Helmut Bender wrote:
>>> Today I had to restart the RasPi. Now the container fails to start with
>>> a segmentation violation.
>>
>> Not good --- sounds like you have data corruption. After an OS crash
>> this is something that's quite possible if you haven't taken the time
>> to qualify the storage subsystem's honoring of fsync.
>
> Well, it was a regular reboot... don't know what happend.
>
> Thank you for your tips, it's running again. :-D
> --
> Regards,
> Helmut