Re: Postgres 9.2.13 on AIX 7.1 - Mailing list pgsql-bugs

From Rainer Tammer
Subject Re: Postgres 9.2.13 on AIX 7.1
Msg-id 5e4f9356-26cc-bd75-4f82-92d26ce575f7@spg.schulergroup.com
In response to Re: Postgres 9.2.13 on AIX 7.1  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Postgres 9.2.13 on AIX 7.1  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello,
I ran the server with autovacuum disabled for ~24h: no server shutdown.
After re-enabling autovacuum, the server died in less than 9 hours:

Started: 2021-08-25 08:12:29
Died:    2021-08-25 16:22:33


At the time of the shutdown nothing was accessing the server:
no running applications and no psql CLI sessions.

I will let it run overnight and see if the server goes down again.
There is no software installed on this AIX LPAR that uses this instance or sends signals to the server.
I only interacted with it during the day to check that the server was working correctly.
Unfortunately, I cannot tell from the postmaster which other process sent the SIGINT.
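On POSIX systems, a handler installed with sigaction() and the SA_SIGINFO flag receives a siginfo_t whose si_pid and si_uid fields identify the sender when the signal was sent with kill(). Below is a minimal standalone sketch, not a patch to the postmaster; the names install_sigint_tracer and last_sigint_sender are hypothetical, and whether AIX fills si_pid for every sender is an assumption to verify.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical globals: sender PID/UID of the last SIGINT (-1 = none yet).
   Only the handler writes them; normal code reads and logs them later. */
static volatile sig_atomic_t last_sigint_sender = -1;
static volatile sig_atomic_t last_sigint_uid = -1;

static void sigint_handler(int signo, siginfo_t *info, void *ctx)
{
    (void) signo;
    (void) ctx;
    /* For signals sent via kill(), si_pid/si_uid describe the sender.
       Plain assignments keep the handler async-signal-safe. */
    last_sigint_sender = (sig_atomic_t) info->si_pid;
    last_sigint_uid = (sig_atomic_t) info->si_uid;
}

/* Hypothetical helper: install the SA_SIGINFO handler; 0 on success. */
int install_sigint_tracer(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigint_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```

Outside the handler, the recorded PID can then be written to the log, or looked up with ps while the sending process still exists.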

This is the only correlation I see:

2021-08-25 16:22:27 CEST  DEBUG:  server process (PID 19005776) exited with exit code 0
2021-08-25 16:22:33 CEST  DEBUG:  postmaster received signal 2
2021-08-25 16:22:33 CEST  LOG:  received fast shutdown request
2021-08-25 16:22:33 CEST  LOG:  aborting any active transactions
2021-08-25 16:22:33 CEST  LOG:  autovacuum launcher shutting down


The time gap is 6 s, so it may be too far from the last process exit to be related.

I could migrate the test DB to 9.6.23 and see whether the problem persists.
Would it be worth adding code before every signal send to log the source and target PIDs as well as the source and target process names?
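Such sender-side logging could be a thin wrapper around kill(). The following is a minimal sketch under that assumption; format_signal_trace and traced_kill are hypothetical names, not existing PostgreSQL functions, and the caller passes its own process name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format one trace line for a signal about to be sent.
   Returns the snprintf() result (length of the untruncated line). */
int format_signal_trace(char *buf, size_t len, long src_pid,
                        const char *src_name, long dst_pid, int signo)
{
    return snprintf(buf, len, "signal %d: pid %ld (%s) -> pid %ld",
                    signo, src_pid, src_name, dst_pid);
}

/* Hypothetical wrapper around kill(): log sender and target, then send. */
int traced_kill(pid_t target, int signo, const char *self_name)
{
    char line[256];

    format_signal_trace(line, sizeof line, (long) getpid(), self_name,
                        (long) target, signo);
    fprintf(stderr, "%s\n", line);
    return kill(target, signo);
}
```

A wrapper like this only sees signals sent by Postgres's own code. If the SIGINT comes from an outside process, it has to be identified on the receiving side or with a kernel trace.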

The OS is at the latest patch level.
The compiler is at the latest patch level.
The 9.2.x is at the latest patch level.

I can also run a trace tomorrow, which would show the sending process of each kill() call:

Sample output (shortened):

Wed Aug 25 17:58:51 2021
System: AIX 7.2 Node: host
Machine: 000000000000
Internet Address: 00000000 1.1.1.1
At trace startup, the system contained 16 cpus, of which 16 were traced.
Buffering: Kernel Heap
This is from a 64-bit kernel.
Tracing only these hooks, 14e0

ID   PROCESS NAME   PID      TID      I SYSTEM CALL    ELAPSED_SEC     DELTA_MSEC   APPL    SYSCALL KERNEL  INTERRUPT

001  trace          23789978 87687537                  0.000000000       0.000000                   TRACE ON channel 0
                                                                                                    Wed Aug 25 17:58:51 2021
14E  postgres:      18743746 85000571                  7.903995939    2994.175459                   kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       85393753                  7.904962367       0.966428                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       85393753                  7.946566507      41.604140                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  postgres:      18743746 85000571                 17.902007437    2992.131623                   kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       94437835                 17.903004949       0.997512                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       94437835                 17.935897005      32.892056                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  postgres:      18743746 85000571                 28.001327251    3091.401199                   kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       40042983                 28.002307781       0.980530                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       40042983                 28.032432646      30.124865                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  postgres:      18743746 85000571                 37.901060572    2991.083160                   kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       88539511                 37.902072470       1.011898                   kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       88539511                 37.936426058      34.353588                   kill: signal SIGUSR2 to process 18743746 postgres:
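Picking the one interesting line out of a long report could be automated. Here is a minimal sketch that assumes the formatted report keeps the layout shown above (whitespace-separated hook ID, process name and PID, followed by "kill: signal <NAME> to process <PID>"). parse_kill_line is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of the formatted trace report.
   Returns 1 and fills *sender_pid / *target_pid when the line records a
   kill() of the signal named want_sig (e.g. "SIGINT"); returns 0 otherwise. */
int parse_kill_line(const char *line, const char *want_sig,
                    long *sender_pid, long *target_pid)
{
    char id[16], pname[64], sig[32];
    long src, dst;
    const char *ev;

    /* Leading columns: hook ID, process name, PID. */
    if (sscanf(line, "%15s %63s %ld", id, pname, &src) != 3)
        return 0;

    /* Event text: "kill: signal <NAME> to process <PID> ...". */
    ev = strstr(line, "kill: signal ");
    if (ev == NULL)
        return 0;
    if (sscanf(ev, "kill: signal %31s to process %ld", sig, &dst) != 2)
        return 0;
    if (strcmp(sig, want_sig) != 0)
        return 0;

    *sender_pid = src;  /* may be -1, as in the "--1-" rows of the sample */
    *target_pid = dst;
    return 1;
}
```

Running each report line through this with "SIGINT" and keeping matches whose target is the postmaster PID should name the sender of the fast-shutdown request. That assumes the traced kill hook also records signals sent by processes outside Postgres.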



I do not observe this with V8.x servers.

This stupid problem is getting on my nerves!!

Bye
  Rainer

On 25.08.2021 17:13, Tom Lane wrote:
Rainer Tammer <pgsql@spg.schulergroup.com> writes:
2021-08-25 16:22:33 CEST  DEBUG: postmaster received signal 2
2021-08-25 16:22:33 CEST  LOG:  received fast shutdown request
Well, something sent the postmaster SIGINT.  There isn't any
mechanism within Postgres itself that would do that; you need
to look for outside causes.
			regards, tom lane

