[GENERAL] PostgreSQL walsender process doesn't exist after "pg_ctl stop -mfast" - Mailing list pgsql-general
From | y39chen |
---|---|
Subject | [GENERAL] PostgreSQL walsender process doesn't exist after "pg_ctl stop -mfast" |
Date | |
Msg-id | 1510550710287-0.post@n3.nabble.com Whole thread Raw |
Responses |
Re: [GENERAL] PostgreSQL walsender process doesn't exist after "pg_ctl stop -m fast"
|
List | pgsql-general |
We encounter one problem that PostgreSQL walsender process doesn't exist after "pg_ctl stop -m fast". Uses PostgreSQL 9.6.2 Steps: 1)active postgres server is up; 2)standby postgres intance take pg_basebackup 3)usin command "pg_ctl stop -W -m fast -D /mnt/db/DBTestPostgres/db_data" to stop active postgres server; 4)active postgres server can't stop successfully. It hanging by walsender process and can't exit (we wait about 10min) 5)kill -9 walsender process, it exit right now. 1. make postgres coredump, the log as bellow Stack trace of thread 2288: #0 0x00007f7fae6ea013 __select (libc.so.6) #1 0x000000000046df6b ServerLoop (postgres) #2 0x000000000060d528 PostmasterMain (postgres) #3 0x000000000046f2dc main (postgres) #4 0x00007f7fae62a461 __libc_start_main (libc.so.6) #5 0x000000000046f35a _start (postgres) Nov 10 10:08:02 mn-1 systemd-coredump[8994]: Process 2645 (postgres) of user 550 dumped core. Stack trace of thread2645: #0 0x00007f7fae6f1d63 epoll_wait (libc.so.6) #1 0x000000000063ce69 WaitEventSetWait (postgres) #2 0x000000000063d1ab WaitLatchOrSocket (postgres) #3 0x0000000000622ead WalSndLoop (postgres) #4 0x0000000000623a8f exec_replication_command (postgres) #5 0x00000000006558fd PostgresMain (postgres) #6 0x000000000046e631 ServerLoop (postgres) #7 0x000000000060d528 PostmasterMain (postgres) #8 0x000000000046f2dc main (postgres) #9 0x00007f7fae62a461 __libc_start_main (libc.so.6) #10 0x000000000046f35a _start (postgres) 2. active postgres intance log: Nov 09 12:29:53 mn-1 postgres[6073]: [3-1] DEBUG: autovacuum: processing database "DBTestPostgres" Nov 09 12:30:13 mn-1 postgres[6204]: [3-1] DEBUG: autovacuum: processing database "postgreswd" Nov 09 12:30:20 mn-1 postgres[2231]: [35-1] DEBUG: checkpoint sync: number=1 file=base/16385/16467 time=14.990 msec Nov 09 12:30:20 mn-1 postgres[2231]: [36-1] LOG: checkpoint complete: wrote 1065 buffers (6.5%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=59.892 s, sync=0.015 s, total=59.927 s; sync files=1, longest=0.014 s, average=0.014 s; distance=12487 kB, estimate=19411 kB Nov 09 12:30:20 mn-1 postgres[2231]: [37-1] LOG: checkpoint starting: force wait Nov 09 12:30:20 mn-1 postgres[2231]: [38-1] DEBUG: performing replication slot checkpoint Nov 09 12:30:21 mn-1 postgres[2231]: [39-1] DEBUG: checkpoint sync: number=1 file=base/16385/16467 time=59.400 msec Nov 09 12:30:21 mn-1 postgres[2231]: [40-1] LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=1.102 s, sync=0.059 s, total=1.168 s; sync files=1, longest=0.059 s, average=0.059 s; distance=3900 kB, estimate=17860 kB Nov 09 12:30:23 mn-1 postgres[2235]: [28-1] DEBUG: archived transaction log file "000000020000000000000193" Nov 09 12:30:23 mn-1 postgres[2235]: [29-1] DEBUG: archived transaction log file "000000020000000000000193.00000108.backup" Nov 09 12:30:24 mn-1 postgres[6263]: [3-1] DEBUG: received replication command: IDENTIFY_SYSTEM Nov 09 12:30:24 mn-1 postgres[6263]: [4-1] DEBUG: received replication command: START_REPLICATION 0/65000000 TIMELINE 2 Nov 09 12:30:24 mn-1 postgres[6263]: [5-1] DEBUG: standby "walreceiver" has now caught up with primary Nov 09 12:30:24 mn-1 dbim-postgres[2039]: [2039-139870943506432] .... /opt/nokia/libexec/SS_RCPPostgress/bin/pg_ctl stop -W -m fast -D /mnt/db/DBTestPostgres/db_data Nov 09 12:30:24 mn-1 postgres[2157]: [3-1] LOG: received fast shutdown request Nov 09 12:30:24 mn-1 postgres[2157]: [4-1] LOG: aborting any active transactions Nov 09 12:30:24 mn-1 postgres[6073]: [4-1] err-2: terminating autovacuum process due to administrator command Nov 09 12:30:24 mn-1 postgres[2234]: [3-1] LOG: autovacuum launcher shutting down Nov 09 12:30:24 mn-1 postgres[2231]: [41-1] LOG: shutting down ...... Nov 09 12:30:44 mn-1 postgres[6408]: [5-1] err-2: the database system is shutting down Nov 09 12:30:46 mn-1 postgres[6411]: [5-1] err-2: the database system is shutting down Nov 09 12:30:48 mn-1 postgres[6415]: [5-1] err-2: the database system is shutting down Nov 09 12:30:50 mn-1 postgres[6434]: [5-1] err-2: the database system is shutting down Nov 09 12:30:52 mn-1 postgres[6442]: [5-1] err-2: the database system is shutting down 3. slave instance log: Nov 09 12:30:23 mn-0 postgres[11641]: [1-1] LOG: ending log output to stderr Nov 09 12:30:23 mn-0 dbim-postgres[11104]: LOG: ending log output to stderr Nov 09 12:30:23 mn-0 dbim-postgres[11104]: HINT: Future log output will go to log destination "syslog". Nov 09 12:30:23 mn-0 postgres[11641]: [1-2] HINT: Future log output will go to log destination "syslog". Nov 09 12:30:23 mn-0 postgres[11642]: [2-1] LOG: database system was interrupted; last known up at 2017-11-09 10:30:21 GMT Nov 09 12:30:23 mn-0 postgres[11642]: [3-1] LOG: entering standby mode Nov 09 12:30:23 mn-0 postgres[11642]: [4-1] DEBUG: checkpoint record is at 0/64C00140 Nov 09 12:30:23 mn-0 postgres[11642]: [5-1] DEBUG: redo record is at 0/64C00108; shutdown FALSE Nov 09 12:30:23 mn-0 postgres[11642]: [6-1] DEBUG: next transaction ID: 0:464; next OID: 16473 Nov 09 12:30:23 mn-0 postgres[11642]: [7-1] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0 Nov 09 12:30:23 mn-0 postgres[11642]: [8-1] DEBUG: oldest unfrozen transaction ID: 418, in database 1 Nov 09 12:30:23 mn-0 postgres[11642]: [9-1] DEBUG: oldest MultiXactId: 1, in database 1 Nov 09 12:30:23 mn-0 postgres[11642]: [10-1] DEBUG: commit timestamp Xid oldest/newest: 0/0 Nov 09 12:30:23 mn-0 postgres[11642]: [11-1] DEBUG: transaction ID wrap limit is 2147484065, limited by database with OID 1 Nov 09 12:30:23 mn-0 postgres[11642]: [12-1] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 Nov 09 12:30:23 mn-0 postgres[11642]: [13-1] DEBUG: starting up replication slots Nov 09 12:30:23 mn-0 postgres[11642]: [14-1] DEBUG: resetting unlogged relations: cleanup 1 init 0 Nov 09 12:30:23 mn-0 postgres[11642]: [15-1] DEBUG: initializing for hot standby Nov 09 12:30:23 mn-0 postgres[11642]: [16-1] LOG: redo starts at 0/64C00108 Nov 09 12:30:23 mn-0 postgres[11642]: [17-1] DEBUG: recovery snapshots are now enabled Nov 09 12:30:23 mn-0 postgres[11642]: [17-2] CONTEXT: xlog redo at 0/64C00108 for Standby/RUNNING_XACTS: nextXid 464 latestCompletedXid 463 oldestRunningXid 464 Nov 09 12:30:23 mn-0 postgres[11642]: [18-1] DEBUG: end of backup reached Nov 09 12:30:23 mn-0 postgres[11642]: [18-2] CONTEXT: xlog redo at 0/64C001B0 for XLOG/BACKUP_END: 0/64C00108 Nov 09 12:30:23 mn-0 postgres[11642]: [19-1] LOG: consistent recovery state reached at 0/64C001D8 Nov 09 12:30:23 mn-0 postgres[11641]: [2-1] LOG: database system is ready to accept read only connections Nov 09 12:30:23 mn-0 postgres[11650]: [3-1] LOG: started streaming WAL from primary at 0/65000000 on timeline 2 Nov 09 12:30:24 mn-0 postgres[11642]: [20-1] DEBUG: transaction ID wrap limit is 2147484065, limited by database with OID 1 Nov 09 12:30:24 mn-0 postgres[11642]: [20-2] CONTEXT: xlog redo at 0/65400028 for XLOG/CHECKPOINT_SHUTDOWN: redo 0/65400028; tli 2; prev tli 2; fpw true; xid 0:464; oid 16473; multi 1; offset 0; oldest xid 418 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown Nov 09 12:30:24 mn-0 postgres[11650]: [4-1] LOG: replication terminated by primary server Nov 09 12:30:24 mn-0 postgres[11650]: [4-2] DETAIL: End of WAL reached on timeline 2 at 0/65400098. Nov 09 12:30:24 mn-0 postgres[11650]: [5-1] err-2: could not send end-of-streaming message to primary: no COPY in progress Nov 09 12:30:24 mn-0 postgres[11650]: [5-2] Nov 09 12:30:24 mn-0 postgres[11642]: [21-1] LOG: invalid record length at 0/65400098: wanted 24, got 0 Nov 09 12:30:24 mn-0 postgres[11701]: [3-1] err-2: could not connect to the primary server: err-2: the database system is shutting down Nov 09 12:30:24 mn-0 postgres[11701]: [3-2] Nov 09 12:30:26 mn-0 postgres[11642]: [22-1] DEBUG: invalid record length at 0/65400098: wanted 24, got 0 Nov 09 12:30:26 mn-0 postgres[11720]: [3-1] err-2: could not connect to the primary server: err-2: the database system is shutting down Nov 09 12:30:26 mn-0 postgres[11720]: [3-2] Nov 09 12:30:28 mn-0 postgres[11642]: [23-1] DEBUG: invalid record length at 0/65400098: wanted 24, got 0 Nov 09 12:30:28 mn-0 postgres[11739]: [3-1] err-2: could not connect to the primary server: err-2: the database system is shutting down Nov 09 12:30:28 mn-0 postgres[11739]: [3-2] Nov 09 12:30:30 mn-0 postgres[11642]: [24-1] DEBUG: invalid record length at 0/65400098: wanted 24, got 0 Nov 09 12:30:30 mn-0 postgres[11767]: [3-1] err-2: could not connect to the primary server: err-2: the database system is shutting down Nov 09 12:30:30 mn-0 postgres[11767]: [3-2] Nov 09 12:30:32 mn-0 postgres[11642]: [25-1] DEBUG: invalid record length at 0/65400098: wanted 24, got 0 Nov 09 12:30:32 mn-0 postgres[11770]: [3-1] err-2: could not connect to the primary server: err-2: the database system is shutting down Nov 09 12:30:54 mn-0 postgres[11642]: [36-1] DEBUG: invalid record length at 0/65400098: wanted 24, got 0 Nov 09 12:30:54 mn-0 postgres[12088]: [3-1] err-2: could not connect to the primary server: err-2: the database system is shutting down Nov 09 12:30:54 mn-0 postgres[12088]: [3-2] -- Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
pgsql-general by date: