[GENERAL] PostgreSQL walsender process doesn't exist after "pg_ctl stop -mfast" - Mailing list pgsql-general

From y39chen
Subject [GENERAL] PostgreSQL walsender process doesn't exist after "pg_ctl stop -mfast"
Date
Msg-id 1510550710287-0.post@n3.nabble.com
Whole thread Raw
Responses Re: [GENERAL] PostgreSQL walsender process doesn't exist after "pg_ctl stop -m fast"
List pgsql-general
We encounter one problem that PostgreSQL walsender process doesn't exist
after "pg_ctl stop -m fast".
Uses PostgreSQL 9.6.2
Steps:
1)active postgres server  is up;
2)standby postgres intance take pg_basebackup 
3)usin command "pg_ctl stop -W -m fast -D /mnt/db/DBTestPostgres/db_data" to
stop active postgres server;
4)active postgres server can't stop successfully. It hanging by walsender
process and can't exit (we wait about 10min)
5)kill -9 walsender process, it exit right now. 

1. make postgres coredump, the log as bellow
        Stack trace of thread 2288:                                            #0  0x00007f7fae6ea013 __select
 
(libc.so.6)                                            #1  0x000000000046df6b
ServerLoop (postgres)                                            #2  0x000000000060d528
PostmasterMain (postgres)                                            #3  0x000000000046f2dc main
(postgres)                                            #4  0x00007f7fae62a461
__libc_start_main (libc.so.6)                                            #5  0x000000000046f35a _start
(postgres)

Nov 10 10:08:02 mn-1 systemd-coredump[8994]: Process 2645 (postgres) of user
550 dumped core.                                                                                        Stack trace of
thread2645:                                            #0  0x00007f7fae6f1d63
 
epoll_wait (libc.so.6)                                            #1  0x000000000063ce69
WaitEventSetWait (postgres)                                            #2  0x000000000063d1ab
WaitLatchOrSocket (postgres)                                            #3  0x0000000000622ead
WalSndLoop (postgres)                                            #4  0x0000000000623a8f
exec_replication_command (postgres)                                            #5  0x00000000006558fd
PostgresMain (postgres)                                            #6  0x000000000046e631
ServerLoop (postgres)                                            #7  0x000000000060d528
PostmasterMain (postgres)                                            #8  0x000000000046f2dc main
(postgres)                                            #9  0x00007f7fae62a461
__libc_start_main (libc.so.6)                                            #10 0x000000000046f35a _start
(postgres)


2. active postgres intance log:

Nov 09 12:29:53 mn-1 postgres[6073]: [3-1] DEBUG:  autovacuum: processing
database "DBTestPostgres"
Nov 09 12:30:13 mn-1 postgres[6204]: [3-1] DEBUG:  autovacuum: processing
database "postgreswd"
Nov 09 12:30:20 mn-1 postgres[2231]: [35-1] DEBUG:  checkpoint sync:
number=1 file=base/16385/16467 time=14.990 msec
Nov 09 12:30:20 mn-1 postgres[2231]: [36-1] LOG:  checkpoint complete: wrote
1065 buffers (6.5%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=59.892 s, sync=0.015 s, total=59.927 s; sync files=1, longest=0.014 s,
average=0.014 s; distance=12487 kB, estimate=19411 kB
Nov 09 12:30:20 mn-1 postgres[2231]: [37-1] LOG:  checkpoint starting: force
wait
Nov 09 12:30:20 mn-1 postgres[2231]: [38-1] DEBUG:  performing replication
slot checkpoint
Nov 09 12:30:21 mn-1 postgres[2231]: [39-1] DEBUG:  checkpoint sync:
number=1 file=base/16385/16467 time=59.400 msec
Nov 09 12:30:21 mn-1 postgres[2231]: [40-1] LOG:  checkpoint complete: wrote
1 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=1.102 s, sync=0.059 s, total=1.168 s; sync files=1, longest=0.059 s,
average=0.059 s; distance=3900 kB, estimate=17860 kB
Nov 09 12:30:23 mn-1 postgres[2235]: [28-1] DEBUG:  archived transaction log
file "000000020000000000000193"
Nov 09 12:30:23 mn-1 postgres[2235]: [29-1] DEBUG:  archived transaction log
file "000000020000000000000193.00000108.backup"
Nov 09 12:30:24 mn-1 postgres[6263]: [3-1] DEBUG:  received replication
command: IDENTIFY_SYSTEM
Nov 09 12:30:24 mn-1 postgres[6263]: [4-1] DEBUG:  received replication
command: START_REPLICATION 0/65000000 TIMELINE 2
Nov 09 12:30:24 mn-1 postgres[6263]: [5-1] DEBUG:  standby "walreceiver" has
now caught up with primary

Nov 09 12:30:24 mn-1 dbim-postgres[2039]: [2039-139870943506432] ....
/opt/nokia/libexec/SS_RCPPostgress/bin/pg_ctl stop -W -m fast -D
/mnt/db/DBTestPostgres/db_data
Nov 09 12:30:24 mn-1 postgres[2157]: [3-1] LOG:  received fast shutdown
request
Nov 09 12:30:24 mn-1 postgres[2157]: [4-1] LOG:  aborting any active
transactions
Nov 09 12:30:24 mn-1 postgres[6073]: [4-1] err-2:  terminating autovacuum
process due to administrator command
Nov 09 12:30:24 mn-1 postgres[2234]: [3-1] LOG:  autovacuum launcher
shutting down
Nov 09 12:30:24 mn-1 postgres[2231]: [41-1] LOG:  shutting down
......
Nov 09 12:30:44 mn-1 postgres[6408]: [5-1] err-2:  the database system is
shutting down
Nov 09 12:30:46 mn-1 postgres[6411]: [5-1] err-2:  the database system is
shutting down
Nov 09 12:30:48 mn-1 postgres[6415]: [5-1] err-2:  the database system is
shutting down
Nov 09 12:30:50 mn-1 postgres[6434]: [5-1] err-2:  the database system is
shutting down
Nov 09 12:30:52 mn-1 postgres[6442]: [5-1] err-2:  the database system is
shutting down

3. slave instance log:
Nov 09 12:30:23 mn-0 postgres[11641]: [1-1] LOG:  ending log output to
stderr
Nov 09 12:30:23 mn-0 dbim-postgres[11104]: LOG:  ending log output to stderr
Nov 09 12:30:23 mn-0 dbim-postgres[11104]: HINT:  Future log output will go
to log destination "syslog".
Nov 09 12:30:23 mn-0 postgres[11641]: [1-2] HINT:  Future log output will go
to log destination "syslog".
Nov 09 12:30:23 mn-0 postgres[11642]: [2-1] LOG:  database system was
interrupted; last known up at 2017-11-09 10:30:21 GMT
Nov 09 12:30:23 mn-0 postgres[11642]: [3-1] LOG:  entering standby mode
Nov 09 12:30:23 mn-0 postgres[11642]: [4-1] DEBUG:  checkpoint record is at
0/64C00140
Nov 09 12:30:23 mn-0 postgres[11642]: [5-1] DEBUG:  redo record is at
0/64C00108; shutdown FALSE
Nov 09 12:30:23 mn-0 postgres[11642]: [6-1] DEBUG:  next transaction ID:
0:464; next OID: 16473
Nov 09 12:30:23 mn-0 postgres[11642]: [7-1] DEBUG:  next MultiXactId: 1;
next MultiXactOffset: 0
Nov 09 12:30:23 mn-0 postgres[11642]: [8-1] DEBUG:  oldest unfrozen
transaction ID: 418, in database 1
Nov 09 12:30:23 mn-0 postgres[11642]: [9-1] DEBUG:  oldest MultiXactId: 1,
in database 1
Nov 09 12:30:23 mn-0 postgres[11642]: [10-1] DEBUG:  commit timestamp Xid
oldest/newest: 0/0
Nov 09 12:30:23 mn-0 postgres[11642]: [11-1] DEBUG:  transaction ID wrap
limit is 2147484065, limited by database with OID 1
Nov 09 12:30:23 mn-0 postgres[11642]: [12-1] DEBUG:  MultiXactId wrap limit
is 2147483648, limited by database with OID 1
Nov 09 12:30:23 mn-0 postgres[11642]: [13-1] DEBUG:  starting up replication
slots
Nov 09 12:30:23 mn-0 postgres[11642]: [14-1] DEBUG:  resetting unlogged
relations: cleanup 1 init 0
Nov 09 12:30:23 mn-0 postgres[11642]: [15-1] DEBUG:  initializing for hot
standby
Nov 09 12:30:23 mn-0 postgres[11642]: [16-1] LOG:  redo starts at 0/64C00108
Nov 09 12:30:23 mn-0 postgres[11642]: [17-1] DEBUG:  recovery snapshots are
now enabled
Nov 09 12:30:23 mn-0 postgres[11642]: [17-2] CONTEXT:  xlog redo at
0/64C00108 for Standby/RUNNING_XACTS: nextXid 464 latestCompletedXid 463
oldestRunningXid 464
Nov 09 12:30:23 mn-0 postgres[11642]: [18-1] DEBUG:  end of backup reached
Nov 09 12:30:23 mn-0 postgres[11642]: [18-2] CONTEXT:  xlog redo at
0/64C001B0 for XLOG/BACKUP_END: 0/64C00108
Nov 09 12:30:23 mn-0 postgres[11642]: [19-1] LOG:  consistent recovery state
reached at 0/64C001D8
Nov 09 12:30:23 mn-0 postgres[11641]: [2-1] LOG:  database system is ready
to accept read only connections
Nov 09 12:30:23 mn-0 postgres[11650]: [3-1] LOG:  started streaming WAL from
primary at 0/65000000 on timeline 2
Nov 09 12:30:24 mn-0 postgres[11642]: [20-1] DEBUG:  transaction ID wrap
limit is 2147484065, limited by database with OID 1
Nov 09 12:30:24 mn-0 postgres[11642]: [20-2] CONTEXT:  xlog redo at
0/65400028 for XLOG/CHECKPOINT_SHUTDOWN: redo 0/65400028; tli 2; prev tli 2;
fpw true; xid 0:464; oid 16473; multi 1; offset 0; oldest xid 418 in DB 1;
oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest
running xid 0; shutdown
Nov 09 12:30:24 mn-0 postgres[11650]: [4-1] LOG:  replication terminated by
primary server
Nov 09 12:30:24 mn-0 postgres[11650]: [4-2] DETAIL:  End of WAL reached on
timeline 2 at 0/65400098.
Nov 09 12:30:24 mn-0 postgres[11650]: [5-1] err-2:  could not send
end-of-streaming message to primary: no COPY in progress
Nov 09 12:30:24 mn-0 postgres[11650]: [5-2]
Nov 09 12:30:24 mn-0 postgres[11642]: [21-1] LOG:  invalid record length at
0/65400098: wanted 24, got 0
Nov 09 12:30:24 mn-0 postgres[11701]: [3-1] err-2:  could not connect to the
primary server: err-2:  the database system is shutting down
Nov 09 12:30:24 mn-0 postgres[11701]: [3-2]
Nov 09 12:30:26 mn-0 postgres[11642]: [22-1] DEBUG:  invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:26 mn-0 postgres[11720]: [3-1] err-2:  could not connect to the
primary server: err-2:  the database system is shutting down
Nov 09 12:30:26 mn-0 postgres[11720]: [3-2]
Nov 09 12:30:28 mn-0 postgres[11642]: [23-1] DEBUG:  invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:28 mn-0 postgres[11739]: [3-1] err-2:  could not connect to the
primary server: err-2:  the database system is shutting down
Nov 09 12:30:28 mn-0 postgres[11739]: [3-2]
Nov 09 12:30:30 mn-0 postgres[11642]: [24-1] DEBUG:  invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:30 mn-0 postgres[11767]: [3-1] err-2:  could not connect to the
primary server: err-2:  the database system is shutting down
Nov 09 12:30:30 mn-0 postgres[11767]: [3-2]
Nov 09 12:30:32 mn-0 postgres[11642]: [25-1] DEBUG:  invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:32 mn-0 postgres[11770]: [3-1] err-2:  could not connect to the
primary server: err-2:  the database system is shutting down

Nov 09 12:30:54 mn-0 postgres[11642]: [36-1] DEBUG:  invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:54 mn-0 postgres[12088]: [3-1] err-2:  could not connect to the
primary server: err-2:  the database system is shutting down
Nov 09 12:30:54 mn-0 postgres[12088]: [3-2]




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: [GENERAL] Migrating plattaform
Next
From: Andreas Joseph Krogh
Date:
Subject: [GENERAL] PG-10 + ICU and abbreviated keys