
From: Chris Redekop
Subject: Re: Can someone help explain what's going on from the attached logs?
Msg-id: CAC2SuR+gXF+78hszhD2biKW9W6JhW3d5vinFicp8wi5PReUbLw@mail.gmail.com
In response to: Re: Can someone help explain what's going on from the attached logs? (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Can someone help explain what's going on from the attached logs? (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-general

> Caveat #2 applies here
> http://developer.postgresql.org/pgdocs/postgres/hot-standby.html#HOT-STANDBY-CAVEATS
>
> The consistent state is delayed until your long running transactions
> end, which is workload dependent but transient.

I'm not quite sure how this correlates to what I'm seeing in 9.1.1.  When I attempt to start a hot standby while the primary is under load, it seems to get stuck in that 'starting up' mode even when there are no open transactions.  Once it is in this state, removing all load from the primary still does not let it finish starting up.

If I select from pg_stat_activity on the primary it shows a couple of connections, but they all have null xact_start values and <IDLE> current_query values.  Even if I then kill all connections to the primary (via pg_terminate_backend), the hot standby still will not finish starting up; I would assume there can't be any transactions in progress if there are no connections to the primary.  Restarting the hot standby while it is in this state produces the same result.  If there is a transaction in progress for the duration of the backup (or something like that), can that cause it to get into this state?
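
For reference, these are roughly the checks I'm running on the primary (a sketch using the 9.1 pg_stat_activity column names, procpid and current_query; the pg_prepared_xacts query is just a guess on my part, since a prepared transaction would still count as in-progress even with no client connected):

-- Backends with an open transaction (9.1 columns: procpid, current_query)
SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL;

-- Prepared (two-phase) transactions survive disconnects, so check these too
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- What I did to kill all other connections
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE procpid <> pg_backend_pid();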


Here is a debug3 log of this startup from 9.1.1:

2011-10-26 11:25:32.931 MDT [22640]: [1-1] LOG:  database system was shut down in recovery at 2011-10-26 11:25:02 MDT
2011-10-26 11:25:32.931 MDT [22640]: [2-1] DEBUG:  standby_mode = 'on'
2011-10-26 11:25:32.931 MDT [22640]: [3-1] DEBUG:  primary_conninfo = 'host=othernode port=5432 user=postgres application_name=sync_rep_test'
2011-10-26 11:25:32.931 MDT [22640]: [4-1] DEBUG:  trigger_file = '/db/data/trigger_file'
2011-10-26 11:25:32.931 MDT [22640]: [5-1] LOG:  entering standby mode
2011-10-26 11:25:32.931 MDT [22640]: [6-1] DEBUG:  checkpoint record is at 7/96667628
2011-10-26 11:25:32.931 MDT [22640]: [7-1] DEBUG:  redo record is at 7/960000A8; shutdown FALSE
2011-10-26 11:25:32.931 MDT [22640]: [8-1] DEBUG:  next transaction ID: 0/11855637; next OID: 3219844
2011-10-26 11:25:32.931 MDT [22640]: [9-1] DEBUG:  next MultiXactId: 1895; next MultiXactOffset: 3810
2011-10-26 11:25:32.931 MDT [22640]: [10-1] DEBUG:  oldest unfrozen transaction ID: 1669, in database 1
2011-10-26 11:25:32.931 MDT [22640]: [11-1] DEBUG:  transaction ID wrap limit is 2147485316, limited by database with OID 1
2011-10-26 11:25:32.932 MDT [22640]: [12-1] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2011-10-26 11:25:32.939 MDT [22640]: [13-1] DEBUG:  initializing for hot standby
2011-10-26 11:25:32.939 MDT [22640]: [14-1] LOG:  redo starts at 7/960000A8
2011-10-26 11:25:33.027 MDT [22640]: [15-1] DEBUG:  running transaction data initialized
2011-10-26 11:25:33.027 MDT [22640]: [16-1] CONTEXT:  xlog redo  running xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid 11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf
2011-10-26 11:25:33.028 MDT [22640]: [17-1] DEBUG:  0 KnownAssignedXids (num=0 tail=0 head=0)
2011-10-26 11:25:33.028 MDT [22640]: [18-1] CONTEXT:  xlog redo  running xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid 11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf
2011-10-26 11:25:33.028 MDT [22640]: [19-1] LOG:  consistent state delayed because recovery snapshot incomplete
2011-10-26 11:25:33.028 MDT [22640]: [20-1] CONTEXT:  xlog redo  running xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid 11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf
2011-10-26 11:25:33.514 MDT [22640]: [21-1] LOG:  consistent recovery state reached at 7/98480E28
2011-10-26 11:25:33.515 MDT [22640]: [22-1] LOG:  record with zero length at 7/98480E28
2011-10-26 11:25:33.515 MDT [22640]: [23-1] DEBUG:  could not open file "pg_xlog/000000010000000700000098" (log file 7, segment 152): No such file or directory
2011-10-26 11:25:33.515 MDT [22642]: [1-1] DEBUG:  find_in_dynamic_libpath: trying "/usr/local/postgresql-9.1.1/lib/libpqwalreceiver"
2011-10-26 11:25:33.515 MDT [22642]: [2-1] DEBUG:  find_in_dynamic_libpath: trying "/usr/local/postgresql-9.1.1/lib/libpqwalreceiver.so"
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 7/98000000
2011-10-26 11:25:33.519 MDT [22642]: [3-1] LOG:  streaming replication successfully connected to primary
2011-10-26 11:25:33.519 MDT [22642]: [4-1] DEBUG:  sending write 0/0 flush 0/0 apply 7/98480E28
2011-10-26 11:25:33.522 MDT [22642]: [5-1] DEBUG:  sending write 7/98020000 flush 0/0 apply 7/98480E28
2011-10-26 11:25:33.568 MDT [22642]: [6-1] DEBUG:  sending write 7/98020000 flush 7/98020000 apply 7/98480E28
2011-10-26 11:25:33.569 MDT [22642]: [7-1] DEBUG:  sending write 7/98040000 flush 7/98020000 apply 7/98480E28
...
2011-10-26 11:25:33.924 MDT [22642]: [49-1] DEBUG:  sending write 7/98460000 flush 7/98400000 apply 7/98480E28
2011-10-26 11:25:33.943 MDT [22642]: [50-1] DEBUG:  sending write 7/98460000 flush 7/98460000 apply 7/98480E28
DEBUG:  standby "sync_rep_test" has now caught up with primary
2011-10-26 11:25:33.943 MDT [22642]: [51-1] DEBUG:  sending write 7/98480E28 flush 7/98460000 apply 7/98480E28
2011-10-26 11:25:33.959 MDT [22642]: [52-1] DEBUG:  sending write 7/98480E28 flush 7/98480E28 apply 7/98480E28
2011-10-26 11:25:34.192 MDT [22634]: [33-1] DEBUG:  forked new backend, pid=22644 socket=7
2011-10-26 11:25:34.192 MDT [22644]: [1-1] FATAL:  the database system is starting up
2011-10-26 11:25:34.192 MDT [22644]: [2-1] DEBUG:  shmem_exit(1): 0 callbacks to make
2011-10-26 11:25:34.192 MDT [22644]: [3-1] DEBUG:  proc_exit(1): 1 callbacks to make
2011-10-26 11:25:34.192 MDT [22644]: [4-1] DEBUG:  exit(1)
2011-10-26 11:25:34.192 MDT [22644]: [5-1] DEBUG:  shmem_exit(-1): 0 callbacks to make
2011-10-26 11:25:34.192 MDT [22644]: [6-1] DEBUG:  proc_exit(-1): 0 callbacks to make
2011-10-26 11:25:34.192 MDT [22634]: [34-1] DEBUG:  server process (PID 22644) exited with exit code 1
2011-10-26 11:25:35.171 MDT [22634]: [35-1] DEBUG:  forked new backend, pid=22655 socket=7
2011-10-26 11:25:35.171 MDT [22655]: [1-1] FATAL:  the database system is starting up
2011-10-26 11:25:35.171 MDT [22655]: [2-1] DEBUG:  shmem_exit(1): 0 callbacks to make
2011-10-26 11:25:35.171 MDT [22655]: [3-1] DEBUG:  proc_exit(1): 1 callbacks to make
2011-10-26 11:25:35.171 MDT [22655]: [4-1] DEBUG:  exit(1)
2011-10-26 11:25:35.171 MDT [22655]: [5-1] DEBUG:  shmem_exit(-1): 0 callbacks to make
2011-10-26 11:25:35.171 MDT [22655]: [6-1] DEBUG:  proc_exit(-1): 0 callbacks to make
2011-10-26 11:25:35.172 MDT [22634]: [36-1] DEBUG:  server process (PID 22655) exited with exit code 1
and so on...
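
FWIW, the primary's view agrees with the "caught up" message above; a quick way to confirm that from the primary is something like this (sketched against 9.1's pg_stat_replication columns):

-- On the primary: the standby's receive/replay positions
SELECT application_name, state,
       sent_location, write_location, flush_location, replay_location
FROM pg_stat_replication;

-- Compare against the primary's current WAL position
SELECT pg_current_xlog_location();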
