Thread: [GENERAL] Recovery Assistance
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I haven't much experience with postgres other than for this purpose.
A few days ago, the hard disk filled, so all services stopped working.
When the admin realised this he increased the disk size (its in a cloud, so that was easy to do) however I think the way it shutdown left Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and over again:
"FATAL: the database system is starting up"
I have a backup, which I have successfully recovered, but it is 24 hours old, the next backup was the cause of the disk filling. So I'm using this as exercise in learning a bit more about postgres.
I did some research and found a number of options. I took a file level backup with the service not running then tried 2 things. I haven't found much else on what to do though.
Attempt 1 - Reset Log
It sounded like this shouldn't be my first option (it wasn't) but it did sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb > $now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk size 104 (expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
pg_dump: The command was: COPY public.bodycontent (bodycontentid, body, contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an inconsistent state.
Attempt 2 - startup manually and let it try recovery
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the output (I'd done it as a service startup a number of times to nearly the same effect too)
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D /etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT LOG: database system was interrupted while in recovery at 2017-01-27 20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb > $now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk size 104 (expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
pg_dump: The command was: COPY public.bodycontent (bodycontentid, body, contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an inconsistent state.
Attempt 2 - startup manually and let it try recovery
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the output (I'd done it as a service startup a number of times to nearly the same effect too)
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D /etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT LOG: database system was interrupted while in recovery at 2017-01-27 20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
I've left it that way for 12 hours, and its still not allowing connections.
I assume its doing some kind of consistency check?
Does anyone have any suggestions on what I should be doing to try and restore this database?
- The amount of change is minimal in the DB after 6pm it should be basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is that right?
- The last completed transaction time 2017-01-24 02:08:00.023064+11 the log mentions would be fine to use, so loosing even a few hours before that would be more than adequate.
I'm just not clear what the database is doing now, or how I should be trying to recover it.
Any help anyone can suggest would be great! I've given myself this weekend to attempt to recover more than the last backup, after that I need to restore the service for my team to use and suck up the lost last day of updates.
Thanks,
Brian
On 01/27/2017 01:31 PM, Brian Mills wrote: > Hi, > > I have a Atlassian Confluence Wiki that depends on postgres, but I > haven't much experience with postgres other than for this purpose. > > A few days ago, the hard disk filled, so all services stopped working. > When the admin realised this he increased the disk size (its in a cloud, > so that was easy to do) however I think the way it shutdown left > Postgres in an inconsistent state for some reason. > Now when I start it up I get this message in the log over and over again: > "FATAL: the database system is starting up" > > I have a backup, which I have successfully recovered, but it is 24 hours > old, the next backup was the cause of the disk filling. So I'm using > this as exercise in learning a bit more about postgres. > > I did some research and found a number of options. I took a file level > backup with the service not running then tried 2 things. I haven't found > much else on what to do though. > > *Attempt 1 - Reset Log * > > It sounded like this shouldn't be my first option (it wasn't) but it did > sound like what I needed to do. > I ran this command > ./pg_resetxlog /var/lib/postgresql/9.3/main -f > It worked a treat, the database did startup ok. > However when I tried to dump the DB: > root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb > > $now-confluencedb.sql > pg_dump: Dumping the contents of table "bodycontent" failed: > PQgetResult() failed. > pg_dump: Error message from server: ERROR: unexpected chunk size 104 > (expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028 > pg_dump: The command was: COPY public.bodycontent (bodycontentid, body, > contentid, bodytypeid) TO stdout; > The dump failed, so I assume this did leave my database in an > inconsistent state. > > > *Attempt 2 - startup manually and let it try recovery* > > I restored my file level backup and started again. > This time I tried to startup manually on the command line to see the > output (I'd done it as a service startup a number of times to nearly the > same effect too) > > postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D > /etc/postgresql/9.3/main start > server starting > postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT > LOG: database system was interrupted while in recovery at 2017-01-27 > 20:13:26 AEDT > 2017-01-27 20:36:08 AEDT HINT: This probably means that some data is > corrupted and you will have to use the last backup for recovery. > 2017-01-27 20:36:08 AEDT LOG: database system was not properly shut > down; automatic recovery in progress > 2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558 > 2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8 > 2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log > time 2017-01-24 02:08:00.023064+11 > 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up > 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up What does ps ax | grep post show? Is the cluster set up to log to a file, if so what does it show? Does the system log show anything relevant? > > I've left it that way for 12 hours, and its still not allowing connections. > I assume its doing some kind of consistency check? What does it say when you attempt a connection? > > Does anyone have any suggestions on what I should be doing to try and > restore this database? > > - The amount of change is minimal in the DB after 6pm it should be > basically no change overnight. > - The log above seems to suggest it has completed a redo ok, is that right? > - The last completed transaction time 2017-01-24 02:08:00.023064+11 the > log mentions would be fine to use, so loosing even a few hours before > that would be more than adequate. > > I'm just not clear what the database is doing now, or how I should be > trying to recover it. > > Any help anyone can suggest would be great! I've given myself this > weekend to attempt to recover more than the last backup, after that I > need to restore the service for my team to use and suck up the lost last > day of updates. > > Thanks, > Brian -- Adrian Klaver adrian.klaver@aklaver.com
First of all, Thank you for your time to assist me learning. I really appreciate it.
root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering 0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto post
Its a single machine postgres database server, so I'm assuming there is no cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.log
which shows (tail):
2017-01-27 20:27:01 AEDT LOG: database system was shut down at 2017-01-27 20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections are now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent (bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut down
I ran the screen utility so I could leave the db started using the pg_ctl command. The later logs in that session have not progressed, its last entry is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.
Connection attempt shows the same log
postgres@atlassian:/home/myuser$ psql
psql: FATAL: the database system is starting up
Nothing in the syslog that seems connected.
Brian Mills
CTO

Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 01/27/2017 01:31 PM, Brian Mills wrote:Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.
A few days ago, the hard disk filled, so all services stopped working.
When the admin realised this he increased the disk size (its in a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and over again:
"FATAL: the database system is starting up"
I have a backup, which I have successfully recovered, but it is 24 hours
old, the next backup was the cause of the disk filling. So I'm using
this as exercise in learning a bit more about postgres.
I did some research and found a number of options. I took a file level
backup with the service not running then tried 2 things. I haven't found
much else on what to do though.
*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't) but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
pg_dump: The command was: COPY public.bodycontent (bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.
*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the
output (I'd done it as a service startup a number of times to nearly the
same effect too)
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT
LOG: database system was interrupted while in recovery at 2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
What does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing connections.
I assume its doing some kind of consistency check?
What does it say when you attempt a connection?--
Does anyone have any suggestions on what I should be doing to try and
restore this database?
- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is that right?
- The last completed transaction time 2017-01-24 02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours before
that would be more than adequate.
I'm just not clear what the database is doing now, or how I should be
trying to recover it.
Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after that I
need to restore the service for my team to use and suck up the lost last
day of updates.
Thanks,
Brian
Adrian Klaver
adrian.klaver@aklaver.com
On 01/27/2017 05:40 PM, Brian Mills wrote: > First of all, Thank you for your time to assist me learning. I really > appreciate it. > > root# ps ax | grep post > 1364 ? Ss 0:00 /usr/lib/postfix/master > 5198 pts/3 S 0:00 su postgres > 5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D > /etc/postgresql/9.3/main > 5222 ? Ss 0:10 postgres: startup process recovering > 0000000100000005000000A3 > 7930 pts/4 S+ 0:00 grep --color=auto post So if you check back does the recovering XXXXXXXX part change? If so Postgres is walking through the WAL files as it should. > > Its a single machine postgres database server, so I'm assuming there is > no cluster log. If there is one, where would I look for it? > The only log in /var/log/postgres is postgresql-9.3-main.log That would be it. A single Postgres instance has multiple databases in it, by default it starts with template0, template1 and postgres databases. Then add whatever databases you created and you have a cluster of databases. > which shows (tail): > > 2017-01-27 20:27:01 AEDT LOG: database system was shut down at > 2017-01-27 20:26:29 AEDT > 2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections > are now enabled > 2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started > 2017-01-27 20:27:01 AEDT LOG: database system is ready to accept > connections > 2017-01-27 20:27:02 AEDT LOG: incomplete startup packet > 2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected > 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028 > 2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent > (bodycontentid, body, contentid, bodytypeid) TO stdout; > 2017-01-27 20:30:13 AEDT LOG: received fast shutdown request > 2017-01-27 20:30:13 AEDT LOG: aborting any active transactions > 2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down > 2017-01-27 20:30:13 AEDT LOG: shutting down > 2017-01-27 20:30:13 AEDT LOG: database system is shut down That would be your Attempt 1 log. > > I ran the screen utility so I could leave the db started using > the pg_ctl command. The later logs in that session have not progressed, > its last entry is still > 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up > Which is still later datetime than the /var/log/postgres... log. So it is just logging to stdout and not to the log file. > > Connection attempt shows the same log > postgres@atlassian:/home/myuser$ psql > psql: FATAL: the database system is starting up > > Nothing in the syslog that seems connected. > > *Brian Mills* > CTO > > > *Mob: *0410660003 <tel:0410660003> > *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327 > <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600 > <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570> > Level 1 *|* 600 Chapel Street *|* South > Yarra*|* VIC *|* 3141 *|* Australia > > <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com> > > On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 01/27/2017 01:31 PM, Brian Mills wrote: > > Hi, > > I have a Atlassian Confluence Wiki that depends on postgres, but I > haven't much experience with postgres other than for this purpose. > > A few days ago, the hard disk filled, so all services stopped > working. > When the admin realised this he increased the disk size (its in > a cloud, > so that was easy to do) however I think the way it shutdown left > Postgres in an inconsistent state for some reason. > Now when I start it up I get this message in the log over and > over again: > "FATAL: the database system is starting up" > > I have a backup, which I have successfully recovered, but it is > 24 hours > old, the next backup was the cause of the disk filling. So I'm using > this as exercise in learning a bit more about postgres. > > I did some research and found a number of options. I took a file > level > backup with the service not running then tried 2 things. I > haven't found > much else on what to do though. > > *Attempt 1 - Reset Log * > > It sounded like this shouldn't be my first option (it wasn't) > but it did > sound like what I needed to do. > I ran this command > ./pg_resetxlog /var/lib/postgresql/9.3/main -f > It worked a treat, the database did startup ok. > However when I tried to dump the DB: > root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb > > $now-confluencedb.sql > pg_dump: Dumping the contents of table "bodycontent" failed: > PQgetResult() failed. > pg_dump: Error message from server: ERROR: unexpected chunk > size 104 > (expected 1996) in chunk 3 of 22 for toast value 48862 in > pg_toast_20028 > pg_dump: The command was: COPY public.bodycontent > (bodycontentid, body, > contentid, bodytypeid) TO stdout; > The dump failed, so I assume this did leave my database in an > inconsistent state. > > > *Attempt 2 - startup manually and let it try recovery* > > I restored my file level backup and started again. > This time I tried to startup manually on the command line to see the > output (I'd done it as a service startup a number of times to > nearly the > same effect too) > > postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D > /etc/postgresql/9.3/main start > server starting > postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 > 20:36:08 AEDT > LOG: database system was interrupted while in recovery at > 2017-01-27 > 20:13:26 AEDT > 2017-01-27 20:36:08 AEDT HINT: This probably means that some > data is > corrupted and you will have to use the last backup for recovery. > 2017-01-27 20:36:08 AEDT LOG: database system was not properly shut > down; automatic recovery in progress > 2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558 > 2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8 > 2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log > time 2017-01-24 02:08:00.023064+11 > 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up > 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up > > > What does ps ax | grep post show? > > Is the cluster set up to log to a file, if so what does it show? > > Does the system log show anything relevant? > > > I've left it that way for 12 hours, and its still not allowing > connections. > I assume its doing some kind of consistency check? > > > What does it say when you attempt a connection? > > > > Does anyone have any suggestions on what I should be doing to > try and > restore this database? > > - The amount of change is minimal in the DB after 6pm it should be > basically no change overnight. > - The log above seems to suggest it has completed a redo ok, is > that right? > - The last completed transaction time 2017-01-24 > 02:08:00.023064+11 the > log mentions would be fine to use, so loosing even a few hours > before > that would be more than adequate. > > I'm just not clear what the database is doing now, or how I > should be > trying to recover it. > > Any help anyone can suggest would be great! I've given myself this > weekend to attempt to recover more than the last backup, after > that I > need to restore the service for my team to use and suck up the > lost last > day of updates. > > Thanks, > Brian > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > -- Adrian Klaver adrian.klaver@aklaver.com
Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering 0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto post
The /var/log/postgres logs haven't got anything new in them.
Also the screen session hasn't got anything new in them, except it adds a line every time I attempt to connect to it using psql. The last attempt below was my connection attempt for this email, a bit over 8 hours after my last connection attempt.
Screen session:
2017-01-28 12:38:35 AEDT FATAL: the database system is starting up
2017-01-28 23:00:01 AEDT FATAL: the database system is starting up
2017-01-28 23:00:01 AEDT FATAL: the database system is starting up
2017-01-29 07:14:00 AEDT FATAL: the database system is starting up
I also still can't connect.
postgres@atlassian:/home/tbadmin$ psql
psql: FATAL: the database system is starting up
Brian Mills
CTO

Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 28 January 2017 at 16:08, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 01/27/2017 05:40 PM, Brian Mills wrote:First of all, Thank you for your time to assist me learning. I really
appreciate it.
root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto post
So if you check back does the recovering XXXXXXXX part change?
If so Postgres is walking through the WAL files as it should.
Its a single machine postgres database server, so I'm assuming there is
no cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.log
That would be it. A single Postgres instance has multiple databases in it, by default it starts with template0, template1 and postgres databases. Then add whatever databases you created and you have a cluster of databases.which shows (tail):
2017-01-27 20:27:01 AEDT LOG: database system was shut down at
2017-01-27 20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections
are now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept
connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected
1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent
(bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut down
That would be your Attempt 1 log.
I ran the screen utility so I could leave the db started using
the pg_ctl command. The later logs in that session have not progressed,
its last entry is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.
So it is just logging to stdout and not to the log file.
Connection attempt shows the same log
postgres@atlassian:/home/myuser$ psql
psql: FATAL: the database system is starting up
Nothing in the syslog that seems connected.
*Brian Mills*
CTO
*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybookin g> <https://www.linkedin.com/comp any/trybooking-com>
On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.comadrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.<mailto:adrian.klaver@aklaver.com>> wrote:
On 01/27/2017 01:31 PM, Brian Mills wrote:
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.
A few days ago, the hard disk filled, so all services stopped
working.
When the admin realised this he increased the disk size (its in
a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and
over again:
"FATAL: the database system is starting up"
I have a backup, which I have successfully recovered, but it is
24 hours
old, the next backup was the cause of the disk filling. So I'm using
this as exercise in learning a bit more about postgres.
I did some research and found a number of options. I took a file
level
backup with the service not running then tried 2 things. I
haven't found
much else on what to do though.
*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't)
but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk
size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in
pg_toast_20028
pg_dump: The command was: COPY public.bodycontent
(bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.
*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the
output (I'd done it as a service startup a number of times to
nearly the
same effect too)
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27
20:36:08 AEDT
LOG: database system was interrupted while in recovery at
2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some
data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
What does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing
connections.
I assume its doing some kind of consistency check?
What does it say when you attempt a connection?
Does anyone have any suggestions on what I should be doing to
try and
restore this database?
- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is
that right?
- The last completed transaction time 2017-01-24
02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours
before
that would be more than adequate.
I'm just not clear what the database is doing now, or how I
should be
trying to recover it.
Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after
that I
need to restore the service for my team to use and suck up the
lost last
day of updates.
Thanks,
Brian
--
Adrian Klavercom>
--
Adrian Klaver
adrian.klaver@aklaver.com
Hello Brian, On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote: > Hi, > > No, it hasn't changed since the first time I looked at it. > > root@atlassian:/home/tbadmin# ps ax | grep post > 1364 ? Ss 0:00 /usr/lib/postfix/master > 5198 pts/3 S 0:00 su postgres > 5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D > /etc/postgresql/9.3/main > 5222 ? Ss 0:10 postgres: startup process recovering > 0000000100000005000000A3 > 11161 pts/4 S+ 0:00 grep --color=auto post > Does this WAL file exist "0000000100000005000000A3"? Cheers, Rob
Yes, its the last one in the directory, pg_xlog directory
...more files...
-rw------- 1 postgres postgres 16777216 Jan 21 10:05 0000000100000005000000A1
-rw------- 1 postgres postgres 16777216 Jan 22 21:29 0000000100000005000000A2
-rw------- 1 postgres postgres 16777216 Jan 24 02:08 0000000100000005000000A3
Brian Mills
CTO

Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com> wrote:
Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:
> Hi,
>
> No, it hasn't changed since the first time I looked at it.
>
> root@atlassian:/home/tbadmin# ps ax | grep post
> 1364 ? Ss 0:00 /usr/lib/postfix/master
> 5198 pts/3 S 0:00 su postgres
> 5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
> /etc/postgresql/9.3/main
> 5222 ? Ss 0:10 postgres: startup process recovering
> 0000000100000005000000A3
> 11161 pts/4 S+ 0:00 grep --color=auto post
>
Does this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob
On 01/28/2017 01:55 PM, Brian Mills wrote: > Yes, its the last one in the directory, pg_xlog directory > > ...more files... > -rw------- 1 postgres postgres 16777216 Jan 21 10:05 > 0000000100000005000000A1 > -rw------- 1 postgres postgres 16777216 Jan 22 21:29 > 0000000100000005000000A2 > -rw------- 1 postgres postgres 16777216 Jan 24 02:08 > 0000000100000005000000A3 Best guess is the last WAL is not complete. From your original post: "Attempt 2 - startup manually and let it try recovery I restored my file level backup and started again. " How was the file level backup done? > > > *Brian Mills* > CTO > > > *Mob: *0410660003 <tel:0410660003> > *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327 > <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600 > <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570> > Level 1 *|* 600 Chapel Street *|* South > Yarra*|* VIC *|* 3141 *|* Australia > > <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com> > > On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com > <mailto:floriparob@gmail.com>> wrote: > > Hello Brian, > On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote: > > Hi, > > > > No, it hasn't changed since the first time I looked at it. > > > > root@atlassian:/home/tbadmin# ps ax | grep post > > 1364 ? Ss 0:00 /usr/lib/postfix/master > > 5198 pts/3 S 0:00 su postgres > > 5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D > > /etc/postgresql/9.3/main > > 5222 ? Ss 0:10 postgres: startup process recovering > > 0000000100000005000000A3 > > 11161 pts/4 S+ 0:00 grep --color=auto post > > > > > Does this WAL file exist "0000000100000005000000A3"? > > Cheers, > Rob > > -- Adrian Klaver adrian.klaver@aklaver.com
I have a consistent sql dump from 24 hour previous.
The file level backup was done with rsync -a of full data directory after the issue occurred so could reset as I learned.
Brian
On Sun, 29 Jan 2017 at 9:18 am, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 01/28/2017 01:55 PM, Brian Mills wrote:
> Yes, its the last one in the directory, pg_xlog directory
>
> ...more files...
> -rw------- 1 postgres postgres 16777216 Jan 21 10:05
> 0000000100000005000000A1
> -rw------- 1 postgres postgres 16777216 Jan 22 21:29
> 0000000100000005000000A2
> -rw------- 1 postgres postgres 16777216 Jan 24 02:08
> 0000000100000005000000A3
Best guess is the last WAL is not complete.
From your original post:
"Attempt 2 - startup manually and let it try recovery
I restored my file level backup and started again. "
How was the file level backup done?
>
>
> *Brian Mills*
> CTO
>
>
> *Mob: *0410660003 <tel:0410660003>
> *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
> <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
> <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
> Level 1 *|* 600 Chapel Street *|* South
> Yarra*|* VIC *|* 3141 *|* Australia
>
> <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com>
>
> On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com
> <mailto:floriparob@gmail.com>> wrote:
>
> Hello Brian,
> On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:
> > Hi,
> >
> > No, it hasn't changed since the first time I looked at it.
> >
> > root@atlassian:/home/tbadmin# ps ax | grep post
> > 1364 ? Ss 0:00 /usr/lib/postfix/master
> > 5198 pts/3 S 0:00 su postgres
> > 5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
> > /etc/postgresql/9.3/main
> > 5222 ? Ss 0:10 postgres: startup process recovering
> > 0000000100000005000000A3
> > 11161 pts/4 S+ 0:00 grep --color=auto post
> >
>
>
> Does this WAL file exist "0000000100000005000000A3"?
>
> Cheers,
> Rob
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Brian Mills
CTO

Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in time, say a bit before the last completed transaction time the log mentions, then take a backup. Is that possible in postgres?
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log time 2017-01-24 02:08:00.023064+11
(which is moments before, or possibly as the disk filled up doing a db backup dump)
Brian Mills
CTO

Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 29 January 2017 at 12:58, Brian Mills <brian@trybooking.com> wrote:
I have a consistent sql dump from 24 hour previous.The file level backup was done with rsync -a of full data directory after the issue occurred so could reset as I learned.BrianOn Sun, 29 Jan 2017 at 9:18 am, Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 01/28/2017 01:55 PM, Brian Mills wrote:
> Yes, its the last one in the directory, pg_xlog directory
>
> ...more files...
> -rw------- 1 postgres postgres 16777216 Jan 21 10:05
> 0000000100000005000000A1
> -rw------- 1 postgres postgres 16777216 Jan 22 21:29
> 0000000100000005000000A2
> -rw------- 1 postgres postgres 16777216 Jan 24 02:08
> 0000000100000005000000A3
Best guess is the last WAL is not complete.
From your original post:
"Attempt 2 - startup manually and let it try recovery
I restored my file level backup and started again. "
How was the file level backup done?
>
>
> *Brian Mills*
> CTO
>
>
> *Mob: *0410660003 <tel:0410660003>
> *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
> <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
> <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
> Level 1 *|* 600 Chapel Street *|* South
> Yarra*|* VIC *|* 3141 *|* Australia
>
> <https://www.facebook.com/TryBooking/> <https://twitter.com/ trybooking> <https://www.linkedin.com/ company/trybooking-com>
>
> On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com
> <mailto:floriparob@gmail.com>> wrote:
>
> Hello Brian,
> On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:
> > Hi,
> >
> > No, it hasn't changed since the first time I looked at it.
> >
> > root@atlassian:/home/tbadmin# ps ax | grep post
> > 1364 ? Ss 0:00 /usr/lib/postfix/master
> > 5198 pts/3 S 0:00 su postgres
> > 5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
> > /etc/postgresql/9.3/main
> > 5222 ? Ss 0:10 postgres: startup process recovering
> > 0000000100000005000000A3
> > 11161 pts/4 S+ 0:00 grep --color=auto post
> >
>
>
> Does this WAL file exist "0000000100000005000000A3"?
>
> Cheers,
> Rob
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com--Brian MillsCTOMob: 0410660003Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 01/28/2017 11:23 PM, Brian Mills wrote: > I presume this is a binary log file for the database. > > Am I able to recover to a point in time using this log file? > > What I would do in SQL Server would be recover to a point in time, say a > bit before the last completed transaction time the log mentions, then > take a backup. Is that possible in postgres? Yes, though I am not sure you have the setup to do it. I would suggest reading the below to see how much of it applies: https://www.postgresql.org/docs/9.3/static/continuous-archiving.html In particular: 24.3.4. Recovering Using a Continuous Archive Backup https://www.postgresql.org/docs/9.3/static/recovery-target-settings.html > > The log mentions this: > 2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log > time 2017-01-24 02:08:00.023064+11 > > (which is moments before, or possibly as the disk filled up doing a db > backup dump) > > *Brian Mills* > CTO > > > *Mob: *0410660003 <tel:0410660003> > *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327 > <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600 > <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570> > Level 1 *|* 600 Chapel Street *|* South > Yarra*|* VIC *|* 3141 *|* Australia > > <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com> > > On 29 January 2017 at 12:58, Brian Mills <brian@trybooking.com > <mailto:brian@trybooking.com>> wrote: > > I have a consistent sql dump from 24 hour previous. > > The file level backup was done with rsync -a of full data directory > after the issue occurred so could reset as I learned. > > Brian > > > On Sun, 29 Jan 2017 at 9:18 am, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 01/28/2017 01:55 PM, Brian Mills wrote: > > Yes, its the last one in the directory, pg_xlog directory > > > > ...more files... > > -rw------- 1 postgres postgres 16777216 Jan 21 10:05 > > 0000000100000005000000A1 > > -rw------- 1 postgres postgres 16777216 Jan 22 21:29 > > 0000000100000005000000A2 > > -rw------- 1 postgres postgres 16777216 Jan 24 02:08 > > 0000000100000005000000A3 > > Best guess is the last WAL is not complete. > > From your original post: > "Attempt 2 - startup manually and let it try recovery > > I restored my file level backup and started again. " > > How was the file level backup done? > > > > > > > *Brian Mills* > > CTO > > > > > > *Mob: *0410660003 <tel:0410%20660%20003> <tel:0410660003 > <tel:0410%20660%20003>> > > *Melbourne* 03 9012 3460 <tel:(03)%209012%203460> > <tel:03%209012%203460> or 03 8376 6327 <tel:(03)%208376%206327> > > <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600 > <tel:(02)%208064%203600> > > <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 > <tel:(07)%203173%201570> <tel:07%203173%201570> > > Level 1 *|* 600 Chapel Street *|* South > > Yarra*|* VIC *|* 3141 *|* Australia > > > > <https://www.facebook.com/TryBooking/ > <https://www.facebook.com/TryBooking/>> > <https://twitter.com/trybooking > <https://twitter.com/trybooking>> > <https://www.linkedin.com/company/trybooking-com > <https://www.linkedin.com/company/trybooking-com>> > > > > On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com > <mailto:floriparob@gmail.com> > > <mailto:floriparob@gmail.com <mailto:floriparob@gmail.com>>> > wrote: > > > > Hello Brian, > > On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote: > > > Hi, > > > > > > No, it hasn't changed since the first time I looked at it. > > > > > > root@atlassian:/home/tbadmin# ps ax | grep post > > > 1364 ? Ss 0:00 /usr/lib/postfix/master > > > 5198 pts/3 S 0:00 su postgres > > > 5221 pts/3 S 0:00 > /usr/lib/postgresql/9.3/bin/postgres -D > > > /etc/postgresql/9.3/main > > > 5222 ? Ss 0:10 postgres: startup process > recovering > > > 0000000100000005000000A3 > > > 11161 pts/4 S+ 0:00 grep --color=auto post > > > > > > > > > Does this WAL file exist "0000000100000005000000A3"? > > > > Cheers, > > Rob > > > > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > -- > *Brian Mills* > CTO > > > *Mob: *0410660003 > *Melbourne* 03 9012 3460 or 03 8376 6327 *|* * **Sydney* 02 8064 > 3600 *|* *Brisbane* 07 3173 1570 > Level 1 *|* 600 Chapel Street *|* South > Yarra*|* VIC *|* 3141 *|* Australia > > <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com> > > -- Adrian Klaver adrian.klaver@aklaver.com
On 01/28/2017 11:23 PM, Brian Mills wrote: > I presume this is a binary log file for the database. > > Am I able to recover to a point in time using this log file? > > What I would do in SQL Server would be recover to a point in time, say a > bit before the last completed transaction time the log mentions, then > take a backup. Is that possible in postgres? Had another thought. If I remember correctly you are using this as an exercise in Postgres recovery. If that is indeed the case you might try: 1) Stop the Postgres instance you have running now. 2) Move the WAL file that Postgres is currently stalled on, 0000000100000005000000A3, out of pg_xlog. 3) Restart the Postgres instance. My guess it it will not bring it back to the exact point you want, but close. You can get a general idea by running(before and after removing the WAL), as the postgres user: pg_controldata -D /etc/postgresql/9.3/main > > The log mentions this: > 2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log > time 2017-01-24 02:08:00.023064+11 > > (which is moments before, or possibly as the disk filled up doing a db > backup dump) > > *Brian Mills* > CTO > -- Adrian Klaver adrian.klaver@aklaver.com
OK. I think I'm on to something here, I first reset back to my file level backup.
I created a recovery.conf file in the root of the data directory like this:
---------------------
restore_command = 'cp /mnt/archive/%f %p'
recovery_target_time = '2017-01-24 02:08:00.023064+11'
recovery_target_inclusive = 'true'
pause_at_recovery_target = 'false'
---------------------
Note, the archive directory had no files in it, I left the WAL files in the pg_xlog directory.
Then I started up the database again:
postgres@atlassian:~/9.3/main$ /usr/lib/postgresql/9.3/bin/ pg_ctl -D /etc/postgresql/9.3/main start
server starting
postgres@atlassian:~/9.3/main$ 2017-01-30 10:07:28 AEDT LOG: database system was interrupted while in recovery at 2017-01-27 20:13:26 AEDT
2017-01-30 10:07:28 AEDT HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2017-01-30 10:07:28 AEDT LOG: starting point-in-time recovery to 2017-01-24 02:08:00.023064+11
2017-01-30 10:07:28 AEDT LOG: database system was not properly shut down; automatic recovery in progress
2017-01-30 10:07:28 AEDT WARNING: WAL was generated with wal_level=minimal, data may be missing
2017-01-30 10:07:28 AEDT HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.
2017-01-30 10:07:28 AEDT LOG: redo starts at 5/528B4558
2017-01-30 10:07:40 AEDT LOG: consistent recovery state reached at 5/A3FFFA30
cp: cannot stat ‘/mnt/archive/ 0000000100000005000000A3’: No such file or directory
cp: cannot stat ‘/mnt/archive/ 0000000100000005000000A4’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-30 10:07:40 AEDT LOG: last completed transaction was at log time 2017-01-24 02:08:00.023064+11
cp: cannot stat ‘/mnt/archive/ 0000000100000005000000A3’: No such file or directory
cp: cannot stat ‘/mnt/archive/00000002. history’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: selected new timeline ID: 2
cp: cannot stat ‘/mnt/archive/00000001. history’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: archive recovery complete
2017-01-30 10:08:55 AEDT FATAL: the database system is starting up
2017-01-30 10:08:57 AEDT FATAL: the database system is starting up
This time it looks like it has actually finished the startup and recovery. However I think I might have something wrong about the process.
Any thoughts on the above log?
Brian Mills
CTO

Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 30 January 2017 at 04:49, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 01/28/2017 11:23 PM, Brian Mills wrote:I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in time, say a
bit before the last completed transaction time the log mentions, then
take a backup. Is that possible in postgres?
Had another thought. If I remember correctly you are using this as an exercise in Postgres recovery. If that is indeed the case you might try:
1) Stop the Postgres instance you have running now.
2) Move the WAL file that Postgres is currently stalled on, 0000000100000005000000A3, out of pg_xlog.
3) Restart the Postgres instance.
My guess it it will not bring it back to the exact point you want, but close. You can get a general idea by running(before and after removing the WAL), as the postgres user:
pg_controldata -D /etc/postgresql/9.3/main
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
(which is moments before, or possibly as the disk filled up doing a db
backup dump)
*Brian Mills*
CTO
--
Adrian Klaver
adrian.klaver@aklaver.com
On 01/29/2017 03:12 PM, Brian Mills wrote: > OK. I think I'm on to something here, I first reset back to my file > level backup. > I created a recovery.conf file in the root of the data directory like this: > --------------------- > restore_command = 'cp /mnt/archive/%f %p' > recovery_target_time = '2017-01-24 02:08:00.023064+11' > recovery_target_inclusive = 'true' > pause_at_recovery_target = 'false' > --------------------- > Note, the archive directory had no files in it, I left the WAL files in > the pg_xlog directory. > > Then I started up the database again: > postgres@atlassian:~/9.3/main$ /usr/lib/postgresql/9.3/bin/pg_ctl -D > /etc/postgresql/9.3/main start > server starting > postgres@atlassian:~/9.3/main$ 2017-01-30 10:07:28 AEDT LOG: database > system was interrupted while in recovery at 2017-01-27 20:13:26 AEDT > 2017-01-30 10:07:28 AEDT HINT: This probably means that some data is > corrupted and you will have to use the last backup for recovery. > 2017-01-30 10:07:28 AEDT LOG: starting point-in-time recovery to > 2017-01-24 02:08:00.023064+11 > 2017-01-30 10:07:28 AEDT LOG: database system was not properly shut > down; automatic recovery in progress > 2017-01-30 10:07:28 AEDT WARNING: WAL was generated with > wal_level=minimal, data may be missing This would be a problem: https://www.postgresql.org/docs/9.3/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS "But minimal WAL does not contain enough information to reconstruct the data from a base backup and the WAL logs, so either archive or hot_standby level must be used to enable WAL archiving (archive_mode) and streaming replication. " > 2017-01-30 10:07:28 AEDT HINT: This happens if you temporarily set > wal_level=minimal without taking a new base backup. > 2017-01-30 10:07:28 AEDT LOG: redo starts at 5/528B4558 > 2017-01-30 10:07:40 AEDT LOG: consistent recovery state reached at > 5/A3FFFA30 > cp: cannot stat ‘/mnt/archive/0000000100000005000000A3’: No such file or > directory > cp: cannot stat ‘/mnt/archive/0000000100000005000000A4’: No such file or > directory > 2017-01-30 10:07:40 AEDT LOG: redo done at 5/A3FFF9E8 > 2017-01-30 10:07:40 AEDT LOG: last completed transaction was at log > time 2017-01-24 02:08:00.023064+11 > cp: cannot stat ‘/mnt/archive/0000000100000005000000A3’: No such file or > directory > cp: cannot stat ‘/mnt/archive/00000002.history’: No such file or directory > 2017-01-30 10:07:40 AEDT LOG: selected new timeline ID: 2 > cp: cannot stat ‘/mnt/archive/00000001.history’: No such file or directory > 2017-01-30 10:07:40 AEDT LOG: archive recovery complete > 2017-01-30 10:08:55 AEDT FATAL: the database system is starting up > 2017-01-30 10:08:57 AEDT FATAL: the database system is starting up > > This time it looks like it has actually finished the startup and > recovery. However I think I might have something wrong about the process. > Any thoughts on the above log? See above. > > > *Brian Mills* > CTO > > > *Mob: *0410660003 <tel:0410660003> > *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327 > <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600 > <tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570> > Level 1 *|* 600 Chapel Street *|* South > Yarra*|* VIC *|* 3141 *|* Australia > > <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com> > > On 30 January 2017 at 04:49, Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 01/28/2017 11:23 PM, Brian Mills wrote: > > I presume this is a binary log file for the database. > > Am I able to recover to a point in time using this log file? > > What I would do in SQL Server would be recover to a point in > time, say a > bit before the last completed transaction time the log mentions, > then > take a backup. Is that possible in postgres? > > > Had another thought. If I remember correctly you are using this as > an exercise in Postgres recovery. If that is indeed the case you > might try: > > 1) Stop the Postgres instance you have running now. > > 2) Move the WAL file that Postgres is currently stalled on, > 0000000100000005000000A3, out of pg_xlog. > > 3) Restart the Postgres instance. > > My guess it it will not bring it back to the exact point you want, > but close. You can get a general idea by running(before and after > removing the WAL), as the postgres user: > > pg_controldata -D /etc/postgresql/9.3/main > > > The log mentions this: > 2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log > time 2017-01-24 02:08:00.023064+11 > > (which is moments before, or possibly as the disk filled up > doing a db > backup dump) > > *Brian Mills* > CTO > > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > -- Adrian Klaver adrian.klaver@aklaver.com