Thread: Recovery will take 10 hours
Hi,

We had a database issue today that caused us to have to restore to our
most recent backup. We are using PITR, so we have 3120 WAL files that
need to be applied to the database.

After 45 minutes, it has restored only 230 WAL files. At this rate, it's
going to take about 10 hours to restore our database. Most of the time,
the server is not using very much CPU time or I/O time, so I'm wondering
what can be done to speed up the process?

The database is about 20 GB. The WAL files are compressed with gzip to
about 4 MB each; expanded, they would take 48 GB.

We are using PostgreSQL 8.1.3 on OS X Server 10.4.6 connected to an
Xserve RAID. The pg_xlog is on its own separate RAID, and so are the
table spaces.

Here's a representative sample of doing iostat:

hulk1:/Library/PostgreSQL admin$ iostat 5
          disk1            disk2            disk0        cpu
    KB/t  tps  MB/s    KB/t tps  MB/s    KB/t tps  MB/s  us sy id
   19.31  101  1.91   14.39  51  0.71   37.37   4  0.13  15 10 76
    8.00   21  0.16    0.00   0  0.00   90.22   2  0.16   0  2 98
    8.00   32  0.25    0.00   0  0.00    0.00   0  0.00   0  1 98
    8.00   76  0.60    0.00   0  0.00    0.00   0  0.00   0  1 99
    8.00  587  4.59 1024.00   4  4.00    0.00   0  0.00   4  7 88
    8.00  675  5.27  956.27   6  5.60    0.00   0  0.00   6  6 88
   11.32 1705 18.84    5.70   1  0.01   16.36   7  0.12   1  6 93
    8.00   79  0.62 1024.00   3  3.20    0.00   0  0.00   2  2 96
    8.00   68  0.53    0.00   0  0.00    0.00   0  0.00   0  2 98
    8.00   76  0.59    0.00   0  0.00    0.00   0  0.00   0  1 99
    8.02   89  0.69    0.00   0  0.00    0.00   0  0.00   1  1 98
    8.00  572  4.47  911.11   4  3.20    0.00   0  0.00   5  5 91
   13.53 1227 16.21  781.55   4  3.21   12.14   2  0.03   3  6 90
    8.00   54  0.42    0.00   0  0.00   90.22   2  0.16   1  1 98
    8.00   68  0.53    0.00   0  0.00    0.00   0  0.00   0  1 99
    8.00  461  3.60 1024.00   3  3.20    0.00   0  0.00   3  6 91
    8.00  671  5.24  964.24   7  6.40    0.00   0  0.00   6  8 86
    7.99  248  1.94    0.00   0  0.00    0.00   0  0.00   1  3 96
   15.06 1050 15.44  911.11   4  3.20   12.12   3  0.03   2  5 93
   19.84  176  3.41    5.70   1  0.01    0.00   0  0.00   0  1 99

disk1 is the RAID volume that has the table spaces on it, disk2 is the
pg_xlog, and disk0 is the boot disk. Each line in the iostat results is
5 seconds apart. So you can see the CPU is idle much of the time and the
I/O occurs only in short bursts.

If there were something we could do to speed up the process, would it be
possible to kill the postgres process, tweak some parameter somewhere,
and then start it up again? Or would we have to restore our base backup
again and start over?

How can I make this go faster?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com
Brendan Duddridge <brendan@clickspace.com> writes:
> We had a database issue today that caused us to have to restore to
> our most recent backup. We are using PITR so we have 3120 WAL files
> that need to be applied to the database.
> After 45 minutes, it has restored only 230 WAL files. At this rate,
> it's going to take about 10 hours to restore our database.
> Most of the time, the server is not using very much CPU time or I/O
> time. So I'm wondering what can be done to speed up the process?

That seems a bit odd --- should be eating one or the other, one would
think. Try strace'ing the recovery process to see what it's doing.

> If there were something we could do to speed up the process, would it
> be possible to kill the postgres process, tweak some parameter
> somewhere and then start it up again? Or would we have to restore our
> base backup again and start over?

You could start it up again, but it'd want to read through all the WAL
it's already looked at, so I'd not recommend this until/unless you're
pretty sure you've fixed the performance issue. Right at the moment,
I think this is a golden opportunity to study the performance of WAL
recovery --- it's not something we've tried to optimize particularly.

			regards, tom lane
Hi Tom,

Do you mean do a kill -QUIT on the postgres process in order to generate
a stack trace? Will that affect the currently running process in any bad
way? And where would the output go? stdout?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 2:17 PM, Tom Lane wrote:

> That seems a bit odd --- should be eating one or the other, one would
> think. Try strace'ing the recovery process to see what it's doing.
Brendan Duddridge <brendan@clickspace.com> writes:
> Do you mean do a kill -QUIT on the postgres process in order to
> generate a stack trace?

Not at all! I'm talking about tracing the kernel calls it's making.
Depending on your platform, the tool for this is called strace, ktrace,
truss, or maybe even just trace. With strace you'd do something like

	strace -p PID-of-process 2>outfile
	... wait 30 sec or so ...
	control-C

Not sure about the APIs for the others but they're probably roughly
similar ... read the man page ...

			regards, tom lane
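[Note: OS X Server 10.4 ships the BSD ktrace/kdump pair rather than
strace. Going by the BSD man pages (untested here), the rough equivalent
of the strace recipe above would be something like:

	ktrace -p PID-of-process    # start tracing; output accumulates in ./ktrace.out
	... wait 30 sec or so ...
	ktrace -C                   # stop tracing
	kdump | less                # decode ktrace.out into readable syscalls

Unlike strace, ktrace returns immediately and logs in the background,
which is why it has to be stopped explicitly.]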
Brendan,
strace -p <pid> -c

Then do a "CTRL-C" after a minute to get the stats of system calls.
- Luke
On 4/20/06 2:13 PM, "Brendan Duddridge" <brendan@clickspace.com> wrote:
Hi Tom,
Do you mean do a kill -QUIT on the postgres process in order to
generate a stack trace?
Will that affect the currently running process in any bad way? And
where would the output go? stdout?
Thanks,
____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com
ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9
http://www.clickspace.com
On Apr 20, 2006, at 2:17 PM, Tom Lane wrote:
> Brendan Duddridge <brendan@clickspace.com> writes:
>> We had a database issue today that caused us to have to restore to
>> our most recent backup. We are using PITR so we have 3120 WAL files
>> that need to be applied to the database.
>> After 45 minutes, it has restored only 230 WAL files. At this rate,
>> it's going to take about 10 hours to restore our database.
>> Most of the time, the server is not using very much CPU time or I/O
>> time. So I'm wondering what can be done to speed up the process?
>
> That seems a bit odd --- should be eating one or the other, one would
> think. Try strace'ing the recovery process to see what it's doing.
>
>> If there were something we could do to speed up the process, would it
>> be possible to kill the postgres process, tweak some parameter
>> somewhere and then start it up again? Or would we have to restore our
>> base backup again and start over?
>
> You could start it up again, but it'd want to read through all the WAL
> it's already looked at, so I'd not recommend this until/unless you're
> pretty sure you've fixed the performance issue. Right at the moment,
> I think this is a golden opportunity to study the performance of WAL
> recovery --- it's not something we've tried to optimize particularly.
>
> regards, tom lane
On Thu, 20 Apr 2006, Brendan Duddridge wrote:

> We had a database issue today that caused us to have to restore to our
> most recent backup. We are using PITR so we have 3120 WAL files that
> need to be applied to the database.
>
> After 45 minutes, it has restored only 230 WAL files. At this rate,
> it's going to take about 10 hours to restore our database.
>
> Most of the time, the server is not using very much CPU time or I/O
> time. So I'm wondering what can be done to speed up the process?

Brendan,

Where are the WAL files being stored and how are they being read back?

--
Jeff Frost, Owner 	<jeff@frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954
Hi Jeff,

The WAL files are stored on a separate server and accessed through an
NFS mount located at /wal_archive.

However, the restore failed about 5 hours in after we got this error:

[2006-04-20 16:41:28 MDT] LOG:  restored log file "000000010000018F00000034" from archive
[2006-04-20 16:41:35 MDT] LOG:  restored log file "000000010000018F00000035" from archive
[2006-04-20 16:41:38 MDT] LOG:  restored log file "000000010000018F00000036" from archive
sh: line 1: /wal_archive/000000010000018F00000037.gz: No such file or directory
[2006-04-20 16:41:46 MDT] LOG:  could not open file "pg_xlog/000000010000018F00000037" (log file 399, segment 55): No such file or directory
[2006-04-20 16:41:46 MDT] LOG:  redo done at 18F/36FFF254
sh: line 1: /wal_archive/000000010000018F00000036.gz: No such file or directory
[2006-04-20 16:41:46 MDT] PANIC:  could not open file "pg_xlog/000000010000018F00000036" (log file 399, segment 54): No such file or directory
[2006-04-20 16:41:46 MDT] LOG:  startup process (PID 9190) was terminated by signal 6
[2006-04-20 16:41:46 MDT] LOG:  aborting startup due to startup process failure
[2006-04-20 16:41:46 MDT] LOG:  logger shutting down

The /wal_archive/000000010000018F00000037.gz file is there and
accessible on the NFS mount.

Is there a way to continue the restore process from where it left off?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 3:19 PM, Jeff Frost wrote:

> Where are the WAL files being stored and how are they being read back?
Hi Tom,

I found it... it's called ktrace on OS X Server. However, as I just
finished posting to the list, the process died with a PANIC error:

[2006-04-20 16:41:28 MDT] LOG:  restored log file "000000010000018F00000034" from archive
[2006-04-20 16:41:35 MDT] LOG:  restored log file "000000010000018F00000035" from archive
[2006-04-20 16:41:38 MDT] LOG:  restored log file "000000010000018F00000036" from archive
sh: line 1: /wal_archive/000000010000018F00000037.gz: No such file or directory
[2006-04-20 16:41:46 MDT] LOG:  could not open file "pg_xlog/000000010000018F00000037" (log file 399, segment 55): No such file or directory
[2006-04-20 16:41:46 MDT] LOG:  redo done at 18F/36FFF254
sh: line 1: /wal_archive/000000010000018F00000036.gz: No such file or directory
[2006-04-20 16:41:46 MDT] PANIC:  could not open file "pg_xlog/000000010000018F00000036" (log file 399, segment 54): No such file or directory
[2006-04-20 16:41:46 MDT] LOG:  startup process (PID 9190) was terminated by signal 6
[2006-04-20 16:41:46 MDT] LOG:  aborting startup due to startup process failure
[2006-04-20 16:41:46 MDT] LOG:  logger shutting down

Would turning off fsync make it go faster? Maybe it won't take 10 hours
again if we start from scratch.

Also, what if we did just start it up again? Will postgres realize that
the existing wal_archive files have already been processed and just skip
along until it finds one it hasn't processed yet?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 3:19 PM, Tom Lane wrote:

> Not at all! I'm talking about tracing the kernel calls it's making.
> Depending on your platform, the tool for this is called strace,
> ktrace, truss, or maybe even just trace.
Oops... forgot to mention that both files that postgres said were
missing are in fact there. A partial listing from our wal_archive
directory:

-rw-------   1 postgres  staff    4971129 Apr 19 20:08 000000010000018F00000036.gz
-rw-------   1 postgres  staff    4378284 Apr 19 20:09 000000010000018F00000037.gz

There didn't seem to be any issues with the NFS mount. Perhaps it
briefly disconnected and came back right away.

Thanks!

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 5:11 PM, Brendan Duddridge wrote:

> The WAL files are stored on a separate server and accessed through an
> NFS mount located at /wal_archive.
Brendan Duddridge <brendan@clickspace.com> writes:
> However, as I just finished posting to the list, the process died
> with a PANIC error:

> [2006-04-20 16:41:38 MDT] LOG:  restored log file "000000010000018F00000036" from archive
> sh: line 1: /wal_archive/000000010000018F00000037.gz: No such file or directory
> [2006-04-20 16:41:46 MDT] LOG:  could not open file "pg_xlog/000000010000018F00000037" (log file 399, segment 55): No such file or directory
> [2006-04-20 16:41:46 MDT] LOG:  redo done at 18F/36FFF254
> sh: line 1: /wal_archive/000000010000018F00000036.gz: No such file or directory
> [2006-04-20 16:41:46 MDT] PANIC:  could not open file "pg_xlog/000000010000018F00000036" (log file 399, segment 54): No such file or directory

This looks to me like a bug in your archive restore command. It had
just finished providing 000000010000018F00000036 at 16:41:38, why was
it not able to do so again at 16:41:46?

			regards, tom lane
Brendan,

Is your NFS share mounted hard or soft? Do you have space to copy the
files locally? I suspect you're seeing NFS slowness in your restore,
since you aren't using much in the way of disk I/O or CPU.

-Jeff

On Thu, 20 Apr 2006, Brendan Duddridge wrote:

> Oops... forgot to mention that both files that postgres said were
> missing are in fact there:
>
> -rw-------  1 postgres  staff  4971129 Apr 19 20:08 000000010000018F00000036.gz
> -rw-------  1 postgres  staff  4378284 Apr 19 20:09 000000010000018F00000037.gz
>
> There didn't seem to be any issues with the NFS mount. Perhaps it
> briefly disconnected and came back right away.

--
Jeff Frost, Owner 	<jeff@frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954
Well, our restore command is pretty basic:

restore_command = 'gunzip </wal_archive/%f.gz>%p'

I'm not sure why that would succeed and then fail.

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 5:20 PM, Tom Lane wrote:

> This looks to me like a bug in your archive restore command. It had
> just finished providing 000000010000018F00000036 at 16:41:38, why was
> it not able to do so again at 16:41:46?
Brendan Duddridge <brendan@clickspace.com> writes:
> Oops... forgot to mention that both files that postgres said were
> missing are in fact there:

Please place the blame where it should fall: it's your archive restore
command that's telling postgres that.

> There didn't seem to be any issues with the NFS mount. Perhaps it
> briefly disconnected and came back right away.

Unstable NFS mounts are Really Bad News. You shouldn't be expecting
to run a stable database atop such a thing.

If it's not the database but only the WAL archive that's NFS'd, it might
be possible to live with it, but you'll need to put some defenses into
your archive restore script to cope with such events.

As far as restarting goes: I think you can restart from here without
first redoing your base-backup restore, but as previously noted it'll
still read through the same WAL files it looked at before. You won't
save much except the time to redo the base restore.

			regards, tom lane
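[Note: the "defenses" Tom mentions could be as simple as retrying before
giving up. A minimal, untested sketch, assuming the /wal_archive layout
from this thread; the script name /usr/local/bin/restore_wal.sh is
illustrative, not from the thread:

	#!/bin/sh
	# restore_wal.sh -- retrying restore_command wrapper (sketch).
	# In postgresql.conf:
	#   restore_command = '/usr/local/bin/restore_wal.sh %f %p'
	WAL="$1"    # %f: WAL segment name the server asked for
	DEST="$2"   # %p: path the server wants the segment written to

	tries=0
	while [ "$tries" -lt 5 ]; do
	    # -r fails both when the file is genuinely absent and when
	    # the NFS mount has briefly dropped out, so a retry loop
	    # covers both cases.
	    if [ -r "/wal_archive/$WAL.gz" ]; then
	        gunzip < "/wal_archive/$WAL.gz" > "$DEST" && exit 0
	    fi
	    tries=`expr $tries + 1`
	    sleep 3    # give a flaky NFS mount a moment to come back
	done

	# Still unreadable after retries: report failure so recovery
	# ends cleanly at the last segment that could be restored.
	exit 1

One trade-off: recovery normally ends when the restore command fails on
a segment that was never archived, so the retries add roughly fifteen
seconds to that final, expected failure.]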
Thanks Tom,

We are storing only the WAL archives on the NFS volume, so it must have
been a hiccup in the NFS mount. Jeff Frost asked if we were using hard
or soft mounts. We were using soft mounts, so that may be where the
problem lies with the PANIC.

Is it better to archive our WAL files to the boot volume of the database
machine instead of over the NFS mount? I'm sure it's probably not a good
idea to archive to the same volume as the pg_xlog directory, so that's
why I thought maybe using the boot drive would be better. We'll just
have to make sure we don't fill up the drive. Although PostgreSQL often
writes to the /data directory located on the boot drive, so it might not
be good to start archiving there either. Our table spaces are on a
separate RAID.

If we need to restore in the future, we'll just have to copy the WAL
files from the boot drive of our database machine over NFS to the
restore machine.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 5:29 PM, Tom Lane wrote:

> If it's not the database but only the WAL archive that's NFS'd, it
> might be possible to live with it, but you'll need to put some
> defenses into your archive restore script to cope with such events.
Hi Tom,

Well, we started the restore back up with the WAL archives copied to our
local disk. It's going at about the same pace as the restore over NFS.

So I tried ktrace -p PID and it created a really big file. I had to do
'ktrace -p PID -c' to get it to stop. The ktrace.out file is read using
kdump, but there's a lot of binary data in there intermixed with some
system calls. For example:

15267 postgres RET   read 8192/0x2000
15267 postgres CALL  lseek(153,0,2)
15267 postgres RET   lseek 0
15267 postgres CALL  lseek(127,0,2)
15267 postgres RET   lseek 0
15267 postgres CALL  lseek(138,0,2)
15267 postgres RET   lseek 0
15267 postgres CALL  lseek(153,0,2)
15267 postgres RET   lseek 0
15267 postgres CALL  lseek(127,0,2)
15267 postgres RET   lseek 0
15267 postgres CALL  read(5,25225728,8192)
15267 postgres GIO   fd 5 read 8192 bytes
       "\M-P]\0\^A\0\0\0\^A\0\0\^A\M^H,\M-5`\0\0\0\^C\M-6r fill, polyester
        has a subtle sheen, machine wash\0\0\0Xreverses to solid colour,
        polyester fill, polyester has a subtle sheen, machine wash..."
        ... lots of data ...
       "...can be cut down to minimum width of 14", hardware and
        instructions included, wipe with a dam"
15267 postgres RET   read 8192/0x2000
15267 postgres CALL  lseek(138,0,2)
15267 postgres RET   lseek 0
15267 postgres CALL  lseek(158,317251584,0)
15267 postgres RET   lseek 0
15267 postgres CALL  write(158,35286464,8192)
15267 postgres GIO   fd 158 wrote 8192 bytes
       "\0\0\^A\M^H+\M^W@\M-,\0\0\0\^A\^A\M-D\^P\M-@\^_\M-p \^C?\M^X..."
        etc...

I'm not sure that really tells me anything, though, other than that the
WAL archives don't actually store SQL, only the database changes.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 3:19 PM, Tom Lane wrote:

> With strace you'd do something like
>
> 	strace -p PID-of-process 2>outfile
> 	... wait 30 sec or so ...
> 	control-C
Hi Tomas,

Hmm... ktrace -p PID -c returns immediately without doing anything
unless I've previously done a ktrace -p PID. According to the man page
for ktrace's -c flag:

     -c      Clear the trace points associated with the specified file
             or processes.

When I run ktrace on OS X Server 10.4.6, it returns to the console
immediately; the ktrace.out file then gets larger and larger until I
issue another ktrace command with the -c flag. It never sits waiting
for keyboard input. I haven't been able to find any way of generating
the stats yet. The man pages for ktrace and kdump don't mention anything
about stats.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 6:51 PM, Tomas Vondra wrote:

>> So I tried ktrace -p PID and it created a really big file. I had to
>> do 'ktrace -p PID -c' to get it to stop.
>>
>> The ktrace.out file is read using kdump, but there's a lot of binary
>> data in there intermixed with some system calls.
>
> Yes, that's what (s|k)trace does - it attaches to the process and
> prints out all the system calls, parameters, return values etc. That
> gives you an "exact" overview of what's going on in the program, but
> it's a little bit confusing if you are not familiar with it and/or
> you're in a hurry.
>
> But Luke Lonergan offered a '-c' switch, which gives you statistics
> of the system calls used. This way you can see the number of calls
> for individual syscalls and the time spent in them. That could give
> you a hint why the process is so slow (for example, there can be an
> I/O bottleneck or something like that).
>
> Just do 'ktrace -p PID -c' for about 30 seconds, then Ctrl-C, and
> post the output to this mailing list.
>
> t.v.
On Thu, 20 Apr 2006, Brendan Duddridge wrote:

> Hmm... ktrace -p PID -c returns immediately without doing anything
> unless I've previously done a ktrace -p PID.
>
> According to the man page for ktrace's -c flag:
>      -c      Clear the trace points associated with the specified
>              file or processes.

On other systems, strace/truss with -c produces a list of syscalls with
the number of times they've been called in the elapsed period.

To answer your other question, temporarily disabling fsync during the
recovery should speed it up.

For future reference, processing thousands of WAL files for recovery is
not ideal. You should be doing a base backup much more often.

Gavin
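[Note: for reference, Gavin's fsync suggestion is a single
postgresql.conf setting, set before starting the recovery and reverted
before the database goes back into production; running with it off
risks corruption if the machine crashes, though mid-recovery you would
be restoring from the base backup again anyway:

	# postgresql.conf -- for the duration of the recovery run only
	fsync = off
]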
On Thu, 2006-04-20 at 13:29 -0600, Brendan Duddridge wrote:

> We had a database issue today that caused us to have to restore to
> our most recent backup. We are using PITR so we have 3120 WAL files
> that need to be applied to the database.

How often are you taking base backups?

> After 45 minutes, it has restored only 230 WAL files. At this rate,
> it's going to take about 10 hours to restore our database.
>
> Most of the time, the server is not using very much CPU time or I/O
> time. So I'm wondering what can be done to speed up the process?

You can improve the performance of a recovery by making your restore
command overlap retrieval of files. The recovery process waits while
the restore command returns a file, so by doing an asynchronous
lookahead of one file you can be gunzipping the next file while the
current one is being processed.

I'll either document this better, or build an overlap into the restore
command processing itself, so the script doesn't need to do this.

--
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com/
Hi Simon,

The backup with 3120 WAL files was from a 2-day-old base backup. We've
moved to a 1-day base backup now, but that'll still be 1600 WALs or so
a day, which will probably take 5 hours to restore, I suspect. Unless
we move to 2 or more base backups per day, which seems crazy.

So how do you overlap the restore process with the retrieval of files?
Our restore command is:

restore_command = 'gunzip </wal_archive/%f.gz>%p'

If I change it to:

restore_command = 'gunzip </wal_archive/%f.gz>%p &'

to execute the restore command in the background, will that do the
trick?

But I don't think the real problem was the retrieval of the files. It
only took maybe half a second to retrieve each file, but it often took
anywhere from 5 to 30 seconds to process one, more so on the longer end
of the scale.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 23, 2006, at 10:06 AM, Simon Riggs wrote:

> You can improve the performance of a recovery by making your restore
> command overlap retrieval of files.
Hi, Brendan,

Brendan Duddridge wrote:

> So how do you overlap the restore process with the retrieving of
> files?

You need a shell script as the restore command that does both
uncompressing the current file and starting a background decompress of
the next file(s). It also has to check whether the current file is
already in progress from a previous run, and wait until that finishes
instead of decompressing it again. This is a little more complicated
than it sounds at first.

> restore_command = 'gunzip </wal_archive/%f.gz>%p &'

Warning: Don't do it this way! It will break things, because PostgreSQL
will try to access a not-completely-restored WAL file.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
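[Note: a minimal, untested sketch of the kind of lookahead script Markus
describes, assuming bash, the /wal_archive layout from this thread, a
hypothetical local staging directory /wal_stage, and 8.1's 16 MB
segments (segment numbers run 00000000..000000FE before the log id
increments). The name prefetch_wal.sh is illustrative, not from the
thread:

	#!/bin/bash
	# prefetch_wal.sh -- restore_command with one-file lookahead (sketch).
	# In postgresql.conf:
	#   restore_command = '/usr/local/bin/prefetch_wal.sh %f %p'
	WAL="$1"; DEST="$2"
	ARCHIVE=/wal_archive
	STAGE=/wal_stage
	mkdir -p "$STAGE"

	# If a previous call is still decompressing this segment in the
	# background, wait for it (the .tmp name disappears when the
	# decompress finishes and the file is renamed).
	while [ -e "$STAGE/$WAL.tmp" ]; do sleep 1; done

	if [ -e "$STAGE/$WAL" ]; then
	    mv "$STAGE/$WAL" "$DEST" || exit 1      # already prefetched
	else
	    gunzip < "$ARCHIVE/$WAL.gz" > "$DEST" || exit 1
	fi

	# Prefetch only for real segment names: timeline(8 hex) +
	# log(8 hex) + seg(8 hex); skip history/backup-label files.
	if [ ${#WAL} -eq 24 ]; then
	    TLI=${WAL:0:8}; LOG=${WAL:8:8}; SEG=${WAL:16:8}
	    if [ $((16#$SEG)) -eq 254 ]; then       # 8.x skips segment FF
	        NEXT=$TLI$(printf '%08X' $((16#$LOG + 1)))00000000
	    else
	        NEXT=$TLI$LOG$(printf '%08X' $((16#$SEG + 1)))
	    fi
	    # Decompress the next segment in the background so it's
	    # ready before the server asks for it.
	    if [ -e "$ARCHIVE/$NEXT.gz" ] && [ ! -e "$STAGE/$NEXT" ] \
	        && [ ! -e "$STAGE/$NEXT.tmp" ]; then
	        ( gunzip < "$ARCHIVE/$NEXT.gz" > "$STAGE/$NEXT.tmp" \
	            && mv "$STAGE/$NEXT.tmp" "$STAGE/$NEXT" ) &
	    fi
	fi
	exit 0

The .tmp rename and the wait loop are the "already in progress" check
Markus mentions: PostgreSQL must never be handed a half-written segment,
which is exactly what the bare '&' in restore_command would do.]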
On Sun, 2006-04-23 at 22:46 -0600, Brendan Duddridge wrote:

> So how do you overlap the restore process with the retrieving of
> files?

The restore command can be *anything*. You just write a script...

> Our restore command is:
>
> restore_command = 'gunzip </wal_archive/%f.gz>%p'
>
> If I change it to:
>
> restore_command = 'gunzip </wal_archive/%f.gz>%p &'
>
> to execute the restore command in the background, will that do the
> trick?

No, but you can execute a shell script that does use & internally.

> But I don't think the real problem was the retrieval of the files.
> It only took maybe 1/2 a second to retrieve the file, but often took
> anywhere from 5 to 30 seconds to process the file. More so on the
> longer end of the scale.

Sorry, thought you meant the decompression time.

--
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com/