Thread: WAL questions

WAL questions

From: "Steve Oualline"
Date:

We have a system with 1202 files in the WAL directory (pg_xlog).
When we start postmaster, it goes into the starting state for 5 minutes
and then crashes.

Questions:

1) What is the biggest number of WAL files you've seen and what were
you doing to the database at the time?

2) When postmaster starts, it replays the WAL files.  During this time
any connection is rejected with an error indicating that the database
is starting up.  What is the longest amount of time that you'd expect
postmaster to be in the "starting up" state?

Re: WAL questions

From: Tom Lane
Date:
"Steve Oualline" <soualline@stbernard.com> writes:
> We have a system with 1202 files in the WAL directory (pg_xlog).
> When we start postmaster, it goes into the starting state for 5 minutes
> and then crashes.

Define "crash".  If you don't show us the *exact* messages you're
seeing, it's difficult to guess what's going on.

Also, what happened when the postmaster stopped the first time?  The
most interesting part of this from my point of view is how did you get
into this state in the first place --- unless you had set insanely high
values for checkpoint_segments and checkpoint_timeout, you should not
have gotten up to that many files in pg_xlog.  A plausible guess is that
something was preventing checkpoints from completing, but any such
problem should have left traces in the postmaster log.  If you've still
got the pre-crash log it would be very interesting to see.

            regards, tom lane
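
To put the 1202 files in perspective, here is a rough sketch (not anything
from the posters above) that counts WAL segments in a directory listing and
compares the count with the rule of thumb from older PostgreSQL
documentation, under which pg_xlog normally holds no more than
2 * checkpoint_segments + 1 files.  The assumptions: segment names are 24
uppercase hex digits (8.0 and later), segments are the default 16 MB, and
checkpoint_segments is at its old default of 3; later releases use a
different formula.  The function names are made up for illustration.  Under
these assumptions, 1202 segments come to about 18.8 GiB against an expected
ceiling of 7 files.

```python
import os
import re

# Assumption: WAL segment names are 24 uppercase hex digits
# (timeline, log, segment), as in PostgreSQL 8.0 and later.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")

# Assumption: default WAL segment size of 16 MB.
SEGMENT_BYTES = 16 * 1024 * 1024


def count_wal_segments(names):
    """Count entries that look like WAL segments, skipping things such as
    the archive_status subdirectory or .backup history files."""
    return sum(1 for name in names if WAL_SEGMENT_NAME.match(name))


def expected_max_segments(checkpoint_segments):
    """Rule of thumb from older documentation: 2 * checkpoint_segments + 1."""
    return 2 * checkpoint_segments + 1


def summarize(names, checkpoint_segments=3):
    """Compare the observed segment count with the expected ceiling."""
    segments = count_wal_segments(names)
    limit = expected_max_segments(checkpoint_segments)
    return {
        "segments": segments,
        "bytes": segments * SEGMENT_BYTES,
        "expected_max": limit,
        "over_limit": segments > limit,
    }


def summarize_directory(wal_dir, checkpoint_segments=3):
    """Convenience wrapper that reads the listing from disk."""
    return summarize(os.listdir(wal_dir), checkpoint_segments)
```

Calling summarize_directory() with the path to pg_xlog and the
checkpoint_segments value from postgresql.conf gives a quick indication of
whether the directory has grown well past what normal checkpointing would
leave behind.  It says nothing about why; the postmaster log is still the
place to look for that.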