Thread: page is uninitialized --- fixing

page is uninitialized --- fixing

From
Brad Nicholson
Date:
We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
switch out while it was under load), which caused the DB to crash.  It
started up fine, when I vacuumed the DB, I saw the following messages.

WARNING:  relation "my_table" page 652139 is uninitialized --- fixing
WARNING:  relation "my_table" page 652140 is uninitialized --- fixing
WARNING:  relation "my_table" page 652940 is uninitialized --- fixing
WARNING:  relation "my_table" page 652941 is uninitialized --- fixing

That sort of looks like it could be data loss, can someone explain what
happened?

Brad.


Re: page is uninitialized --- fixing

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
> switch out while it was under load), which caused the DB to crash.  It
> started up fine, when I vacuumed the DB, I saw the following messages.

> WARNING:  relation "my_table" page 652139 is uninitialized --- fixing
> WARNING:  relation "my_table" page 652140 is uninitialized --- fixing
> WARNING:  relation "my_table" page 652940 is uninitialized --- fixing
> WARNING:  relation "my_table" page 652941 is uninitialized --- fixing

> That sort of looks like it could be data loss, can someone explain what
> happened?

It could be that but not necessarily.  These could be pages that were
allocated to put new tuples into, but the crash happened before the
inserting transaction committed (and, in fact, before any of the
associated WAL entries had made their way to disk -- else the empty
pages would've been initialized during WAL replay).

It would be easier to believe that if the uninitialized pages were all
contiguous though.  Do you know that this table was under heavy insert
load at the time?

            regards, tom lane

Re: page is uninitialized --- fixing

From
Alvaro Herrera
Date:
Brad Nicholson wrote:
> We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
> switch out while it was under load), which caused the DB to crash.  It
> started up fine, when I vacuumed the DB, I saw the following messages.
>
> WARNING:  relation "my_table" page 652139 is uninitialized --- fixing
> WARNING:  relation "my_table" page 652140 is uninitialized --- fixing
> WARNING:  relation "my_table" page 652940 is uninitialized --- fixing
> WARNING:  relation "my_table" page 652941 is uninitialized --- fixing
>
> That sort of looks like it could be data loss, can someone explain what
> happened?

It means someone extended the table, but didn't get around to actually
putting tuples on it and committing.  Four transactions were doing this
when the system crashed.  No committed data was lost.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: page is uninitialized --- fixing

From
Brad Nicholson
Date:
On Wed, 2008-03-26 at 15:31 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN

> It could be that but not necessarily.  These could be pages that were
> allocated to put new tuples into, but the crash happened before the
> inserting transaction committed (and, in fact, before any of the
> associated WAL entries had made their way to disk -- else the empty
> pages would've been initialized during WAL replay).
>
> It would be easier to believe that if the uninitialized pages were all
> contiguous though.  Do you know that this table was under heavy insert
> load at the time?

It was.  This table is an insert only log table that was being heavily
was being heavily written to at the time of the crash.

Brad.


Re: page is uninitialized --- fixing

From
Alvaro Herrera
Date:
Brad Nicholson wrote:
> On Wed, 2008-03-26 at 15:31 -0400, Tom Lane wrote:
> > Brad Nicholson <bnichols@ca.afilias.info> writes:
> > > We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN

> > It would be easier to believe that if the uninitialized pages were all
> > contiguous though.  Do you know that this table was under heavy insert
> > load at the time?
>
> It was.  This table is an insert only log table that was being heavily
> was being heavily written to at the time of the crash.

Is it possible that there were *two* crashes?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: page is uninitialized --- fixing

From
Brad Nicholson
Date:
On Thu, 2008-03-27 at 10:29 -0300, Alvaro Herrera wrote:
> Brad Nicholson wrote:
> > On Wed, 2008-03-26 at 15:31 -0400, Tom Lane wrote:
> > > Brad Nicholson <bnichols@ca.afilias.info> writes:
> > > > We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
>
> > > It would be easier to believe that if the uninitialized pages were all
> > > contiguous though.  Do you know that this table was under heavy insert
> > > load at the time?
> >
> > It was.  This table is an insert only log table that was being heavily
> > was being heavily written to at the time of the crash.
>
> Is it possible that there were *two* crashes?

There was only one crash.  However, there were two separate SAN switches
that were pulled out from under the DB, not sure if that would matter.

Brad.


Re: page is uninitialized --- fixing

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Thu, 2008-03-27 at 10:29 -0300, Alvaro Herrera wrote:
>> Brad Nicholson wrote:
>>> It was.  This table is an insert only log table that was being heavily
>>> was being heavily written to at the time of the crash.
>>
>> Is it possible that there were *two* crashes?

> There was only one crash.  However, there were two separate SAN switches
> that were pulled out from under the DB, not sure if that would matter.

To explain the pattern that was shown I think you'd have to assume
something like this:

* Some backend goes to do an insert, finds no space in FSM, obtains
and zeroes page 652139.  But before it can do the insert (or at least
before it can emit the WAL record) it blocks for a long time.

* Some other backend does exactly the same thing with page 652140.

* While those guys are still blocked, yet other backends write into
pages 652141..652939.  These writes do make it to WAL.

* One or two backends initialize pages 652940 and 652941, but these
writes don't make it to WAL.  (This could be just one backend, if
you assume it had WAL-logged the first write but that didn't get
out of WAL buffers in time.)

* Crash.

This is not entirely out of the question, because of the designed-in
property that a freshly initialized page is only inserted into by
the backend that got it --- no one else will know there is any
free space in it until VACUUM first passes over it.  So if there
are a lot of different sessions writing into this table you don't
need to assume more than about one tuple per page.  Still, it's
kinda hard to believe that the first two backends could remain stuck
for so long as to let ~800 other insertions happen.

What do you mean by "two separate SAN switches pulled out" --- is the
DB spread across multiple SAN controllers?

            regards, tom lane

Re: page is uninitialized --- fixing

From
Brad Nicholson
Date:
On Thu, 2008-03-27 at 10:37 -0400, Tom Lane wrote:
>
> What do you mean by "two separate SAN switches pulled out" --- is the
> DB spread across multiple SAN controllers?
>

It's using IO mutilpath through 2 HBAs.  Both of those were taken down.

Brad.


Re: page is uninitialized --- fixing

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:

> This is not entirely out of the question, because of the designed-in
> property that a freshly initialized page is only inserted into by
> the backend that got it --- no one else will know there is any
> free space in it until VACUUM first passes over it.  So if there
> are a lot of different sessions writing into this table you don't
> need to assume more than about one tuple per page.  Still, it's
> kinda hard to believe that the first two backends could remain stuck
> for so long as to let ~800 other insertions happen.

depending on how the multipathing and recovery works on that particular
SAN/OS combination it might very well be that some processes are getting
  their IO hold much longer than some other processes.
Maybe the  first two backends had IO in-flight and the OS needed time to
requeue/resend those after the SAN recovered and "new" backends were
able to do IO immediately ?


Stefan