Thread: page is uninitialized --- fixing
We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
switch out while it was under load), which caused the DB to crash. It
started up fine. When I vacuumed the DB, I saw the following messages:

WARNING: relation "my_table" page 652139 is uninitialized --- fixing
WARNING: relation "my_table" page 652140 is uninitialized --- fixing
WARNING: relation "my_table" page 652940 is uninitialized --- fixing
WARNING: relation "my_table" page 652941 is uninitialized --- fixing

That sort of looks like it could be data loss. Can someone explain what
happened?

Brad.
Brad Nicholson <bnichols@ca.afilias.info> writes:
> We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
> switch out while it was under load), which caused the DB to crash. It
> started up fine. When I vacuumed the DB, I saw the following messages:

> WARNING: relation "my_table" page 652139 is uninitialized --- fixing
> WARNING: relation "my_table" page 652140 is uninitialized --- fixing
> WARNING: relation "my_table" page 652940 is uninitialized --- fixing
> WARNING: relation "my_table" page 652941 is uninitialized --- fixing

> That sort of looks like it could be data loss. Can someone explain what
> happened?

It could be that, but not necessarily. These could be pages that were
allocated to put new tuples into, but the crash happened before the
inserting transaction committed (and, in fact, before any of the
associated WAL entries had made their way to disk -- else the empty
pages would've been initialized during WAL replay).

It would be easier to believe that if the uninitialized pages were all
contiguous, though. Do you know that this table was under heavy insert
load at the time?

			regards, tom lane
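(An aside on what the warning is keyed on: a never-initialized heap page
is, in practice, just BLCKSZ zero bytes --- relation extension put the
empty block on disk, but the page-header setup that WAL replay would have
redone never got there. Below is a minimal standalone sketch, not server
code, that reports such blocks in a heap segment file; BLCKSZ 8192 is the
compile-time default and an assumption here, and the file argument is
whatever relation segment file you point it at.)

#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192                     /* default PostgreSQL page size */

int main(int argc, char **argv)
{
    static const char zero[BLCKSZ];     /* all-zero reference block */
    char page[BLCKSZ];
    FILE *f;
    long blkno = 0;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <relation-segment-file>\n", argv[0]);
        return 1;
    }
    if ((f = fopen(argv[1], "rb")) == NULL)
    {
        perror(argv[1]);
        return 1;
    }
    /* a short trailing read (torn last block) is simply ignored */
    while (fread(page, 1, BLCKSZ, f) == BLCKSZ)
    {
        if (memcmp(page, zero, BLCKSZ) == 0)
            printf("page %ld is uninitialized (all zeros)\n", blkno);
        blkno++;
    }
    fclose(f);
    return 0;
}

(The server's actual test is on the page header rather than the whole
block, but a block that was extended and never initialized is zero-filled
end to end -- which is what VACUUM then "fixes" by writing a fresh
empty-page header onto it.)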
Brad Nicholson wrote:
> We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
> switch out while it was under load), which caused the DB to crash. It
> started up fine. When I vacuumed the DB, I saw the following messages:
>
> WARNING: relation "my_table" page 652139 is uninitialized --- fixing
> WARNING: relation "my_table" page 652140 is uninitialized --- fixing
> WARNING: relation "my_table" page 652940 is uninitialized --- fixing
> WARNING: relation "my_table" page 652941 is uninitialized --- fixing
>
> That sort of looks like it could be data loss. Can someone explain what
> happened?

It means someone extended the table with new pages but didn't get around
to actually putting tuples on them and committing. Four transactions were
doing this when the system crashed. No committed data was lost.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, 2008-03-26 at 15:31 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
>
> It could be that, but not necessarily. These could be pages that were
> allocated to put new tuples into, but the crash happened before the
> inserting transaction committed (and, in fact, before any of the
> associated WAL entries had made their way to disk -- else the empty
> pages would've been initialized during WAL replay).
>
> It would be easier to believe that if the uninitialized pages were all
> contiguous, though. Do you know that this table was under heavy insert
> load at the time?

It was. This table is an insert-only log table that was being heavily
written to at the time of the crash.

Brad.
Brad Nicholson wrote:
> On Wed, 2008-03-26 at 15:31 -0400, Tom Lane wrote:
> > Brad Nicholson <bnichols@ca.afilias.info> writes:
> > > We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
>
> > It would be easier to believe that if the uninitialized pages were all
> > contiguous, though. Do you know that this table was under heavy insert
> > load at the time?
>
> It was. This table is an insert-only log table that was being heavily
> written to at the time of the crash.

Is it possible that there were *two* crashes?

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, 2008-03-27 at 10:29 -0300, Alvaro Herrera wrote:
> Brad Nicholson wrote:
> > On Wed, 2008-03-26 at 15:31 -0400, Tom Lane wrote:
> > > Brad Nicholson <bnichols@ca.afilias.info> writes:
> > > > We just took a test database down (PG 8.1.11) fairly hard (pulled a SAN
> >
> > > It would be easier to believe that if the uninitialized pages were all
> > > contiguous, though. Do you know that this table was under heavy insert
> > > load at the time?
> >
> > It was. This table is an insert-only log table that was being heavily
> > written to at the time of the crash.
>
> Is it possible that there were *two* crashes?

There was only one crash. However, there were two separate SAN switches
that were pulled out from under the DB; not sure if that would matter.

Brad.
Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Thu, 2008-03-27 at 10:29 -0300, Alvaro Herrera wrote:
>> Brad Nicholson wrote:
>>> It was. This table is an insert-only log table that was being heavily
>>> written to at the time of the crash.
>>
>> Is it possible that there were *two* crashes?

> There was only one crash. However, there were two separate SAN switches
> that were pulled out from under the DB; not sure if that would matter.

To explain the pattern that was shown, I think you'd have to assume
something like this:

* Some backend goes to do an insert, finds no space in FSM, obtains and
zeroes page 652139. But before it can do the insert (or at least before
it can emit the WAL record) it blocks for a long time.

* Some other backend does exactly the same thing with page 652140.

* While those guys are still blocked, yet other backends write into
pages 652141..652939. These writes do make it to WAL.

* One or two backends initialize pages 652940 and 652941, but these
writes don't make it to WAL. (This could be just one backend, if you
assume it had WAL-logged the first write but that didn't get out of WAL
buffers in time.)

* Crash.

This is not entirely out of the question, because of the designed-in
property that a freshly initialized page is only inserted into by the
backend that got it --- no one else will know there is any free space in
it until VACUUM first passes over it. So if there are a lot of different
sessions writing into this table, you don't need to assume more than
about one tuple per page. Still, it's kinda hard to believe that the
first two backends could remain stuck for so long as to let ~800 other
insertions happen.

What do you mean by "two separate SAN switches pulled out" --- is the
DB spread across multiple SAN controllers?

			regards, tom lane
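(To make that timeline concrete, here is a toy model of the ordering
involved -- all names hypothetical; this simulates the reasoning above,
not any PostgreSQL code. Extension puts a zeroed block on disk right
away, recovery can replay a page's initialization only if its WAL record
was flushed before the crash, and whatever replay can't fix stays zeroed
until VACUUM comes along.)

#include <stdbool.h>
#include <stdio.h>

#define NPAGES 6

static bool initialized_on_disk[NPAGES];  /* false = block is all zeroes */
static bool init_record_in_wal[NPAGES];   /* did the WAL record hit disk? */

static void extend_relation(int blkno, bool wal_flushed)
{
    initialized_on_disk[blkno] = false;   /* extension writes a zeroed block */
    init_record_in_wal[blkno] = wal_flushed;
}

static void crash_recovery(void)
{
    /* WAL replay redoes the init for every page whose record survived */
    for (int i = 0; i < NPAGES; i++)
        if (init_record_in_wal[i])
            initialized_on_disk[i] = true;
}

int main(void)
{
    extend_relation(0, false);   /* blocked backend, WAL never flushed     */
    extend_relation(1, false);   /* second blocked backend, same story     */
    extend_relation(2, true);    /* stand-ins for the ~800 pages between:  */
    extend_relation(3, true);    /*   their WAL got out, replay fixes them */
    extend_relation(4, false);   /* last writes, still in WAL buffers      */
    extend_relation(5, false);

    crash_recovery();

    for (int i = 0; i < NPAGES; i++)
        if (!initialized_on_disk[i])
            printf("page %d is uninitialized --- vacuum will fix it\n", i);
    return 0;
}

(Run as-is it reports pages 0, 1, 4 and 5: the same non-contiguous,
two-low/two-high pattern as the vacuum output above.)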
On Thu, 2008-03-27 at 10:37 -0400, Tom Lane wrote:
> What do you mean by "two separate SAN switches pulled out" --- is the
> DB spread across multiple SAN controllers?

It's using IO multipathing through 2 HBAs. Both of those were taken down.

Brad.
Tom Lane wrote:
> This is not entirely out of the question, because of the designed-in
> property that a freshly initialized page is only inserted into by the
> backend that got it --- no one else will know there is any free space
> in it until VACUUM first passes over it. So if there are a lot of
> different sessions writing into this table, you don't need to assume
> more than about one tuple per page. Still, it's kinda hard to believe
> that the first two backends could remain stuck for so long as to let
> ~800 other insertions happen.

Depending on how the multipathing and recovery work on that particular
SAN/OS combination, it might very well be that some processes had their
IO held up much longer than others. Maybe the first two backends had IO
in flight and the OS needed time to requeue/resend it after the SAN
recovered, while "new" backends were able to do IO immediately?

Stefan