
Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From	Greg Stark
Subject	Recovery inconsistencies, standby much larger than primary
Date	January 25, 2014 03:24:18
Msg-id	CAM-w4HNUgUcJqDNbgfAU=YjksHZDPPbT7RH73pYrzd7ByYrjrA@mail.gmail.com
Responses	Re: Recovery inconsistencies, standby much larger than primary (Andres Freund <andres@2ndquadrant.com>)
List	pgsql-hackers


Since the point release we've run into a number of databases that,
when restored from a base backup, end up larger than the primary
database was, sometimes by a large factor. The data below is from
9.1.11 (both primary and standby), but we've seen the same thing on
9.2.6.

primary$ for i in  1261982 1364767 1366221 473158 ; do echo -n "$i " ;
du -shc $i* | tail -1 ; done
1261982 29G total
1364767 23G total
1366221 12G total
473158 76G total

standby$ for i in  1261982 1364767 1366221 473158 ; do echo -n "$i " ;
du -shc $i* | tail -1 ; done
1261982 55G total
1364767 28G total
1366221 17G total
473158 139G total
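
For reference, the growth factors implied by the two listings above can
be worked out as below. This is only a rough sketch: the sizes are
copied by hand from the du output (which is already rounded), and the
names primary, standby and growth_factors are made up for illustration.

```python
# Sizes in GB, copied by hand from the du output above (relfilenode -> size).
# du -h rounds its figures, so the ratios are approximate.
primary = {"1261982": 29, "1364767": 23, "1366221": 12, "473158": 76}
standby = {"1261982": 55, "1364767": 28, "1366221": 17, "473158": 139}


def growth_factors(primary, standby):
    """Return {relfilenode: standby_size / primary_size}, rounded to 2 places."""
    return {rel: round(standby[rel] / primary[rel], 2) for rel in primary}


factors = growth_factors(primary, standby)
for rel, factor in factors.items():
    print(f"{rel}: {factor:.2f}x")
```

So the standby copies are roughly 1.2x to 1.9x the size of the
primary's, with the heap (473158) and the first btree (1261982) close
to double.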

I've run the snaga xlogdump on the WAL records replayed before reaching
a consistent point (we confirmed the extra storage had already
appeared by then) and grepped for the above relfilenodes, but the
resulting dumps are quite large. I believe they don't contain any
sensitive data; once I've verified that, I can upload one of them for
inspection.

$ ls -lh [14]*
-rw-rw-r-- 1 heroku heroku 325M Jan 24 04:13 1261982
-rw-r--r-- 1 root   root   352M Jan 25 00:04 1364767
-rw-r--r-- 1 root   root   123M Jan 25 00:04 1366221
-rw-r--r-- 1 root   root   357M Jan 25 00:04 473158

The first three are btrees and the fourth is a heap, btw.

We're also seeing log entries about "wal contains reference to invalid
pages" but these errors seem only vaguely correlated. Sometimes we get
the errors but the tables don't grow noticeably and sometimes we don't
get the errors and the tables are much larger.

Much of the added space is uninitialized pages, as you might expect, but
what I don't understand is how the database can start up without
running into the "reference to invalid pages" panic consistently. We check
both that there are no references after consistency is reached *and*
that any references before consistency are resolved by a truncate or
unlink before consistency.
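
One way to measure how much of a relation is uninitialized pages is to
scan its segment files for blocks that are entirely zero. The following
is a minimal sketch, not the tool used for the numbers in this message.
It assumes the default 8192-byte block size, and count_zero_pages is a
hypothetical helper name.

```python
import sys

BLCKSZ = 8192  # assumption: PostgreSQL's default block size


def count_zero_pages(path, blcksz=BLCKSZ):
    """Return (total_pages, zero_pages) for one relation segment file."""
    zero_page = bytes(blcksz)
    total = zero = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(blcksz)
            if not page:
                break
            total += 1
            # A short trailing block is compared against a zero prefix
            # of the same length.
            if page == zero_page[:len(page)]:
                zero += 1
    return total, zero


if __name__ == "__main__":
    # Usage: python count_zero_pages.py 473158 473158.1 ...
    for path in sys.argv[1:]:
        total, zero = count_zero_pages(path)
        print(f"{path}: {zero} of {total} pages are all zeros")
```

Running it over the same segment files on primary and standby would show
whether the size difference is accounted for by zeroed blocks.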

The primary was never this large btw, so it's not just a case of
leftover files from drops or truncates that might have failed on the
standby.

I'm assuming this is somehow related to the multixact or transaction
wraparound problems, but I don't really understand how they could be
hitting when both the primary and standby are post-upgrade to the most
recent point release, which has the fixes.

-- 
greg
