Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Recovery inconsistencies, standby much larger than primary
Date	February 12, 2014 21:51:56
Msg-id	32313.1392231107@sss.pgh.pa.us
In response to	Re: Recovery inconsistencies, standby much larger than primary (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

I wrote:
> Greg Stark <stark@mit.edu> writes:
>> (Or maybe the hot backup
>> process could just catch the files in this state if a table is rapidly
>> growing and it doesn't take care to avoid picking up new files that
>> appear after it starts?)

> That's a possible explanation I guess, but it doesn't seem terribly
> probable from a timing standpoint.

I did a bit of arithmetic using the cases you posted previously.
In the first case, where block 3634978 got written to 7141472,
you can make the numbers come out right if you assume that a page
was missing at the end of segment 1 --- that leads to the conclusion
that EOF exclusive of that missing page had been around 28.75 GB,
which squares well with the relation's size on master.  However, it's
fairly hard to credit that the base backup would have collected
full-size or nearly full-size images of segments 2 through 28 while
not seeing segment 1 at full size.  You'd have to assume that the
rel grew by a factor of ~14 while the base backup was in progress
--- and then didn't grow very much more afterwards.  (What state
exactly did you measure the primary rel sizes in?  Was it long
after the backup/restore, or did you rewind things somehow?)
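
The arithmetic in the paragraph above can be reproduced with a short sketch. It assumes the default build settings (8 kB pages, 1 GB segment files, so 131072 blocks per segment) and a simplified model of replay that is not spelled out in the message: the relation's apparent EOF is the end of the first short segment, replay extends the relation once per block between that apparent EOF and the target block, the first extension fills the one-page hole, and every later extension lands at the true EOF. The function name and the model are illustrative only.

```python
BLCKSZ = 8192          # assumed default page size, in bytes
RELSEG_SIZE = 131072   # assumed default blocks per 1 GB segment file


def implied_eof_blocks(target_blkno, actual_blkno, short_segno):
    """Back out the true EOF (in blocks) implied by one misplaced write.

    target_blkno: block number the WAL record meant to write
    actual_blkno: block number where the page actually ended up
    short_segno:  segment assumed to be missing its last page
    """
    # Apparent relation length while the short segment hides later segments.
    apparent_eof = (short_segno + 1) * RELSEG_SIZE - 1
    # One extension per block until the relation appears to reach the target.
    extends = target_blkno - apparent_eof + 1
    # The first extension fills the hole; the rest append at the true EOF.
    appended_at_eof = extends - 1
    # The last appended page is the misplaced one, so work backwards from it.
    return actual_blkno - (appended_at_eof - 1)


eof_blocks = implied_eof_blocks(3634978, 7141472, short_segno=1)
eof_gb = eof_blocks * BLCKSZ / 2**30
# Size the relation would have had when segment 1 was still nearly full.
growth_factor = eof_blocks / (2 * RELSEG_SIZE)

print(eof_blocks)               # 3768638
print(round(eof_gb, 2))         # 28.75
print(round(growth_factor, 1))  # 14.4
```

Under those assumptions the implied EOF comes out at about 28.75 GB, and a relation that was only about 2 GB when segment 1 was copied would have had to grow roughly 14-fold during the backup.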

The other examples seem to fit the theory a bit better, but this
one is hard to explain this way.

The other big problem for this theory is that you said in
http://www.postgresql.org/message-id/CAM-w4HPvJCBRVV3dXg8aj0WzkU08dHuX-XYbfDYQhNrn5bnTQg@mail.gmail.com

> What's worse is we created a new standby from the same base backup and
> replayed the same records and it didn't reproduce the problem.

If this were the explanation, it oughta be reproducible that way.

I still agree that XLogReadBufferExtended shouldn't be assuming that P_NEW
will not skip pages.  But I think we have another bug in here somewhere.
        regards, tom lane
