Re: [PATCH] pg_receivexlog: fixed to work with logical segno > 0 - Mailing list pgsql-hackers

From Mika Eloranta
Subject Re: [PATCH] pg_receivexlog: fixed to work with logical segno > 0
Date
Msg-id 9AAF2810-A8D8-40A3-AC43-F5E8363E496D@ohmu.fi
In response to Re: [PATCH] pg_receivexlog: fixed to work with logical segno > 0  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Nov 4, 2013, at 11:06, Heikki Linnakangas wrote:
> On 01.11.2013 11:42, Mika Eloranta wrote:
>> pg_receivexlog calculated the xlog segment number incorrectly
>> when started after the previous instance was interrupted.
>>
>> Resuming streaming only worked when the physical wal segment
>> counter was zero, i.e. for the first 256 segments or so.
>
> Oops. Fixed, thanks for the report!
>
> It's a bit scary that this bug went unnoticed for this long; it was introduced quite early in the 9.3 development
> cycle. Seems that I did all the testing of streaming timeline changes with pg_receivexlog later in 9.3 cycle with
> segment numbers < 256, and no-one else have done long-running tests with pg_receivexlog either.

Thanks for the fix, Heikki!

It sounds like PostgreSQL 9.3.x and/or pg_receivexlog is not yet used in a lot of places. Otherwise this
would probably have been found earlier.

Affected versions:

$ git tag --contains dfda6eba
REL9_3_0
REL9_3_1
REL9_3_BETA1
REL9_3_BETA2
REL9_3_RC1

What makes this a really sneaky and severe problem is the way it stays dormant for a period of time after a fresh db
init or pg_upgrade. Here's how I bumped into it:

1. Old PostgreSQL 9.2 db running, pg_receivexlog streaming extra backups to a remote box.
2. pg_upgrade to 9.3.1.
3. pg_receivexlog from the upgraded DB still works ok and handles restarts fine, because the xlog indexes were reset
back to zero at pg_upgrade.
4. xlog history eventually grows over 256 * 16MB.
5. pg_receivexlog gets interrupted for whatever reason (gets stopped, killed, crashes, host is restarted).
6. A new pg_receivexlog instance fails to resume streaming and there is no easy workaround that would maintain an
uninterrupted, gapless xlog history.
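
To make the 256-segment threshold in step 4 concrete, here is a rough sketch of how a WAL segment file name maps to a
segment number and a resume position. It assumes the default 16MB segment size and the 9.3 file naming scheme
(timeline, high part, low part, 8 hex digits each). The function and constant names are made up for illustration;
this is not the pg_receivexlog code.

```python
import re

WAL_SEG_SIZE = 16 * 1024 * 1024                     # assumed default 16MB segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 with 16MB segments


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, segment number)."""
    if not re.fullmatch(r"[0-9A-F]{24}", name):
        raise ValueError("not a WAL segment file name: %r" % name)
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)    # high part, increments once per 256 segments
    seg = int(name[16:24], 16)   # low part, 0..255 with 16MB segments
    return tli, log * SEGMENTS_PER_XLOGID + seg


def segment_start(segno):
    """Return the start position of a segment in hi/lo notation."""
    pos = segno * WAL_SEG_SIZE
    return "%X/%X" % (pos >> 32, pos & 0xFFFFFFFF)


if __name__ == "__main__":
    for fname in ("0000000100000000000000FF", "000000010000000100000000"):
        tli, segno = parse_wal_filename(fname)
        print(fname, "timeline", tli, "segno", segno, "starts at", segment_start(segno))
```

The high part of the file name stays at zero for the first 256 segments, so any calculation that mishandles it still
gives the right answer until the history grows past 256 * 16MB.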

Initially, before I had analysed the problem any further, I had to stash the xlogs, restart pg_receivexlog and after
that trigger new pg_basebackups.

Regardless of this bug, I find that pg_receivexlog (and pg_basebackup) are excellent tools and people should use them
more!

PS. something like "pg_receivexlog --start-pos=2D/15000000" might be nice for overriding the streaming start position.
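
For illustration only, this is the segment file that such a start position would fall into. The --start-pos option
above is just a suggestion and does not exist; the helper below is a hypothetical sketch that assumes 16MB segments
and timeline 1.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024                     # assumed default 16MB segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 with 16MB segments


def lsn_to_wal_filename(lsn, timeline=1):
    """Map a position in hi/lo hex notation to the name of the segment containing it."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    pos = (hi << 32) | lo                 # 64-bit byte position in the WAL stream
    segno = pos // WAL_SEG_SIZE           # segment that contains this position
    return "%08X%08X%08X" % (timeline,
                             segno // SEGMENTS_PER_XLOGID,
                             segno % SEGMENTS_PER_XLOGID)


if __name__ == "__main__":
    print(lsn_to_wal_filename("2D/15000000"))   # 000000010000002D00000015
```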

--
Mika Eloranta
Ohmu Ltd.  http://www.ohmu.fi/