On Nov 4, 2013, at 11:06, Heikki Linnakangas wrote:
> On 01.11.2013 11:42, Mika Eloranta wrote:
>> pg_receivexlog calculated the xlog segment number incorrectly
>> when started after the previous instance was interrupted.
>>
>> Resuming streaming only worked when the physical wal segment
>> counter was zero, i.e. for the first 256 segments or so.
>
> Oops. Fixed, thanks for the report!
>
> It's a bit scary that this bug went unnoticed for this long; it was introduced quite early in the 9.3 development
> cycle. Seems that I did all the testing of streaming timeline changes with pg_receivexlog later in 9.3 cycle with
> segment numbers < 256, and no-one else has done long-running tests with pg_receivexlog either.

Thanks for the fix, Heikki!
It sounds like PostgreSQL 9.3.x and/or pg_receivexlog is not yet used in a lot of places. Otherwise this
would probably have been found earlier.
Affected versions:
$ git tag --contains dfda6eba
REL9_3_0
REL9_3_1
REL9_3_BETA1
REL9_3_BETA2
REL9_3_RC1
What makes this a really sneaky and severe problem is the way it stays dormant for a period of time after a fresh db
init or pg_upgrade. Here's how I bumped into it:
1. Old postgresql 9.2 db running, pg_receivexlog streaming extra backups to a remote box.
2. pg_upgrade to 9.3.1.
3. pg_receivexlog from the upgraded DB still works ok and handles restarts fine, because the xlog indexes were reset
back to zero at pg_upgrade.
4. xlog history eventually grows over 256 * 16MB.
5. pg_receivexlog gets interrupted for whatever reason (gets stopped, killed, crashes, host is restarted).
6. A new pg_receivexlog instance fails to resume streaming and there is no easy workaround that would maintain an
uninterrupted, gapless xlog history.
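
To make the arithmetic behind steps 3 and 4 concrete, here is a rough Python sketch. It is not the actual
pg_receivexlog code. It assumes the default 16 MB segment size and the usual 24-hex-digit WAL file name layout
(timeline, high part, low part). The function names are made up for illustration, and naive_segment_number() is
only a hypothetical example of a miscalculation that would behave this way: it agrees with the correct value only
while the high part of the file name is still zero.

```python
# Illustrative sketch only; assumes the default 16 MB WAL segment size.
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 with 16 MB segments


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_number(name):
    """Segment number counted from the start of the WAL stream."""
    _tli, log, seg = parse_wal_filename(name)
    return log * SEGMENTS_PER_XLOGID + seg


def naive_segment_number(name):
    """Hypothetical wrong variant: treats the two fields as 32-bit halves."""
    _tli, log, seg = parse_wal_filename(name)
    return (log << 32) | seg


if __name__ == "__main__":
    for fname in ("0000000100000000000000FF", "000000010000000100000000"):
        print(fname, segment_number(fname), naive_segment_number(fname))
```

The two functions give the same answer for the first 256 segments (256 * 16 MB = 4 GB of xlog) and diverge from
the next segment onwards, which matches the dormant period described above.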
Initially, before I had analysed the problem any further, I had to stash the xlogs, restart pg_receivexlog and after
that trigger new pg_basebackups.
Regardless of this bug, I find that pg_receivexlog (and pg_basebackup) are excellent tools and people should use them
more!
PS. something like "pg_receivexlog --start-pos=2D/15000000" might be nice for overriding the streaming start position.
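
For reference, a position written like that maps to a segment file as sketched below. This is only a Python
illustration, again assuming 16 MB segments; parse_lsn() and wal_file_for() are hypothetical helper names, the
timeline defaults to 1 purely for the example, and the --start-pos option itself is just the suggestion above,
not an existing switch.

```python
# Illustrative sketch only; assumes the default 16 MB WAL segment size.
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 with 16 MB segments


def parse_lsn(text):
    """Turn a 'hi/lo' hex position such as '2D/15000000' into a byte offset."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_for(lsn, timeline=1):
    """Name of the segment file containing the given byte offset."""
    segno = lsn // WAL_SEG_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


if __name__ == "__main__":
    pos = parse_lsn("2D/15000000")
    print(pos // WAL_SEG_SIZE, wal_file_for(pos))
```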
--
Mika Eloranta
Ohmu Ltd. http://www.ohmu.fi/