Thread: [PATCH] pg_receivexlog: fixed to work with logical segno > 0

[PATCH] pg_receivexlog: fixed to work with logical segno > 0

From
Mika Eloranta
Date:
pg_receivexlog calculated the xlog segment number incorrectly
when started after the previous instance was interrupted.

Resuming streaming only worked when the physical wal segment
counter was zero, i.e. for the first 256 segments or so.
---
 src/bin/pg_basebackup/pg_receivexlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 031ec1a..6f9fcf4 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -171,7 +171,7 @@ FindStreamingStart(uint32 *tli)
                     progname, dirent->d_name);
             disconnect_and_exit(1);
         }
 
-        segno = ((uint64) log) << 32 | seg;
+        segno = (((uint64) log) << 8) | seg;
         /*
          * Check that the segment has the right size, if it's supposed to be
 
-- 
1.8.0.1




Re: [PATCH] pg_receivexlog: fixed to work with logical segno > 0

From
Heikki Linnakangas
Date:
On 01.11.2013 11:42, Mika Eloranta wrote:
> pg_receivexlog calculated the xlog segment number incorrectly
> when started after the previous instance was interrupted.
>
> Resuming streaming only worked when the physical wal segment
> counter was zero, i.e. for the first 256 segments or so.

Oops. Fixed, thanks for the report!

It's a bit scary that this bug went unnoticed for this long; it was
introduced quite early in the 9.3 development cycle. Seems that I did
all the testing of streaming timeline changes with pg_receivexlog later
in the 9.3 cycle with segment numbers < 256, and no-one else has done
long-running tests with pg_receivexlog either.

- Heikki



Re: [PATCH] pg_receivexlog: fixed to work with logical segno > 0

From
Mika Eloranta
Date:
On Nov 4, 2013, at 11:06, Heikki Linnakangas wrote:
> On 01.11.2013 11:42, Mika Eloranta wrote:
>> pg_receivexlog calculated the xlog segment number incorrectly
>> when started after the previous instance was interrupted.
>>
>> Resuming streaming only worked when the physical wal segment
>> counter was zero, i.e. for the first 256 segments or so.
>
> Oops. Fixed, thanks for the report!
>
> It's a bit scary that this bug went unnoticed for this long; it was
> introduced quite early in the 9.3 development cycle. Seems that I did
> all the testing of streaming timeline changes with pg_receivexlog later
> in the 9.3 cycle with segment numbers < 256, and no-one else has done
> long-running tests with pg_receivexlog either.

Thanks for the fix, Heikki!

It sounds like PostgreSQL 9.3.x and/or pg_receivexlog is not yet used in a lot of places. Otherwise this
would probably have been found earlier.

Affected versions:

$ git tag --contains dfda6eba
REL9_3_0
REL9_3_1
REL9_3_BETA1
REL9_3_BETA2
REL9_3_RC1

What makes this a really sneaky and severe problem is the way it stays dormant for a period of time after a fresh db
init or pg_upgrade. Here's how I bumped into it:

1. Old postgresql 9.2 db running, pg_receivexlog streaming extra backups to a remote box.
2. pg_upgrade to 9.3.1.
3. pg_receivexlog from the upgraded DB still works ok and handles restarts fine, because the xlog indexes were reset
back to zero at pg_upgrade.
4. xlog history eventually grows over 256 * 16MB.
5. pg_receivexlog gets interrupted for whatever reason (gets stopped, killed, crashes, host is restarted).
6. A new pg_receivexlog instance fails to resume streaming and there is no easy workaround that would maintain an
uninterrupted, gapless xlog history.
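
The dormancy in steps 3 and 4 follows directly from the arithmetic in the patch. Here is a minimal sketch, assuming the default 16 MB segment size (so 256 segments per 4 GB logical xlog id) and the usual 24-hex-digit segment file name (timeline, log, seg). The function names are made up for illustration and are not taken from the PostgreSQL source:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16 MB segments, i.e. 4 GB / 16 MB = 256 per log id. */
#define SEGMENTS_PER_XLOGID 256u

/* Split a segment file name "TTTTTTTTLLLLLLLLSSSSSSSS" into its three
 * hex fields. Returns 1 on success, 0 if the name does not parse. */
static int
parse_wal_name(const char *name, uint32_t *tli, uint32_t *log, uint32_t *seg)
{
    unsigned int t, l, s;

    if (sscanf(name, "%08X%08X%08X", &t, &l, &s) != 3)
        return 0;
    *tli = t;
    *log = l;
    *seg = s;
    return 1;
}

/* The pre-patch expression: the log part is shifted by 32 bits. */
static uint64_t
segno_buggy(uint32_t log, uint32_t seg)
{
    return ((uint64_t) log) << 32 | seg;
}

/* The patched expression: with 256 segments per log id, the log part
 * only needs to be shifted by 8 bits. */
static uint64_t
segno_fixed(uint32_t log, uint32_t seg)
{
    assert(seg < SEGMENTS_PER_XLOGID);
    return (((uint64_t) log) << 8) | seg;
}
```

For any file with log == 0 both expressions give the same result, which is why restarts keep working for the first 256 segments (256 * 16 MB = 4 GB of xlog). For 000000010000000100000000, the first segment after that, the patched expression yields 256 while the old one yields 4294967296.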

Initially, before I had analysed the problem any further, I had to stash the xlogs, restart pg_receivexlog and after
that trigger new pg_basebackups.

Regardless of this bug, I find that pg_receivexlog (and pg_basebackup) are excellent tools and people should use them
more!

PS. something like "pg_receivexlog --start-pos=2D/15000000" might be nice for overriding the streaming start position.
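
To illustrate what such an override would have to compute: --start-pos is only a suggestion here, not an existing option, and the helpers below are hypothetical. This is a rough sketch, assuming 16 MB segments, of how a position written as "2D/15000000" maps to a segment number and a segment file name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumptions: default 16 MB segments, 256 segments per log id. */
#define WAL_SEGMENT_SIZE    (16u * 1024 * 1024)
#define SEGMENTS_PER_XLOGID 256u

/* Parse a position of the form "hi/lo" (both hex) into a 64-bit value.
 * Returns 1 on success, 0 on a malformed string. */
static int
parse_lsn(const char *str, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return 0;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}

/* Segment number that contains the given position. */
static uint64_t
lsn_to_segno(uint64_t lsn)
{
    return lsn / WAL_SEGMENT_SIZE;
}

/* Build the 24-character segment file name for a timeline and segment
 * number. buf must have room for 24 characters plus the terminator. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    assert(len >= 25);
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGMENTS_PER_XLOGID),
             (unsigned int) (segno % SEGMENTS_PER_XLOGID));
}
```

With these assumptions, "2D/15000000" falls at the start of segment number 0x2D15 (11541), which on timeline 1 is the file 000000010000002D00000015.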

--
Mika Eloranta
Ohmu Ltd.  http://www.ohmu.fi/