Re: streaming header too small - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: streaming header too small
Date
Msg-id CABUevEz5b_=0gQoDy2MaeFpFcQznsqekU6DvCLF-XsnzytNDwQ@mail.gmail.com
In response to Re: streaming header too small  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: streaming header too small
List pgsql-hackers
On Feb 20, 2013 11:29 AM, "Heikki Linnakangas" <hlinnakangas@vmware.com> wrote:
>
> On 20.02.2013 02:11, Selena Deckelmann wrote:
>>
>> So, I just ran into a similar issue backing up a 9.2.1 server using
>> pg_basebackup version 9.2.3:
>>
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: streaming header too small: 25
>>
>>
>> I've had it happen two times in a row. I'm going to try again...
>>
>> But -- what would be helpful here? I can recompile pg_basebackup with more
>> debugging...
>
>
> Hmm, 25 bytes would be the size of the WAL data packet, if it contains
> just the header and no actual WAL data. I think pg_basebackup should
> accept that - it's not unreasonable that the server might send such a
> packet sometimes.
>
> Looking at the walsender code, it's not supposed to ever send such a
> packet. But I suspect there's one corner-case where it might: if the
> current send location is at an xlogid boundary, so that we previously
> sent the last byte from the last WAL segment in the previous logical
> xlog file, and the WAL flush position points to byte 0 in the beginning
> of the new WAL file. Both of those positions are in fact the same thing,
> but we have two different ways to represent the same position. For
> example, if we've already sent up to WAL position (sentPtr in
> walsender.c):
>
> xlogid = 4
> xrecoff = XLogFileSize
>
> and GetFlushRecPtr() returns:
>
> xlogid = 5
> xrecoff = 0
>
> Those both point to the same position. But the check in XLogSend that
> decides if there is any work to do uses XLByteLE() to check if they are
> equal, and XLByteLE() treats the latter to be greater than the former.
> So, in that situation, XLogSend() would decide that it has work to do,
> but there actually isn't, so it would send 0 bytes of WAL data.
>
> I'm not sure how GetFlushRecPtr() could return such a position, though.
> But I'm also not convinced that it can't happen.
>
> It would be fairly easy to fix walsender to not send anything in that
> situation. It would also be easy to fix pg_basebackup to not treat it as
> an error. We probably should do both.
>
> In 9.3, the XLogRecPtr representation changed so that there is only one
> value for a boundary position like that, so this is a 9.2-only issue.

That does sound like a reasonable explanation and fix. Heck, it's probably
enough to just put the fix in pg_basebackup, since the problem is gone in
9.3 anyway.

But I'd really like to confirm this is the actual situation before
considering it fixed, since it's clearly very intermittent.

Selena, was this reasonably reproducible for you? Would it be possible to
get a network trace of it to show if that's the kind of packet coming
across, or to hack up pg_basebackup to print the exact position it was at
when the problem occurred?

/Magnus
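
For readers following the thread, the boundary case Heikki describes can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the walsender code: the struct and the comparison are simplified stand-ins for the two-field 9.2-era XLogRecPtr and XLByteLE(), the file-size constant assumes 16 MB segments with 255 segments per logical xlog file, and every `sketch_`/`Sketch` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: 255 segments of 16 MB per logical xlog file. */
#define SKETCH_XLOG_FILE_SIZE ((uint32_t) 0xFF000000u)

/* Simplified stand-in for a two-field WAL position (hypothetical name). */
typedef struct
{
    uint32_t xlogid;   /* logical xlog file number */
    uint32_t xrecoff;  /* byte offset within that file */
} SketchRecPtr;

/* Field-wise "a <= b", written in the style of XLByteLE(). */
static bool
sketch_byte_le(SketchRecPtr a, SketchRecPtr b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/* Hypothetical helper: fold "end of file N" into "start of file N+1". */
static SketchRecPtr
sketch_normalize(SketchRecPtr p)
{
    assert(p.xrecoff <= SKETCH_XLOG_FILE_SIZE);
    if (p.xrecoff == SKETCH_XLOG_FILE_SIZE)
    {
        p.xlogid += 1;
        p.xrecoff = 0;
    }
    return p;
}

/*
 * Hypothetical helper: is there WAL left to send?  Without normalization,
 * the two spellings of the same boundary position compare as different.
 */
static bool
sketch_has_work(SketchRecPtr sent, SketchRecPtr flush, bool normalize)
{
    if (normalize)
    {
        sent = sketch_normalize(sent);
        flush = sketch_normalize(flush);
    }
    return !sketch_byte_le(flush, sent);
}
```

With sent = (4, SKETCH_XLOG_FILE_SIZE) and flush = (5, 0), sketch_has_work() reports work to do when normalization is off, which would correspond to the header-only packet, and no work once both positions are normalized first. Whether the real walsender can actually reach that state is exactly the open question above.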
