Re: streaming header too small - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: streaming header too small |
Date | |
Msg-id | CABUevEzwipq1YCGsnBZKJCgJkJZDm8V2X=_UftNgYniQmmKjew@mail.gmail.com Whole thread Raw |
In response to | Re: streaming header too small (Selena Deckelmann <selena@chesnok.com>) |
List | pgsql-hackers |
On Wed, Feb 20, 2013 at 4:53 PM, Selena Deckelmann <selena@chesnok.com> wrote: > > > > On Wed, Feb 20, 2013 at 6:23 AM, Magnus Hagander <magnus@hagander.net> > wrote: >> >> >> On Feb 20, 2013 11:29 AM, "Heikki Linnakangas" <hlinnakangas@vmware.com> >> wrote: >> > >> > On 20.02.2013 02:11, Selena Deckelmann wrote: >> >> >> >> So, I just ran into a similar issue backing up a 9.2.1 server using >> >> pg_basebackup version 9.2.3: >> >> >> >> pg_basebackup: starting background WAL receiver >> >> pg_basebackup: streaming header too small: 25 >> >> >> >> >> >> I've had it happen two times in a row. I'm going to try again... >> >> >> >> But -- what would be helpful here? I can recompile pg_basebackup with >> >> more >> >> debugging... >> > >> > >> > Hmm, 25 bytes would be the size of the WAL data packet, if it contains >> > just the header and no actual WAL data. I think pg_basebackup should accept >> > that - it's not unreasonable that the server might send such a packet >> > sometimes. >> > >> > Looking at the walsender code, it's not supposed to ever send such a >> > packet. But I suspect there's one corner-case where it might: if the current >> > send location is at an xlogid boundary, so that we previously sent the last >> > byte from the last WAL segment in the previous logical xlog file, and the >> > WAL flush position points to byte 0 in the beginning of the new WAL file. >> > Both of those positions are in fact the same thing, but we have two >> > different ways to represent the same position. For example, if we've already >> > sent up to WAL position (sentPtr in walsender.c): >> > >> > xlogid = 4 >> > xrecoff = XLogFileSize >> > >> > and GetFlushRecPtr() returns: >> > >> > xlogid = 5 >> > xrecoff = 0 >> > >> > Those both point to the same position. But the check in XLogSend that >> > decides if there is any work to do uses XLByteLE() to check if they are >> > equal, and XLByteLE() treats the latter to be greater than the former. So, >> > in that situation, XLogSend() would decide that it has work to do, but there >> > actually isn't, so it would send 0 bytes of WAL data. >> > >> > I'm not sure how GetFlushRecPtr() could return such a position, though. >> > But I'm also not convinced that it can't happen. >> > >> > It would be fairly easy to fix walsender to not send anything in that >> > situation. It would also be easy to fix pg_basebackup to not treat it as an >> > error. We probably should do both. >> > >> > In 9.3, the XLogRecPtr representation changed so that there is only one >> > value for a boundary position like that, so this is a 9.2-only issue. > > > Thanks for the debugging, Heikki! > >> >> That does sound like a reasonable explanation and fix. Heck, probably >> enough to just put the fix in pg_basebackup since it's gone in 9.3 anyway. >> >> But I'd really like to confirm this is the actual situation before >> considering it fixed, since it's clearly very intermittent. >> >> Selena, was this reasonably reproducible for you? Would it be possible to >> get a network trace of it to show of that's the kind of package coming >> across, or by hacking up pg_basebackup to print the exact position it was at >> when the problem occurred? > > > This is happening with a very busy 700 GB system, so I'm going to rule out a > network trace out for the moment. The error is occurring "sometime" in the > middle of the backup. Last time it was at least 30-40 minutes into a 2 hr > backup. Yikes. Yeah, that doesn't seem like the best choice of method :) > I'll see about hacking up pg_basebackup. I'm doing nightly backups so > hopefully (?) it will happen again in the next few days. Thanks! -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
pgsql-hackers by date: