basebackup.c's sendFile() ignores read errors - Mailing list pgsql-hackers

From Robert Haas
Subject basebackup.c's sendFile() ignores read errors
Date
Msg-id CA+TgmobG4ywMzL5oQq2a8YKp8x2p3p1LOMMcGqpS7aekT9+ETA@mail.gmail.com
Whole thread Raw
Responses Re: basebackup.c's sendFile() ignores read errors  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers
While reviewing a proposed patch to basebackup.c this morning, I found
myself a bit underwhelmed by the quality of the code and comments in
basebackup.c's sendFile(). I believe it's already been pointed out
that the the retry logic here is not particularly correct, and the
comments demonstrate a pretty clear lack of understanding of how the
actual race conditions that are possible here might operate.

However, I then noticed another problem which I think is significantly
more serious: this code totally ignores the possibility of a failure
in fread().  It just assumes that if fread() fails to return a
positive value, it's reached the end of the file, and if that happens,
it just pads out the rest of the file with zeroes.  So it looks to me
like if fread() encountered, say, an I/O error while trying to read a
data file, and if that error were on the first data block, then the
entire contents of that file would be replaced with zero bytes in the
backup, and no error or warning of any kind would be given to the
user.  If it hit the error later, everything from the point of the
error onward would just get replaced with zero bytes.  To be clear, I
think it is fine for basebackup.c to fill out the rest of the expected
length with zeroes if in fact the file has been truncated during the
backup; recovery should fix it.  But recovery is not going to fix data
that got eaten because EIO was encountered while copying a file.

The logic that rereads a block appears to have the opposite problem.
Here, it will detect an error, but it will also fail in the face of a
concurrent truncation, which it shouldn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Paul A Jungwirth
Date:
Subject: Re: range bug in resolve_generic_type?
Next
From: Robert Haas
Date:
Subject: Re: block-level incremental backup