Thread: Short writes
We are currently debugging a problem with error messages like this:

    ERROR: 53100: could not write block 2427137 of relation
    1663/706306048/706314280: No space left on device

The device has plenty of space left. The problem appears to be this curious code in src/backend/storage/file/fd.c:

    /* if write didn't set errno, assume problem is no disk space */
    if (returnCode != amount && errno == 0)
        errno = ENOSPC;

What is the rationale for making this assumption?

We haven't yet figured out why the above error happens, but I suggest that we at least make a more accurate error message.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
> The device has plenty of space left.

Disk quota problem maybe?

> The problem appears to be this
> curious code in src/backend/storage/file/fd.c:
>     /* if write didn't set errno, assume problem is no disk space */
>     if (returnCode != amount && errno == 0)
>         errno = ENOSPC;
> What is the rationale for making this assumption?

Because, in fact, that is the usual reason for a short write. Do you have something better for the code to do?

regards, tom lane
On Tue, Nov 28, 2006 at 04:51:35PM +0100, Peter Eisentraut wrote:
> We are currently debugging a problem with error messages like this:
>
> ERROR: 53100: could not write block 2427137 of relation
> 1663/706306048/706314280: No space left on device
>
> The device has plenty of space left. The problem appears to be this
> curious code in src/backend/storage/file/fd.c:
>
>     /* if write didn't set errno, assume problem is no disk space */
>     if (returnCode != amount && errno == 0)
>         errno = ENOSPC;
>
> What is the rationale for making this assumption?

Essentially, for files short writes are not supposed to happen. For pipes and sockets it's fairly normal, but for files it's not expected to ever happen.

Given that write() only has one return value, if all the data cannot be written, it has to return the number of bytes written and can't return the actual error. What probably *should* happen is that a second write is initiated to write the remainder of the block, at which point the system can say "disk full".

> We haven't yet figured out why the above error happens, but I suggest
> that we at least make a more accurate error message.

It would be interesting to know what other causes there could be for short writes.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
> What probably *should* happen is that a second write is initiated to
> write the remainder of the block, at which point the system can say
> "disk full".

You are hoping that the second write would return ENOSPC, but field experience is that it just returns 0, and you still have to make the assumption about why. See archives from back around the time that patch was put in.

regards, tom lane
Tom Lane wrote:
> Because, in fact, that is the usual reason for a short write. Do you
> have something better for the code to do?

Well, I would have liked to know the truth, such as

    ERROR: short write on block blah
    DETAIL: wrote %u bytes, %u requested
    HINT: That might mean that the disk is full.

In our instance, there is no fullness problem. (The errors repeat too inconsistently yet persistently for that.) Probably, the storage device is improperly mounted (which could become the second half of the hint), but I have to wait for test results.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
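To make the proposed report concrete, here is a minimal sketch of how the three parts could be assembled from the values a write path already has at hand. The helper name `format_short_write_report` and its parameters are hypothetical; this is plain `snprintf`, not PostgreSQL's own error-reporting machinery.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of PostgreSQL: fills buf with a report in
 * the ERROR/DETAIL/HINT shape proposed in the message above.
 * Returns what snprintf returns: the length the full text needs, which is
 * >= buflen if the output was truncated.
 */
int
format_short_write_report(char *buf, size_t buflen,
                          unsigned int blocknum,
                          unsigned int written,
                          unsigned int requested)
{
    return snprintf(buf, buflen,
                    "ERROR: short write on block %u\n"
                    "DETAIL: wrote %u bytes, %u requested\n"
                    "HINT: That might mean that the disk is full.\n",
                    blocknum, written, requested);
}
```

The point of the DETAIL line is that the byte counts are facts, while the HINT stays a guess, so a reader with a non-full disk is not sent down the wrong path.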
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> Because, in fact, that is the usual reason for a short write. Do you
>> have something better for the code to do?

> Well, I would have liked to know the truth, such as
> ERROR: short write on block blah
> DETAIL: wrote %u bytes, %u requested
> HINT: That might mean that the disk is full.

There isn't any way for mdwrite() to return that much information with the current smgr.c-to-md.c API. On the other hand, that API has no very good reason to live --- since smgr.c is just going to elog(ERROR), we might as well allow md.c to do so, and make the functions return void. I'll see about doing that in 8.3; I was going to need some change anyway to properly report read-past-EOF as an error in mdread(), and this seems better than kluging it.

regards, tom lane
On Tue, Nov 28, 2006 at 05:05:02PM +0100, Martijn van Oosterhout wrote:
> On Tue, Nov 28, 2006 at 04:51:35PM +0100, Peter Eisentraut wrote:
> > We are currently debugging a problem with error messages like this:
> >
> > ERROR: 53100: could not write block 2427137 of relation
> > 1663/706306048/706314280: No space left on device

[...]

> It would be interesting to know what other causes there could be for
> short writes.

Interrupted system call?

[Disclaimer: I assume provisions for that are taken, I just don't know the code around that spot and am just offering an answer to the above question]

The problem arises from the fact that errno is only guaranteed to be set on a -1 return value. It'd be nice to have errno set on a short write too.

So the "right" answer might be to retry a write on a short write and only to bail out in the <=0 case (raising an "unspecified error" in the 0 case). Ugh.

Regards
-- 
tomás
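The retry strategy suggested above can be sketched roughly as follows. The function name `write_fully` is hypothetical, and the choice of EIO for the "wrote nothing, no error reported" case is only a placeholder assumption; the existing fd.c code would pick ENOSPC there instead.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: call write() repeatedly until all of buf has been
 * written. Returns 0 on success, -1 on failure with errno set.
 */
int
write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        ssize_t n;

        errno = 0;
        n = write(fd, p, len);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* errno describes the real failure */
        }
        if (n == 0)
        {
            /*
             * No progress and no error code: the "unspecified error" case.
             * EIO is an arbitrary placeholder chosen for this sketch.
             */
            if (errno == 0)
                errno = EIO;
            return -1;
        }
        /* Short write: advance past what was accepted and try the rest. */
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

A caller would use `write_fully(fd, block, blocksize)` in place of a single `write()`. Note that this only moves the guesswork: if the follow-up call returns 0 rather than -1, as reported earlier in the thread, the caller still has to assume a cause.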
On Wed, Nov 29, 2006 at 09:55:41AM +0000, tomas@tuxteam.de wrote:
> > It would be interesting to know what other causes there could be for
> > short writes.
>
> Interrupted system call?
>
> [Disclaimer: I assume provisions for that are taken, I just don't know
> the code around that spot and am just offering an answer to the above
> question]

Seems unlikely. Under BSD signal semantics (which PostgreSQL uses), there is no such thing as an "interrupted system call". When a signal happens, the system is supposed to restart the system call automatically. If this were a problem, we'd have seen it long before now I think.

> The problem arises from the fact that errno is only guaranteed to be set
> on a -1 return value. It'd be nice to have errno set on a short write
> too.

On return from a raw system call there is only one value. If >=0, that's the return value. If <0, then errno is set to -result and -1 is returned to the app. So you see, what you're suggesting isn't possible without a completely different way of doing system calls.

Other interfaces, like async I/O, have request blocks and can return both an error status and a number of bytes.

> So the "right" answer might be to retry a write on a short write and only
> to bail out in the <=0 case (raising an "unspecified error" in the 0
> case). Ugh.

Possibly, but it'd still be nice to know what is causing the failure if it's not disk full.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
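The single-value convention described in this message can be illustrated with a small sketch. The function `raw_result_to_libc` is hypothetical and only mirrors the description given here; real C libraries differ in detail from platform to platform.

```c
#include <errno.h>

/*
 * Hypothetical illustration of the convention described above: the raw
 * call hands back exactly one number. A negative number encodes an error
 * code; anything else is the result itself.
 */
long
raw_result_to_libc(long raw)
{
    if (raw < 0)
    {
        errno = (int) -raw;   /* e.g. a raw -ENOSPC becomes errno = ENOSPC */
        return -1;
    }
    /* Success path: errno is left untouched, which is why callers that
     * want to inspect it after a short write must zero it beforehand. */
    return raw;
}
```

With only one number coming back, a short write can report either "this many bytes" or "this error", never both, which is the limitation under discussion.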
On Wed, Nov 29, 2006 at 11:48:27AM +0100, Martijn van Oosterhout wrote:
> On Wed, Nov 29, 2006 at 09:55:41AM +0000, tomas@tuxteam.de wrote:
> > > It would be interesting to know what other causes there could be for
> > > short writes.
> >
> > Interrupted system call?
> >
> > [Disclaimer: I assume provisions for that are taken, I just don't know
> > the code around that spot and am just offering an answer to the above
> > question]
>
> Seems unlikely. Under BSD signal semantics (which PostgreSQL uses),
> there is no such thing as an "interrupted system call". When a signal
> happens, the system is supposed to restart the system call
> automatically.

I have hazy memories of SA_RESTART not being totally reliable, but I can't come up with hard data. Maybe the memories (or my storage media ;) are outdated.

> On return from a raw system call there is only one value. If >=0,
> that's the return value. If <0, then errno is set to -result and -1 is
> returned to the app. So you see, what you're suggesting isn't possible
> without a completely different way of doing system calls.

...or just setting errno whenever the result is smaller than the requested length (aka short). This isn't really forbidden.

> Possibly, but it'd still be nice to know what is causing the failure if
> it's not disk full.

You'll expect a -1 on the second attempt, and thusly a meaningful errno (although I've heard of cases where you just get 0 on disk full: how disgusting).

Regards
-- 
tomás
I wrote:
> We are currently debugging a problem with error messages like this:
>
> ERROR: 53100: could not write block 2427137 of relation
> 1663/706306048/706314280: No space left on device

For those scoring along at home, the problem was an NFS partition mounted with the "intr" option.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, Nov 30, 2006 at 01:59:31PM +0100, Peter Eisentraut wrote:
> I wrote:
> > We are currently debugging a problem with error messages like this:
> >
> > ERROR: 53100: could not write block 2427137 of relation
> > 1663/706306048/706314280: No space left on device
>
> For those scoring along at home, the problem was an NFS partition
> mounted with the "intr" option.

So it was a signal interrupting the write? Was it a user-controlled signal, or one generated by postgres itself?

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/