Thread: Short writes

From
Peter Eisentraut
Date:
We are currently debugging a problem with error messages like this:

ERROR:  53100: could not write block 2427137 of relation 
1663/706306048/706314280: No space left on device

The device has plenty of space left.  The problem appears to be this 
curious code in src/backend/storage/file/fd.c:

    /* if write didn't set errno, assume problem is no disk space */
    if (returnCode != amount && errno == 0)
        errno = ENOSPC;

What is the rationale for making this assumption?

We haven't yet figured out why the above error happens, but I suggest 
that we at least make a more accurate error message.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Short writes

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The device has plenty of space left.

Disk quota problem maybe?

> The problem appears to be this 
> curious code in src/backend/storage/file/fd.c:

>     /* if write didn't set errno, assume problem is no disk space */
>     if (returnCode != amount && errno == 0)
>         errno = ENOSPC;

> What is the rationale for making this assumption?

Because, in fact, that is the usual reason for a short write.  Do you
have something better for the code to do?
        regards, tom lane


Re: Short writes

From
Martijn van Oosterhout
Date:
On Tue, Nov 28, 2006 at 04:51:35PM +0100, Peter Eisentraut wrote:
> We are currently debugging a problem with error messages like this:
>
> ERROR:  53100: could not write block 2427137 of relation
> 1663/706306048/706314280: No space left on device
>
> The device has plenty of space left.  The problem appears to be this
> curious code in src/backend/storage/file/fd.c:
>
>     /* if write didn't set errno, assume problem is no disk space */
>     if (returnCode != amount && errno == 0)
>         errno = ENOSPC;
>
> What is the rationale for making this assumption?

Essentially, for files short writes are not supposed to happen. For
pipes and sockets it's fairly normal, but for files it's not expected
to ever happen.

Given that write() only has one return value, if all the data cannot be
written, it has to return the number of bytes written and can't return
the actual error.

What probably *should* happen is that a second write is initiated to
write the remainder of the block, at which point the system can say
"disk full".
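
A minimal sketch of the retry idea described above, assuming a plain POSIX write() on a file descriptor. The helper name write_fully() is hypothetical and is not PostgreSQL code. The sketch also covers the case where a retry returns 0 with no error code, where the caller still has to guess at the cause.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_fully() is a hypothetical helper, not part of PostgreSQL.
 * It keeps calling write() until every byte is written or an attempt
 * makes no progress.  Returns 0 on success, -1 on failure with errno set.
 */
int
write_fully(int fd, const void *buf, size_t amount)
{
    const char *p = buf;
    size_t      left = amount;

    while (left > 0)
    {
        ssize_t     n;

        errno = 0;
        n = write(fd, p, left);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any data was written */
            return -1;          /* write() reported a real error in errno */
        }
        if (n == 0)
        {
            /* No progress and no error code: the cause is still a guess. */
            if (errno == 0)
                errno = ENOSPC;
            return -1;
        }
        /* Short or complete write: advance past what was accepted. */
        p += n;
        left -= (size_t) n;
    }
    return 0;
}
```

A caller would use it as `if (write_fully(fd, block, blocksize) != 0)` and then report errno, keeping in mind that ENOSPC in the zero-progress branch is an assumption rather than something the kernel reported.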

> We haven't yet figured out why the above error happens, but I suggest
> that we at least make a more accurate error message.

It would be interesting to know what other causes there could be for
short writes.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Short writes

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> What probably *should* happen is that a second write is initiated to
> write the remainder of the block, at which point the system can say
> "disk full".

You are hoping that the second write would return ENOSPC, but field
experience is that it just returns 0, and you still have to make the
assumption about why.  See archives from back around the time that
patch was put in.
        regards, tom lane


Re: Short writes

From
Peter Eisentraut
Date:
Tom Lane wrote:
> Because, in fact, that is the usual reason for a short write.  Do you
> have something better for the code to do?

Well, I would have liked to know the truth, such as

ERROR: short write on block blah
DETAIL: wrote %u bytes, %u requested
HINT: That might mean that the disk is full.
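
As a rough sketch of how the two cases could be kept apart in a report, the following assumes the caller still has the write() return value, the requested length, and the errno saved right after the call. The function describe_write_result() and its parameters are hypothetical; this is not the smgr or md.c API.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: formats a report for one write() attempt.
 * rc is the write() return value, saved_errno the errno captured
 * immediately afterwards.  Returns the snprintf() result, or 0 when
 * the write was complete and there is nothing to report.
 */
int
describe_write_result(char *out, size_t outlen, unsigned blocknum,
                      ssize_t rc, size_t requested, int saved_errno)
{
    if (rc < 0)
        /* A real failure: errno is meaningful, so report it as-is. */
        return snprintf(out, outlen,
                        "ERROR: could not write block %u: %s\n",
                        blocknum, strerror(saved_errno));

    if ((size_t) rc != requested)
        /* A short write: state the facts and keep the guess in the hint. */
        return snprintf(out, outlen,
                        "ERROR: short write on block %u\n"
                        "DETAIL: wrote %u bytes, %u requested\n"
                        "HINT: That might mean that the disk is full.\n",
                        blocknum, (unsigned) rc, (unsigned) requested);

    if (outlen > 0)
        out[0] = '\0';          /* complete write, empty report */
    return 0;
}
```

The point of the split is that the disk-full guess only ever appears as a hint, never as the error itself.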

In our instance, there is no fullness problem.  (The errors repeat too 
inconsistently yet persistently for that.)  Probably, the storage 
device is improperly mounted (which could become the second half of the 
hint), but I have to wait for test results.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Short writes

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> Because, in fact, that is the usual reason for a short write.  Do you
>> have something better for the code to do?

> Well, I would have liked to know the truth, such as

> ERROR: short write on block blah
> DETAIL: wrote %u bytes, %u requested
> HINT: That might mean that the disk is full.

There isn't any way for mdwrite() to return that much information with
the current smgr.c-to-md.c API.  On the other hand, that API has no very
good reason to live --- since smgr.c is just going to elog(ERROR), we
might as well allow md.c to do so, and make the functions return void.

I'll see about doing that in 8.3; I was going to need some change anyway
to properly report read-past-EOF as an error in mdread(), and this seems
better than kluging it.
        regards, tom lane


Re: Short writes

From
tomas@tuxteam.de
Date:
On Tue, Nov 28, 2006 at 05:05:02PM +0100, Martijn van Oosterhout wrote:
> On Tue, Nov 28, 2006 at 04:51:35PM +0100, Peter Eisentraut wrote:
> > We are currently debugging a problem with error messages like this:
> > 
> > ERROR:  53100: could not write block 2427137 of relation 
> > 1663/706306048/706314280: No space left on device
[...]
> It would be interesting to know what other causes there could be for
> short writes.

Interrupted system call?

[Disclaimer: I assume provisions for that are taken, I just don't know
the code around that spot and am just offering an answer to the above
question]

The problem arises from the fact that errno is only guaranteed to be set
on a -1 return value. It'd be nice to have errno set on a short write
too.

So the "right" answer might be to retry a write on a short write and only
to bail out in the <=0 case (raising an "unspecified error" in the 0
case). Ugh.

Regards
-- tomás



Re: Short writes

From
Martijn van Oosterhout
Date:
On Wed, Nov 29, 2006 at 09:55:41AM +0000, tomas@tuxteam.de wrote:
> > It would be interesting to know what other causes there could be for
> > short writes.
>
> Interrupted system call?
>
> [Disclaimer: I assume provisions for that are taken, I just don't know
> the code around that spot and am just offering an answer to the above
> question]

Seems unlikely. Under BSD signal semantics (which PostgreSQL uses),
there is no such thing as an "interrupted system call". When a signal
happens, the system is supposed to restart the system call
automatically.

If this were a problem, we'd have seen it long before now I think.

> The problem arises from the fact that errno is only guaranteed to be set
> on a -1 return value. It'd be nice to have errno set on a short write
> too.

On return from a raw system call there is only one value. If >=0,
that's the return value. If <0, then errno is set to -result and -1 is
returned to the app. So you see, what you're suggesting isn't possible
without a completely different way of doing system calls.
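
The convention just described can be sketched in a few lines. This is a simplified stand-in for what a C library wrapper does on Linux-style systems, where the kernel returns a negated error code. The function name syscall_result_to_libc() is hypothetical, and real wrappers only treat a small range of negative values as errors.

```c
#include <errno.h>

/*
 * Hypothetical, simplified model of a libc system call wrapper.
 * raw is the single value handed back by the kernel: a byte count or
 * other result when >= 0, a negated error code when < 0.
 */
long
syscall_result_to_libc(long raw)
{
    if (raw < 0)
    {
        errno = (int) -raw;     /* e.g. -ENOSPC becomes errno = ENOSPC */
        return -1;
    }
    return raw;                 /* success: errno is left untouched */
}
```

With only one value coming back, a short write has to be reported as a count, which leaves no room for an error code in the same return.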

Other interfaces, like async I/O have request blocks and can return
both an error status and a number of bytes.

> So the "right" answer might be to retry a write on a short write and only
> to bail out in the <=0 case (raising an "unspecified error" in the 0
> case). Ugh.

Possibly, but it'd still be nice to know what is causing the failure if
it's not disk full.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Short writes

From
tomas@tuxteam.de
Date:
On Wed, Nov 29, 2006 at 11:48:27AM +0100, Martijn van Oosterhout wrote:
> On Wed, Nov 29, 2006 at 09:55:41AM +0000, tomas@tuxteam.de wrote:
> > > It would be interesting to know what other causes there could be for
> > > short writes.
> > 
> > Interrupted system call?
> > 
> > [Disclaimer: I assume provisions for that are taken, I just don't know
> > the code around that spot and am just offering an answer to the above
> > question]
> 
> Seems unlikely. Under BSD signal semantics (which PostgreSQL uses),
> there is no such thing as an "interrupted system call". When a signal
> happens, the system is supposed to restart the system call
> automatically.

I have hazy memories of SA_RESTART not being totally reliable, but I
can't come up with hard data. Maybe the memories (or my storage media ;)
are outdated.

> On return from a raw system call there is only one value. If >=0,
> that's the return value. If <0, then errno is set to -result and -1 is
> returned to the app. So you see, what you're suggesting isn't possible
> without a completely different way of doing system calls.

...or just setting errno whenever the result is smaller than the
requested length (aka short). This isn't really forbidden.

> Possibly, but it'd still be nice to know what is causing the failure if
> it's not disk full.

You'll expect a -1 on the second attempt, and thus a meaningful errno
(although I've heard of cases where you just get 0 on disk full: how
disgusting).

Regards
-- tomás



Re: Short writes

From
Peter Eisentraut
Date:
I wrote:
> We are currently debugging a problem with error messages like this:
>
> ERROR:  53100: could not write block 2427137 of relation
> 1663/706306048/706314280: No space left on device

For those scoring along at home, the problem was an NFS partition 
mounted with the "intr" option.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Short writes

From
Martijn van Oosterhout
Date:
On Thu, Nov 30, 2006 at 01:59:31PM +0100, Peter Eisentraut wrote:
> I wrote:
> > We are currently debugging a problem with error messages like this:
> >
> > ERROR:  53100: could not write block 2427137 of relation
> > 1663/706306048/706314280: No space left on device
>
> For those scoring along at home, the problem was an NFS partition
> mounted with the "intr" option.

So it was a signal interrupting the write? Was it a user-controlled
signal, or one generated by postgres itself?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.