Thread: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
"Oliver Elphick"
Date:
I found the answer to this: the partition had filled up, and so the problem
was lack of disk space.

Could we have a more helpful error message? I was just looking in the
wrong direction because of the contents of the message.

*** postgresql-7.1.1.orig/src/backend/access/transam/xlog.c	Tue May 22 16:45:14 2001
--- postgresql-7.1.1/src/backend/access/transam/xlog.c	Tue May 22 16:48:12 2001
***************
*** 1334,1340 ****
  		unlink(tmppath);
  		errno = save_errno;
  
! 		elog(STOP, "ZeroFill(%s) failed: %m", tmppath);
  	}
  	}
  
--- 1334,1340 ----
  		unlink(tmppath);
  		errno = save_errno;
  
! 		elog(STOP, "ZeroFill failed to create or write %s: %m", tmppath);
  	}
  	}
  

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "We are troubled on every side, yet not distressed; we
      are perplexed, but not in despair; persecuted, but not
      forsaken; cast down, but not destroyed; Always bearing
      about in the body the dying of the Lord Jesus, that
      the life also of Jesus might be made manifest in our
      body."              II Corinthians 4:8-10
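The misleading text is consistent with a stale errno: elog's %m expands to the message for whatever errno currently holds, and a write() that does not return -1 need not set errno at all (a point taken up in the replies). The hypothetical sketch below, which is not PostgreSQL code, reproduces that effect with a pipe: an earlier failed open() leaves ENOENT behind, and a later successful write() leaves it untouched. That last step is what Linux/glibc does in practice; POSIX leaves errno unspecified after a successful call, so treat this as an illustration rather than a guarantee. The path name and helper name are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno value that a "%m" in a later
 * error message would report after a write() that did NOT fail. */
static int errno_seen_after_ok_write(int fd)
{
    ssize_t n;
    int probe;

    /* Assumption: this path does not exist, so open() fails with ENOENT. */
    probe = open("/nonexistent-demo-dir/xlogtemp", O_RDONLY);
    assert(probe == -1);

    /* Succeeds; in practice (not by POSIX guarantee) errno is left alone. */
    n = write(fd, "z", 1);
    assert(n == 1);

    return errno;   /* typically still ENOENT from the failed open() */
}
```

Called with the write end of a pipe on Linux, it returns ENOENT, which strerror() renders as "No such file or directory".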
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Bruce Momjian
Date:
Looks safe.  Patch applied.

> I found the answer to this: the partition had filled up, and so the problem
> was lack of disk space.
>
> Could we have a more helpful error message? I was just looking in the
> wrong direction because of the contents of the message.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
"Oliver Elphick" <olly@lfix.co.uk> writes: > I found the answer to this: the partition had filled up, and so the problem > was lack of disk space. > Could we have a more helpful error message? Indeed. I don't like your solution however, since it's just papering over the real problem which is lack of a suitable error code from write(). Evidently write() isn't setting errno as long as it's able to write at least some data. Perhaps we could do errno = 0; if (write(...) != expectedbytecount) { int save_errno = errno; unlink(tmp); errno = save_errno ? save_errno : ENOSPC; elog(...); } Comments? Is it reasonable to guess that the problem must be ENOSPC if write doesn't write all the bytes but also doesn't set errno? Are there any systems that don't define ENOSPC? regards, tom lane
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Chris Jones
Date:
On Wed, May 23, 2001 at 12:50:46PM -0400, Tom Lane wrote:
> 	errno = 0;
> 	if (write(...) != expectedbytecount)
> 	{
> 		int save_errno = errno;
>
> 		unlink(tmp);
>
> 		errno = save_errno ? save_errno : ENOSPC;
>
> 		elog(...);
> 	}
>
> Comments?  Is it reasonable to guess that the problem must be ENOSPC
> if write doesn't write all the bytes but also doesn't set errno?

No, it could be any number of other things.  The first that comes to
mind is EINTR.  How about something closer to:

totalwritten = 0;
while(totalwritten < expectedbytecount) {
	lastwritten = write(...);
	if(lastwritten == -1) {
		/* errno is guaranteed to be set */
		if(errno == EINTR) {
			continue;
		}
		unlink(tmp);
		elog(...);
		break;
	} else if(lastwritten == 0) {
		/* errno should be 0.  Considering this an error is probably a
		   BAD idea. */
		unlink(tmp);
		elog(...);
		break;
	} else {
		/* we got a partial write count.  No problem; try again. */
		totalwritten += lastwritten;
	}
}

Chris

-- 
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com
Chris Jones <chris@mt.sri.com> writes:
> No, it could be any number of other things.  The first that comes to
> mind is EINTR.  How about something closer to:

Writes to disk files don't suffer EINTR as far as I've ever heard
(if they do, there are an awful lot of broken programs out there).

More to the point, a kernel that aborted a write because of an interrupt
*and failed to set errno* would certainly be broken.  The question is
what to assume when we see that the write did not change errno.

			regards, tom lane
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Chris Jones
Date:
On Wed, May 23, 2001 at 11:39:15AM -0600, Chris Jones wrote:
> No, it could be any number of other things.  The first that comes to
> mind is EINTR.  How about something closer to:

Hmm.  Actually, write(2) shouldn't return EINTR; it should return a
short write count.  But other errno values include EDQUOT and EFBIG.
So the code I suggested is not very good, either.  Better to just do:

> totalwritten = 0;
> while(totalwritten < expectedbytecount) {
> 	lastwritten = write(...);
> 	if(lastwritten == -1) {
> 		/* errno is guaranteed to be set */
> 		unlink(tmp);
> 		elog(...);
> 		break;
> 	} else {
> 		/* we got a partial write count.  No problem; try again. */
> 		totalwritten += lastwritten;
> 	}
> }

Chris

-- 
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Chris Jones
Date:
On Wed, May 23, 2001 at 01:47:37PM -0400, Tom Lane wrote:
> Chris Jones <chris@mt.sri.com> writes:
> > No, it could be any number of other things.  The first that comes to
> > mind is EINTR.  How about something closer to:
>
> Writes to disk files don't suffer EINTR as far as I've ever heard
> (if they do, there are an awful lot of broken programs out there).

Yeah, my mistake.

> More to the point, a kernel that aborted a write because of an interrupt
> *and failed to set errno* would certainly be broken.  The question is
> what to assume when we see that the write did not change errno.

If write didn't return -1, it shouldn't have set errno.  A short write
count isn't an error condition.

Chris

-- 
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com
Chris Jones <chris@mt.sri.com> writes:
>> 		/* we got a partial write count.  No problem; try again. */
>> 		totalwritten += lastwritten;

No.  An infinite loop is NOT an acceptable response to running out of
disk space.  This is a disk file we are writing, not a socket.

			regards, tom lane
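A loop that still retries after a partial write but cannot spin on a full disk might look like the sketch below. The helper name write_fully is hypothetical, and so are two policy choices: a write() that returns 0 for a nonzero request is treated as "no further progress possible" and reported as ENOSPC (following Tom's guess), and a -1 return that somehow leaves errno at 0 is reported as EIO. Every pass either advances total or returns, so the loop terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns 0 on success, or the errno value to report on failure. */
static int write_fully(int fd, const char *buf, size_t len)
{
    size_t total = 0;

    assert(buf != NULL || len == 0);
    while (total < len)
    {
        ssize_t n;

        errno = 0;
        n = write(fd, buf + total, len - total);
        if (n < 0)
            return errno != 0 ? errno : EIO;  /* EIO only if no reason given */
        if (n == 0)
            return ENOSPC;      /* no progress and no error: stop, don't spin */
        total += (size_t) n;    /* partial write: loop again for the rest */
    }
    return 0;
}
```

On a disk that has just filled up, the call following a short write typically fails with -1 and errno set to ENOSPC by the kernel itself, so the guessed value is only a fallback; whether the extra call is worth making, given Tom's point that a short write to a disk file is already an error, is a design choice.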
Chris Jones <chris@mt.sri.com> writes:
> If write didn't return -1, it shouldn't have set errno.  A short write
> count isn't an error condition.

On disk files it certainly is; there's no non-error reason to do that,
and AFAICS no reason for the application to try again.

			regards, tom lane
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Chris Jones
Date:
On Wed, May 23, 2001 at 02:18:31PM -0400, Tom Lane wrote:
> No.  An infinite loop is NOT an acceptable response to running out of
> disk space.  This is a disk file we are writing, not a socket.

Ack.  You're right, of course.  Sorry for the noise.

Chris

-- 
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com
Ian Lance Taylor <ian@airs.com> writes:
> Probably true, but on Unix you certainly can't assume that write will
> set errno if it does not return -1.

Right.  The code you propose is isomorphic to what I suggested
originally.  The question is which error condition should we assume
if errno has not been set; is disk-full sufficiently likely to be the
cause that we should just say that, or are there plausible alternatives?

			regards, tom lane
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Ian Lance Taylor
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Chris Jones <chris@mt.sri.com> writes:
> > If write didn't return -1, it shouldn't have set errno.  A short write
> > count isn't an error condition.
>
> On disk files it certainly is; there's no non-error reason to do that,
> and AFAICS no reason for the application to try again.

Probably true, but on Unix you certainly can't assume that write will
set errno if it does not return -1.  On Linux systems, for example,
this does not happen.  As Chris says, Posix only promises to set errno
if there is an error indication.  The only error indication for write
is a return of -1.

A portable way to check whether errno was set would be to do something
like

    errno = 0;
    if (write(...) != ...)
    {
        if (errno == 0)
            error("unexpected short write--disk full?");
        else
            error("write failed: %s", strerror(errno));
    }

Ian
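Ian's check can be turned into a helper that builds the message text instead of calling a hypothetical error() routine. A sketch under the same assumption that errno was set to 0 before write(); the name describe_write_failure is hypothetical and the wording follows Ian's example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: describe a failed write() in `msg`.
 * `errno_after` is errno read right after the call, having been
 * cleared beforehand; call this only when the write fell short. */
static void describe_write_failure(ssize_t written, size_t expected,
                                   int errno_after, char *msg, size_t msglen)
{
    assert(written < 0 || (size_t) written < expected);

    if (errno_after == 0)
        /* no error indication from write(): only a short count */
        snprintf(msg, msglen, "unexpected short write (%ld of %zu bytes)--disk full?",
                 (long) written, expected);
    else
        snprintf(msg, msglen, "write failed: %s", strerror(errno_after));
}
```

Keeping the two cases apart means the reader of the log can tell a kernel-reported error from a guess.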
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
"Denis A. Doroshenko"
Date:
On Wed, May 23, 2001 at 02:04:51PM -0400, Tom Lane wrote:
> Chris Jones <chris@mt.sri.com> writes:
> > If write didn't return -1, it shouldn't have set errno.  A short write
> > count isn't an error condition.
>
> On disk files it certainly is; there's no non-error reason to do that,
> and AFAICS no reason for the application to try again.

i've tried to get a partial write on a disk shortage condition and had no
success.  on OpenBSD, if there is no space, write() seems to write the
whole buffer or fail with -1/errno.  i used the program attached at the
end (oh well, i'm not sure about the forks, but they add more
simultaneity... huh?).  BTW, i didn't see anywhere i looked whether
write on disk files can fail after writing some part of the data.

-- 
Denis A. Doroshenko [GPRS/IN/WAP, VAS group engineer]
[Omnitel Ltd., T.Sevcenkos st. 25, Vilnius, Lithuania]
[Phone: +370 9863486  E-mail: d.doroshenko@omnitel.net]

---[a.c]------------------------------------------------------------
#include <sys/types.h>
#include <sys/wait.h>		/* for wait() */

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZ		(12345)
#define CHILDREN	(5)
/* named TEMPLATE rather than FILE, which would clash with stdio's FILE */
#define TEMPLATE	"/tmp/garbage.XXXXXXXXXX"

int
main(void)
{
	char *buf;
	char *file;
	int fd, i, j, rc;

	warnx("[%d] allocating %d bytes of memory", (int)getpid(), SIZ);
	if ((buf = malloc(SIZ)) == NULL)
		err(1, "malloc()");

	if ((file = strdup(TEMPLATE)) == NULL)
		err(1, "strdup()");
	warnx("[%d] creating %s", (int)getpid(), file);
	if ((fd = mkstemp(file)) == -1)
		err(1, "mkstemp()");

	warnx("[%d] forking...", (int)getpid());
	for (j = 0; j < CHILDREN; j++) {
		if (fork() == 0) {
			warnx("[%d:%d]: filling %s with junk", (int)getppid(), j, file);
			for (i = 0; ; i++) {
				if ((rc = write(fd, buf, SIZ)) == -1) {
					warn("[%d:%d] write()", (int)getppid(), j);
					break;
				}
				if (rc == SIZ) {
					(void)fputc(j + '0', stderr);
					continue;
				}
				/* short write: errno is not meaningful, so warnx, not warn */
				warnx("[%d:%d] write(%d written)", (int)getppid(), j, rc);
			}
			(void)close(fd);
			return (0);
		}
	}

	/* father */
	while ((j = wait(&i)) != -1)
		;

	warnx("[%d] destroying %s", (int)getpid(), file);
	(void)close(fd);
	(void)unlink(file);
	return (0);
}
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Ian Lance Taylor
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Ian Lance Taylor <ian@airs.com> writes:
> > Probably true, but on Unix you certainly can't assume that write will
> > set errno if it does not return -1.
>
> Right.  The code you propose is isomorphic to what I suggested
> originally.  The question is which error condition should we assume
> if errno has not been set; is disk-full sufficiently likely to be the
> cause that we should just say that, or are there plausible alternatives?

Sufficiently likely?  Dunno.  I can think of some other possibilities.

If the file is on a file system mounted via NFS or any other remote
file system, you might get any number of errors.

If there is a disk error after at least one disk block has been copied
and written, the kernel might return a short count.

If the kernel is severely overloaded, and fails to allocate a buffer
after allocating and writing at least one buffer successfully, it
might return a short count.

If the file is very large, and the write would push it over the
maximum file size, you might get a short count up to the maximum file
size.  A similar case might happen if the file is close to the process
resource limit (RLIMIT_FSIZE).

I assume we can rule out cases like a write from a buffer at the end
of user memory such that some data can be copied into kernel space and
then a segmentation violation occurs--on some systems that could cause
a short count if a full block can be written before the invalid memory
is reached.

Obviously a full disk is the most likely case.  This is particularly
true if the write is for less than a full disk block.  But otherwise I
could believe that at least the disk error case might happen to
somebody someday.

Ian
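The RLIMIT_FSIZE case is easy to reproduce. The sketch below, a hypothetical helper that is not part of any patch in this thread, lowers the soft file-size limit, ignores SIGXFSZ, and writes past the limit into a temporary file. On Linux the write() returns a short count with errno left at 0; only a further write() at the limit would fail with EFBIG. Other kernels may behave differently, so the expected values in the usage note are Linux-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write `len` zero bytes to a fresh temporary file
 * while the soft RLIMIT_FSIZE is lowered to `limit` bytes.  Returns
 * write()'s result (or -2 if setup failed) and stores errno as it was
 * just after the call in *errno_after. */
static ssize_t write_past_fsize_limit(rlim_t limit, size_t len, int *errno_after)
{
    char buf[4096];
    struct rlimit old_lim, new_lim;
    void (*old_handler)(int);
    FILE *fp;
    ssize_t rc;

    assert(len <= sizeof buf);
    memset(buf, 0, sizeof buf);

    fp = tmpfile();                 /* create the file before lowering the limit */
    if (fp == NULL)
        return -2;
    if (getrlimit(RLIMIT_FSIZE, &old_lim) != 0)
    {
        fclose(fp);
        return -2;
    }
    new_lim = old_lim;
    new_lim.rlim_cur = limit;

    /* Ignore SIGXFSZ so that hitting the limit cannot kill the process. */
    old_handler = signal(SIGXFSZ, SIG_IGN);
    if (setrlimit(RLIMIT_FSIZE, &new_lim) != 0)
    {
        signal(SIGXFSZ, old_handler);
        fclose(fp);
        return -2;
    }

    errno = 0;
    rc = write(fileno(fp), buf, len);
    *errno_after = errno;

    /* Restore the old soft limit; it never exceeds the unchanged hard limit. */
    setrlimit(RLIMIT_FSIZE, &old_lim);
    signal(SIGXFSZ, old_handler);
    fclose(fp);
    return rc;
}
```

With limit 100 and len 200, Linux returns 100 and leaves errno at 0: exactly the "short count, no errno" situation under discussion, with a cause other than a full disk.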
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
Ian Lance Taylor
Date:
"Denis A. Doroshenko" <d.doroshenko@omnitel.net> writes: > i've tried to get partial write on disk shortage condition and had no > success. on OpenBSD, if there is no space write() seems to write the > whole buffer or fail with -1/errno. i used such proggie attached to > the and (owell, i'm not sure about forks, but it adds more > simultaneosity... huh?). BTW. i didn't see anywhere i looked whetjer > write on disk files can fail after writting some part of data. Try writing more bytes in a single call to write(). Like, 100000 bytes or something. You will only get a short return from write() if you write more than the disk block size. On modern file systems the disk block size can get fairly large. Ian
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From
"Denis A. Doroshenko"
Date:
On Wed, May 23, 2001 at 02:24:44PM -0700, Ian Lance Taylor wrote:
> "Denis A. Doroshenko" <d.doroshenko@omnitel.net> writes:
>
> Try writing more bytes in a single call to write().  Like, 100000
> bytes or something.
>
> You will only get a short return from write() if you write more than
> the disk block size.  On modern file systems the disk block size can
> get fairly large.

the program i sent had 800K blocks, but believe me, the first variant
used 1M writes.  the result was the same...

-- 
Denis A. Doroshenko [GPRS/IN/WAP, VAS group engineer]
[Omnitel Ltd., T.Sevcenkos st. 25, Vilnius, Lithuania]
[Phone: +370 9863486  E-mail: d.doroshenko@omnitel.net]