Thread: Large file support available
Large file support is now compiled by default if available. (Use --disable-largefile to turn it off. That's what Autoconf gives us.) But: The zlib library uses unsigned ints and unsigned longs for file positions and offsets. Depending on how that is used in detail and depending on how zlib itself is compiled, this may or may not work. The tar file format (POSIX and traditional) has an inherent limitation on the size of the member files of 2^33 bytes (pg_dump currently only handles 2^30). The result in that case continues to be a broken archive. The GNU tar format has an extension that would handle 2^89 bytes. This may be something interesting to work on. -- Peter Eisentraut peter_e@gmx.net
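The 2^33 figure can be checked with a little arithmetic. The sketch below assumes the usual ustar header layout, in which the member size is stored in a 12-byte field as 11 octal digits plus a terminator; the helper names are hypothetical and are not taken from pg_dump.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: the ustar "size" field holds 11 octal digits
 * followed by a terminator, 12 bytes in total. */
#define TAR_SIZE_DIGITS 11

/* Largest member size the octal field can represent: 8^11 - 1 = 2^33 - 1. */
static uint64_t tar_max_member_size(void)
{
    return ((uint64_t) 1 << (3 * TAR_SIZE_DIGITS)) - 1;
}

/* Write the size into the 12-byte field as zero-padded octal.
 * Returns 0 on success, -1 if the size does not fit
 * (illustrative helper only). */
static int tar_format_size(uint64_t size, char field[12])
{
    if (size > tar_max_member_size())
        return -1;
    snprintf(field, 12, "%011llo", (unsigned long long) size);
    return 0;
}
```

A writer that checks the return value can refuse an oversized member instead of emitting a broken archive.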
Peter Eisentraut <peter_e@gmx.net> writes:
> Large file support is now compiled by default if available.

I am now getting (on HPUX 10.20)

/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'

for essentially every file in the system. A little digging shows that this is happening because _FILE64 is defined and _LARGEFILE64_SOURCE is not; this is evidently a Bad Idea on HPUX. Further digging shows that noplace in the standard headers is _LARGEFILE64_SOURCE #define'd, so evidently one is supposed to supply it from user headers. Ugh.

Please add this to the list of platform-specific symbols that had better be turned on to support large files.

regards, tom lane
Also, even with configure --disable-largefile, I find that pg_config.h still contains

/* Define to 1 to make fseeko visible on some hosts. */
#define _LARGEFILE_SOURCE 1

/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#define HAVE_FSEEKO 1

This strikes me as probably Not a Good Thing, although I haven't dug to see what the implications are.

regards, tom lane
> Large file support is now compiled by default if available. (Use
> --disable-largefile to turn it off. That's what Autoconf gives us.)

Are you sure that the backend performs better with large files than with 1GB segmented files (I mean large file support with LET_OS_MANAGE_FILESIZE turned on)? I have not tried it myself, but a Linux kernel hacker near me raised this question some time ago.

-- Tatsuo Ishii
Tatsuo Ishii writes:
> Are you sure that backend gains more performance than 1GB segmented
> file (I mean large file support turn on LET_OS_MANAGE_FILESIZE)?

No idea. My change only enables access to large files, it doesn't change the segmentation logic in the backend. The main use at this point is for pg_dump-related activities.

In fact, while the large file support API can handle 64-bit offsets, its availability and use don't guarantee that the file system will support any particular file size. So the segmentation logic in the backend isn't going anywhere, as far as I'm concerned.

-- Peter Eisentraut peter_e@gmx.net
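For readers unfamiliar with the segmentation being discussed, here is a minimal sketch of the idea: a logical byte offset is mapped to a segment file number and an offset inside that segment, so no single file grows past the segment size. The 1 GB segment size comes from the message above; the type and function names are hypothetical and this is not the backend's actual code.

```c
#include <stdint.h>

/* Assumed segment size: 1 GB (2^30 bytes), as mentioned in the thread. */
#define SEGMENT_BYTES ((uint64_t) 1 << 30)

/* Position of a byte within a segmented file (hypothetical type). */
typedef struct
{
    uint64_t segno;   /* which segment file */
    uint32_t offset;  /* byte offset inside that segment, always < 2^30 */
} SegmentPos;

/* Map a logical offset to (segment number, offset within segment). */
static SegmentPos locate_segment(uint64_t logical_offset)
{
    SegmentPos pos;

    pos.segno = logical_offset / SEGMENT_BYTES;
    pos.offset = (uint32_t) (logical_offset % SEGMENT_BYTES);
    return pos;
}
```

Because the per-segment offset always fits in 32 bits, this scheme works even where off_t is only 32 bits wide.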
Tom Lane writes:
> Also, even with configure --disable-largefile, I find that pg_config.h
> still contains
>
> /* Define to 1 to make fseeko visible on some hosts. */
> #define _LARGEFILE_SOURCE 1
>
> /* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
> #define HAVE_FSEEKO 1
>
> This strikes me as probably Not a Good Thing, although I haven't dug to
> see what the implications are.

This is harmless (until proven otherwise). fseeko() is identical to fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE makes fseeko() and friends visible in the headers. That's all. No large files involved.

-- Peter Eisentraut peter_e@gmx.net
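A small sketch of the interface described here, assuming a host that provides fseeko() and ftello(). The helper name is hypothetical. The _FILE_OFFSET_BITS define is the separate switch that requests a 64-bit off_t on systems where that is optional; _LARGEFILE_SOURCE only affects visibility of the declarations.

```c
#define _FILE_OFFSET_BITS 64   /* request a 64-bit off_t where it is optional */
#define _LARGEFILE_SOURCE 1    /* make fseeko()/ftello() visible on some hosts */

#include <stdio.h>
#include <sys/types.h>

/* Seek to an absolute offset and report where the stream ended up.
 * Same shape as fseek()/ftell(), but the offset type is off_t
 * rather than long. Returns (off_t) -1 on failure. */
static off_t seek_and_tell(FILE *fp, off_t target)
{
    if (fseeko(fp, target, SEEK_SET) != 0)
        return (off_t) -1;
    return ftello(fp);
}
```

Whether offsets beyond 2^31 actually work still depends on the width of off_t on the host and on the file system, which is a separate question from the visibility of these functions.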
Tom Lane writes:
> /usr/include/sys/resource.h: In function `getrlimit':
> /usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
> /usr/include/sys/resource.h: In function `setrlimit':
> /usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'
>
> for essentially every file in the system. A little digging shows that
> this is happening because _FILE64 is defined and _LARGEFILE64_SOURCE
> is not; this is evidently a Bad Idea on HPUX.

You're supposed to define _LARGEFILE64_SOURCE if you want to use functions like open64(), fseeko64(), getrlimit64(), etc. in your source. We don't want those, obviously.

What is happening here is that evidently the system headers effectively redefine getrlimit() to point to getrlimit64() if _FILE_OFFSET_BITS=64, which is the usual strategy for all the I/O functions. But you're not supposed to have to define _LARGEFILE64_SOURCE for this, because the change is supposed to be transparent.

If the {s|g}etrlimit warnings are indeed the only ones (i.e., none about open, fseek, write, read, etc.) then this is either a bug or there's something wrong in the include file order or something like that. Which way is sys/resource.h included anyway?

If there's no way to fix it then we can add a definition of _LARGEFILE64_SOURCE to hpux.h and consider further action.

-- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut wrote:
> This is harmless (until proven otherwise). fseeko() is identical to
> fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
> makes fseeko() and friends visible in the headers. That's all. No large
> files involved.

I am confused. fseeko() doesn't look standard to me. I thought fgetpos()/fsetpos() were the standard interfaces for large file support; from BSD/OS:

    The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
    conform to ANSI C X3.159-1989 (``ANSI C'').

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Peter Eisentraut <peter_e@gmx.net> writes:
> If the {s|g}etrlimit warnings are indeed the only ones (i.e., none about
> open, fseek, write, read, etc.) then this is either a bug or there's
> something wrong in the include file order or something like that.

No such luck. Here's a more complete excerpt of one typical failure:

gcc -O1 -Wall -Wmissing-prototypes -Wmissing-declarations -g -I../../../../src/include -c -o tuptoaster.o tuptoaster.c
In file included from /usr/include/sys/wait.h:83,
                 from /usr/local/lib/gcc-lib/hppa2.0-hp-hpux10.20/2.95.3/include/stdlib.h:231,
                 from ../../../../src/include/c.h:56,
                 from ../../../../src/include/postgres.h:47,
                 from tuptoaster.c:25:
/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'
In file included from /usr/include/unistd.h:11,
                 from tuptoaster.c:27:
/usr/include/sys/unistd.h: In function `truncate':
/usr/include/sys/unistd.h:539: warning: implicit declaration of function `__truncate64'
/usr/include/sys/unistd.h: In function `prealloc':
/usr/include/sys/unistd.h:543: warning: implicit declaration of function `__prealloc64'
/usr/include/sys/unistd.h: In function `lockf':
/usr/include/sys/unistd.h:544: warning: implicit declaration of function `__lockf64'
/usr/include/sys/unistd.h: In function `ftruncate':
/usr/include/sys/unistd.h:545: warning: implicit declaration of function `__ftruncate64'
In file included from /usr/include/fcntl.h:9,
                 from tuptoaster.c:28:
/usr/include/sys/fcntl.h: In function `open':
/usr/include/sys/fcntl.h:216: warning: implicit declaration of function `__open64'
/usr/include/sys/fcntl.h: In function `creat':
/usr/include/sys/fcntl.h:217: warning: implicit declaration of function `__creat64'

AFAICT a *lot* of HPUX headers expect you to #define _LARGEFILE64_SOURCE if you want this stuff to work.

regards, tom lane
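The fallback Peter mentioned, adding the definition to a port header such as hpux.h, might look roughly like the fragment below. This is only a sketch: the guard condition is an assumption (the thread says the trouble appears when 64-bit file offsets are selected), and the fragment would have to be seen before any system header is included to have an effect.

```c
/* Hypothetical fragment for a platform port header such as hpux.h.
 * Assumption: the system headers want _LARGEFILE64_SOURCE whenever
 * 64-bit file offsets have been requested. */
#if defined(_FILE_OFFSET_BITS) && (_FILE_OFFSET_BITS == 64)
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE 1
#endif
#endif
```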
Peter, I have received no reply to this question.

-- Bruce Momjian
OK, with no one replying to this, I will take it upon myself to resolve this. According to the Mac OSX fseek() manual page:

    The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
    conform to ANSI X3.159-1989 (``ANSI C'').

    The fseeko() and ftello() functions conform to Version 2 of the
    Single UNIX Specification (``SUSv2'').

which basically says that we should be using fseek or preferably fsetpos, not fseeko. I realize that the advantage of fseeko is that it has the same API as fseek, but if we are going to fix this, we may as well do it right and use fgetpos/fsetpos if we have it. Is there anyone who has fseeko() but _not_ fsetpos()?

-- Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is there anyone who has fseeko() but _not_ fsetpos()?

AFAICT this is completely irrelevant to large files.

regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is there anyone who has fseeko() but _not_ fsetpos()?
>
> AFAICT this is completely irrelevant to large files.

Please explain. At:

    http://man.dnswatch.com/cgi-bin/htmlman?fseek+3

I see:

    The fseeko() function is identical to fseek(), except it takes an off_t
    argument instead of a long. Likewise, the ftello() function is identical
    to ftell(), except it returns an off_t.

while fsetpos() is:

    fsetpos(FILE *stream, const fpos_t *pos);

and presumably fpos_t handles large files too.

-- Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I see:
> The fseeko() function is identical to fseek(), except it takes an off_t
> argument instead of a long. Likewise, the ftello() function is identical
> to ftell(), except it returns an off_t.

Indeed. Notice the complete lack of any commitment about the size of off_t ...

> while fsetpos() is:
> fsetpos(FILE *stream, const fpos_t *pos);

... or the size of fpos_t.

You might find it illuminating to read this random extract from the HPUX 10.20 man pages:

NAME
    fgetpos64(), fopen64(), freopen64(), fseeko64(), fsetpos64(),
    fstatvfsdev64(), ftello64(), ftw64(), nftw64(), statvfsdev64(),
    tmpfile64() - non-POSIX standard API interfaces to support large files.

DESCRIPTION
    New API's to support large files. These API interfaces are not a part
    of the POSIX standard and may be removed in the future.

    fgetpos64()    The fgetpos64() function is identical to fgetpos() except
                   that fgetpos64() returns the position in a fpos64_t
                   instead of a fpos_t. All other functional behaviors,
                   returns, and errors are identical.

... etc ...

I don't see any reason to believe that fgetpos buys us anything but notational inconvenience. It certainly doesn't buy large file support, at least not without the same behind-the-scenes redefinitions needed for fseek/fseeko and friends...

regards, tom lane
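Tom's point that neither type comes with a size guarantee is easy to check on a given machine. The sketch below (helper names are hypothetical) simply reports the widths of long and off_t and the size of fpos_t for the current build. The results vary by platform and by whether _FILE_OFFSET_BITS is set, so no particular output is claimed here.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Width in bits of the offset type used by fseek()/ftell(). */
static int long_bits(void)
{
    return (int) (sizeof(long) * CHAR_BIT);
}

/* Width in bits of the offset type used by fseeko()/ftello(). */
static int off_t_bits(void)
{
    return (int) (sizeof(off_t) * CHAR_BIT);
}

/* fpos_t may be a struct, so only its size in bytes is meaningful. */
static size_t fpos_t_bytes(void)
{
    return sizeof(fpos_t);
}

/* Print one summary line for this build. */
static void report_offset_types(FILE *out)
{
    fprintf(out, "long: %d bits, off_t: %d bits, fpos_t: %zu bytes\n",
            long_bits(), off_t_bits(), fpos_t_bytes());
}
```

Compiling this once with and once without -D_FILE_OFFSET_BITS=64 shows whether the "behind-the-scenes redefinitions" change anything on the host.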
Bruce Momjian wrote:
> OK, with no one replying to this, I will take it upon myself to resolve
> this. According to the Mac OSX fseek() manual page:
>
>     The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
>     conform to ANSI X3.159-1989 (``ANSI C'').
>
>     The fseeko() and ftello() functions conform to Version 2 of the
>     Single UNIX Specification (``SUSv2'').

I might be veering *slightly* off the topic here, but since I got bitten by this recently I thought I would mention it:

On Linux, I found that I needed #include <asm/fcntl.h> instead of #include <fcntl.h> when using lseek. I had expected defining _FILE_OFFSET_BITS=64 to sort this out (which it did not).

I think that this will only be an issue if folk want relation files to be chunked at > 2G (or want to define LET_OS_MANAGE_FILESIZE).

best wishes

Mark
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I see:
> > The fseeko() function is identical to fseek(), except it takes an off_t
> > argument instead of a long. Likewise, the ftello() function is identical
> > to ftell(), except it returns an off_t.
>
> Indeed. Notice the complete lack of any commitment about the size of
> off_t ...
>
> > while fsetpos() is:
> > fsetpos(FILE *stream, const fpos_t *pos);
>
> ... or the size of fpos_t.
>
> You might find it illuminating to read this random extract from the
> HPUX 10.20 man pages:
...
> I don't see any reason to believe that fgetpos buys us anything but
> notational inconvenience. It certainly doesn't buy large file support,
> at least not without the same behind-the-scenes redefinitions needed for
> fseek/fseeko and friends...

Clearly there is the issue that fseek uses long, which isn't enough for large file support. On BSD/OS, we have fsetpos, which is the way we do large file support:

    int fseek(FILE *stream, long offset, int whence);
    int fsetpos(FILE *stream, const fpos_t *pos);

My point is that it seems fsetpos is the approved way of accessing large files, rather than fseeko. In fact, I don't have fseeko here but I do have fsetpos, and it does handle large files because my includes have this:

    typedef off_t fpos_t
    typedef quad_t off_t;

-- Bruce Momjian
Bruce Momjian writes:
> My point is that it seems fsetpos is the approved way of accessing large
> files, rather than fseeko. In fact, I don't have fseeko here but I do
> have fsetpos, and it does handle large files because my includes have
> this:
>
> typedef off_t fpos_t
> typedef quad_t off_t;

Interesting. In general, you can't rely on fpos_t being an integral type, which indeed on my machine it isn't. But for pg_dump we need an integral type because we do offset arithmetic.

-- Peter Eisentraut peter_e@gmx.net
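The distinction Peter draws can be illustrated with two small helpers (hypothetical names, not pg_dump code). With off_t, a new position can be computed from the current one; with fpos_t, portable code can only save a position and later return to exactly that position, because the type may be a struct. The sketch assumes fseeko()/ftello() are available.

```c
#define _FILE_OFFSET_BITS 64   /* request a 64-bit off_t where it is optional */
#define _LARGEFILE_SOURCE 1    /* make fseeko()/ftello() visible on some hosts */

#include <stdio.h>
#include <sys/types.h>

/* off_t is an integer type, so offset arithmetic is possible:
 * move nbytes forward from wherever the stream currently is.
 * Returns 0 on success, -1 on failure. */
static int skip_bytes(FILE *fp, off_t nbytes)
{
    off_t here = ftello(fp);

    if (here == (off_t) -1)
        return -1;
    return fseeko(fp, here + nbytes, SEEK_SET);
}

/* fpos_t is opaque: the only portable operations are "remember this
 * position" and "go back to it". Here that is used to look at the
 * next byte without consuming it. Returns EOF on failure. */
static int peek_byte(FILE *fp)
{
    fpos_t saved;
    int c;

    if (fgetpos(fp, &saved) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &saved) != 0)
        return EOF;
    return c;
}
```

Something like skip_bytes() cannot be written portably on top of fgetpos()/fsetpos() alone, which is the reason given above for wanting an integral offset type.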
Peter Eisentraut wrote:
> Bruce Momjian writes:
>
> > My point is that it seems fsetpos is the approved way of accessing large
> > files, rather than fseeko. In fact, I don't have fseeko here but I do
> > have fsetpos, and it does handle large files because my includes have
> > this:
> >
> > typedef off_t fpos_t
> > typedef quad_t off_t;
>
> Interesting. In general, you can't rely on fpos_t being an integral type,
> which indeed on my machine it isn't. But for pg_dump we need an integral
> type because we do offset arithmetic.

Oh, is that why fsetpos is always SEEK_SET and has no SEEK_CUR or offset stuff? Strange that I don't have fseeko and do have large file support. BSD/OS has had it for years. I guess they just do off_t arithmetic, but that isn't portable.

-- Bruce Momjian