Thread: Large file support available

Large file support available

From
Peter Eisentraut
Date:
Large file support is now compiled by default if available.  (Use
--disable-largefile to turn it off.  That's what Autoconf gives us.)

But:

The zlib library uses unsigned ints and unsigned longs for file positions
and offsets.  Depending on how that is used in detail and depending on how
zlib itself is compiled, this may or may not work.

The tar file format (POSIX and traditional) has an inherent limitation on
the size of the member files of 2^33 bytes (pg_dump currently only handles
2^30).  The result in that case continues to be a broken archive.  The GNU
tar format has an extension that would handle 2^89 bytes.  This may be
something interesting to work on.

-- 
Peter Eisentraut   peter_e@gmx.net
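
The 2^33 figure for the POSIX and traditional formats follows from the header layout: the member size is stored in a 12-byte field as 11 octal digits plus a terminator, and 11 octal digits encode 33 bits. A minimal sketch in C of that limit; the helper and macro names are hypothetical and this is not pg_dump code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Size field of a POSIX/traditional tar header: 11 octal digits + NUL. */
#define TAR_SIZE_FIELD 12

/* Largest size the plain octal field can express: 8^11 - 1 = 2^33 - 1. */
#define TAR_MAX_MEMBER_SIZE ((UINT64_C(1) << 33) - 1)

/*
 * Hypothetical helper: write `size` into the 12-byte size field.
 * Returns 0 on success, or -1 if the value needs more than 11 octal digits.
 */
static int
tar_format_size(uint64_t size, char field[TAR_SIZE_FIELD])
{
    assert(field != NULL);

    if (size > TAR_MAX_MEMBER_SIZE)
        return -1;              /* caller must refuse or use an extension */

    /* 11 zero-padded octal digits followed by the terminating NUL */
    snprintf(field, TAR_SIZE_FIELD, "%011llo", (unsigned long long) size);
    return 0;
}
```

A writer that checks the return value can report an oversized member instead of silently producing a broken archive.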



Re: Large file support available

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Large file support is now compiled by default if available.

I am now getting (on HPUX 10.20)

/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'

for essentially every file in the system.  A little digging shows that
this is happening because _FILE64 is defined and _LARGEFILE64_SOURCE
is not; this is evidently a Bad Idea on HPUX.

Further digging shows that noplace in the standard headers is
_LARGEFILE64_SOURCE #define'd, so evidently one is supposed to supply it
from user headers.  Ugh.  Please add this to the list of
platform-specific symbols that had better be turned on to support large
files.
        regards, tom lane


Re: Large file support available

From
Tom Lane
Date:
Also, even with configure --disable-largefile, I find that pg_config.h
still contains

/* Define to 1 to make fseeko visible on some hosts. */
#define _LARGEFILE_SOURCE 1

/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#define HAVE_FSEEKO 1

This strikes me as probably Not a Good Thing, although I haven't dug to
see what the implications are.
        regards, tom lane


Re: Large file support available

From
Tatsuo Ishii
Date:
> Large file support is now compiled by default if available.  (Use
> --disable-largefile to turn it off.  That's what Autoconf gives us.)

Are you sure that the backend performs better with large files than with
1GB segmented files (I mean large file support with
LET_OS_MANAGE_FILESIZE turned on)?  I have not tried it myself, but a
Linux kernel hacker I know raised this question some time ago.
--
Tatsuo Ishii


Re: Large file support available

From
Peter Eisentraut
Date:
Tatsuo Ishii writes:

> Are you sure that the backend performs better with large files than with
> 1GB segmented files (I mean large file support with
> LET_OS_MANAGE_FILESIZE turned on)?

No idea.  My change only enables access to large files, it doesn't change
the segmentation logic in the backend.  The main use at this point is for
pg_dump-related activities.

In fact, while the large file support API can handle 64-bit offsets, its
availability and use don't guarantee that the file system will support any
particular file size.  So the segmentation logic in the backend isn't
going anywhere, as far as I'm concerned.

-- 
Peter Eisentraut   peter_e@gmx.net
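
For readers unfamiliar with the segmentation being discussed, the idea is that a logical offset is split into a segment number and an offset inside that segment, so no single file ever grows past the segment size. A rough sketch in C, assuming a 1 GB segment as mentioned in the thread; the type and function names are invented for illustration and are not the backend's:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed segment size: 1 GB, as in the "1GB segmented file" above. */
#define SEGMENT_BYTES (UINT64_C(1) << 30)

typedef struct
{
    uint32_t segno;     /* which segment file holds the byte */
    uint32_t offset;    /* byte offset inside that segment, always < 1 GB */
} SegmentPos;

/* Map a logical byte offset onto (segment number, offset in segment). */
static SegmentPos
locate_in_segments(uint64_t logical_offset)
{
    SegmentPos pos;

    /* the segment number must fit in the 32-bit field */
    assert(logical_offset / SEGMENT_BYTES <= UINT32_MAX);

    pos.segno = (uint32_t) (logical_offset / SEGMENT_BYTES);
    pos.offset = (uint32_t) (logical_offset % SEGMENT_BYTES);
    return pos;
}
```

Because the per-file offset stays below 2^30, this scheme works even where the file system or the C library cannot handle large files.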



Re: Large file support available

From
Peter Eisentraut
Date:
Tom Lane writes:

> Also, even with configure --disable-largefile, I find that pg_config.h
> still contains
>
> /* Define to 1 to make fseeko visible on some hosts. */
> #define _LARGEFILE_SOURCE 1
>
> /* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
> #define HAVE_FSEEKO 1
>
> This strikes me as probably Not a Good Thing, although I haven't dug to
> see what the implications are.

This is harmless (until proven otherwise).  fseeko() is identical to
fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
makes fseeko() and friends visible in the headers.  That's all.  No large
files involved.

-- 
Peter Eisentraut   peter_e@gmx.net
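
To illustrate the point that fseeko()/ftello() are just fseek()/ftell() with an off_t offset, here is a small sketch in C. The helper name stream_length is hypothetical, and defining _LARGEFILE_SOURCE by hand is an assumption about what a build would do on hosts that need it:

```c
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE 1     /* makes fseeko/ftello visible on some hosts */
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Return the length of the stream in bytes, or -1 on error.
 * The current position is restored before returning.
 */
static off_t
stream_length(FILE *fp)
{
    off_t saved;
    off_t end;

    assert(fp != NULL);

    saved = ftello(fp);                 /* like ftell(), but returns off_t */
    if (saved == (off_t) -1)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)   /* like fseek(), offset is off_t */
        return -1;
    end = ftello(fp);
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return -1;
    return end;
}
```

Nothing here says how wide off_t is; that is decided separately by the platform and the compile-time settings.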



Re: Large file support available

From
Peter Eisentraut
Date:
Tom Lane writes:

> /usr/include/sys/resource.h: In function `getrlimit':
> /usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
> /usr/include/sys/resource.h: In function `setrlimit':
> /usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'
>
> for essentially every file in the system.  A little digging shows that
> this is happening because _FILE64 is defined and _LARGEFILE64_SOURCE
> is not; this is evidently a Bad Idea on HPUX.

You're supposed to define _LARGEFILE64_SOURCE if you want to use functions
like open64(), fseeko64(), getrlimit64(), etc. in your source.  We don't
want those, obviously.

What is happening here is that evidently the system headers effectively
redefine getrlimit() to point to getrlimit64() if _FILE_OFFSET_BITS=64,
which is the usual strategy for all the I/O functions.  But you're not
supposed to have to define _LARGEFILE64_SOURCE for this, because the
change is supposed to be transparent.

If the {s|g}etrlimit warnings are indeed the only ones (i.e., none about
open, fseek, write, read, etc.) then this is either a bug or there's
something wrong in the include file order or something like that.  Which
way is sys/resource.h included anyway?

If there's no way to fix it then we can add a definition of
_LARGEFILE64_SOURCE to hpux.h and consider further action.

-- 
Peter Eisentraut   peter_e@gmx.net
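
The "transparent" strategy described above can be observed from user code: with _FILE_OFFSET_BITS=64 defined before any system header, off_t is expected to be 64 bits wide and the ordinary function names are mapped to their 64-bit variants behind the scenes. A minimal sketch in C, assuming a platform that honours the macro; the function names are made up for this example:

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64    /* request the transparent 64-bit interfaces */
#endif

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Width of off_t, in bits, as this translation unit sees it. */
static int
off_t_bits(void)
{
    return (int) (sizeof(off_t) * CHAR_BIT);
}

/* Abort early if the request was not honoured on this platform. */
static void
require_64bit_off_t(void)
{
    assert(off_t_bits() >= 64);
}
```

No *64() function appears in the source, which is why _LARGEFILE64_SOURCE should not be needed for this style of use.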



Re: Large file support available

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Tom Lane writes:
> 
> > Also, even with configure --disable-largefile, I find that pg_config.h
> > still contains
> >
> > /* Define to 1 to make fseeko visible on some hosts. */
> > #define _LARGEFILE_SOURCE 1
> >
> > /* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
> > #define HAVE_FSEEKO 1
> >
> > This strikes me as probably Not a Good Thing, although I haven't dug to
> > see what the implications are.
> 
> This is harmless (until proven otherwise).  fseeko() is identical to
> fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
> makes fseeko() and friends visible in the headers.  That's all.  No large
> files involved.

I am confused.  fseeko() doesn't look standard to me.  I thought
fgetpos/fsetpos() were the standard interfaces for large file support;
from BSD/OS:

     The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
     conform to ANSI C X3.159-1989 (``ANSI C'').

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Large file support available

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> If the {s|g}etrlimit warnings are indeed the only ones (i.e., none about
> open, fseek, write, read, etc.) then this is either a bug or there's
> something wrong in the include file order or something like that.

No such luck.  Here's a more complete excerpt of one typical failure:

gcc -O1 -Wall -Wmissing-prototypes -Wmissing-declarations -g -I../../../../src/include   -c -o tuptoaster.o tuptoaster.c
In file included from /usr/include/sys/wait.h:83,
                 from /usr/local/lib/gcc-lib/hppa2.0-hp-hpux10.20/2.95.3/include/stdlib.h:231,
                 from ../../../../src/include/c.h:56,
                 from ../../../../src/include/postgres.h:47,
                 from tuptoaster.c:25:
/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'
In file included from /usr/include/unistd.h:11,
                 from tuptoaster.c:27:
/usr/include/sys/unistd.h: In function `truncate':
/usr/include/sys/unistd.h:539: warning: implicit declaration of function `__truncate64'
/usr/include/sys/unistd.h: In function `prealloc':
/usr/include/sys/unistd.h:543: warning: implicit declaration of function `__prealloc64'
/usr/include/sys/unistd.h: In function `lockf':
/usr/include/sys/unistd.h:544: warning: implicit declaration of function `__lockf64'
/usr/include/sys/unistd.h: In function `ftruncate':
/usr/include/sys/unistd.h:545: warning: implicit declaration of function `__ftruncate64'
In file included from /usr/include/fcntl.h:9,
                 from tuptoaster.c:28:
/usr/include/sys/fcntl.h: In function `open':
/usr/include/sys/fcntl.h:216: warning: implicit declaration of function `__open64'
/usr/include/sys/fcntl.h: In function `creat':
/usr/include/sys/fcntl.h:217: warning: implicit declaration of function `__creat64'

AFAICT a *lot* of HPUX headers expect you to #define _LARGEFILE64_SOURCE
if you want this stuff to work.
        regards, tom lane


Re: Large file support available

From
Bruce Momjian
Date:
Peter, I have received no reply to this question.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > Tom Lane writes:
> > 
> > > Also, even with configure --disable-largefile, I find that pg_config.h
> > > still contains
> > >
> > > /* Define to 1 to make fseeko visible on some hosts. */
> > > #define _LARGEFILE_SOURCE 1
> > >
> > > /* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
> > > #define HAVE_FSEEKO 1
> > >
> > > This strikes me as probably Not a Good Thing, although I haven't dug to
> > > see what the implications are.
> > 
> > This is harmless (until proven otherwise).  fseeko() is identical to
> > fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
> > makes fseeko() and friends visible in the headers.  That's all.  No large
> > files involved.
> 
> I am confused.  fseeko() doesn't look standard to me.  I thought
> fgetpos/fsetpos() were the standard interfaces for large file support;
> from BSD/OS:
> 
>      The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
>      conform to ANSI C X3.159-1989 (``ANSI C'').
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Large file support available

From
Bruce Momjian
Date:
OK, with no one replying to this, I will take it upon myself to resolve
this.  According to the Mac OSX fseek() manual page:

     The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
     conform to ANSI X3.159-1989 (``ANSI C'').

     The fseeko() and ftello() functions conform to Version 2 of the
     Single UNIX Specification (``SUSv2'').

which basically says that we should be using fseek or preferably
fsetpos, not fseeko.  I realize that the advantage of fseeko is that it
has the same API as fseek, but if we are going to fix this, we may as
well do it right and use fgetpos if we have it.

Is there anyone who has fseeko() but _not_ fsetpos()?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> 
> Peter, I have received no reply to this question.
> 
> ---------------------------------------------------------------------------
> 
> Bruce Momjian wrote:
> > Peter Eisentraut wrote:
> > > Tom Lane writes:
> > > 
> > > > Also, even with configure --disable-largefile, I find that pg_config.h
> > > > still contains
> > > >
> > > > /* Define to 1 to make fseeko visible on some hosts. */
> > > > #define _LARGEFILE_SOURCE 1
> > > >
> > > > /* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
> > > > #define HAVE_FSEEKO 1
> > > >
> > > > This strikes me as probably Not a Good Thing, although I haven't dug to
> > > > see what the implications are.
> > > 
> > > This is harmless (until proven otherwise).  fseeko() is identical to
> > > fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
> > > makes fseeko() and friends visible in the headers.  That's all.  No large
> > > files involved.
> > 
> > I am confused.  fseeko() doesn't look standard to me.  I thought
> > fgetpos/fsetpos() were the standard interfaces for large file support;
> > from BSD/OS:
> > 
> >      The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
> >      conform to ANSI C X3.159-1989 (``ANSI C'').
> > 
> > -- 
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 359-1001
> >   +  If your life is a hard drive,     |  13 Roberts Road
> >   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> > 
> > 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Large file support available

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is there anyone who has fseeko() but _not_ fsetpos()?

AFAICT this is completely irrelevant to large files.
        regards, tom lane


Re: Large file support available

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is there anyone who has fseeko() but _not_ fsetpos()?
> 
> AFAICT this is completely irrelevant to large files.

Please explain.  At:
http://man.dnswatch.com/cgi-bin/htmlman?fseek+3

I see:
     The fseeko() function is identical to fseek(), except it takes an off_t
     argument instead of a long.  Likewise, the ftello() function is identical
     to ftell(), except it returns an off_t.

while fsetpos() is:
    fsetpos(FILE *stream, const fpos_t *pos);

and presumably fpos_t handles large files too.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Large file support available

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I see:
>      The fseeko() function is identical to fseek(), except it takes an off_t
>      argument instead of a long.  Likewise, the ftello() function is identical
>      to ftell(), except it returns an off_t.

Indeed.  Notice the complete lack of any commitment about the size of
off_t ...

> while fsetpos() is:
>      fsetpos(FILE *stream, const fpos_t *pos);

... or the size of fpos_t.

You might find it illuminating to read this random extract from the
HPUX 10.20 man pages:

NAME
    fgetpos64(), fopen64(), freopen64(), fseeko64(), fsetpos64(),
    fstatvfsdev64(), ftello64(), ftw64(), nftw64(), statvfsdev64(),
    tmpfile64() - non-POSIX standard API interfaces to support large
    files.

DESCRIPTION
    New API's to support large files. These API interfaces are not a part
    of the POSIX standard and may be removed in the future.

    fgetpos64()           The fgetpos64() function is identical to
                          fgetpos() except that fgetpos64() returns the
                          position in a fpos64_t instead of a fpos_t.  All
                          other functional behaviors, returns, and errors
                          are identical.

    ... etc ...

I don't see any reason to believe that fgetpos buys us anything but
notational inconvenience.  It certainly doesn't buy large file support,
at least not without the same behind-the-scenes redefinitions needed for
fseek/fseeko and friends...
        regards, tom lane


Re: Large file support available

From
Mark Kirkwood
Date:
Bruce Momjian wrote:

>OK, with no one replying to this, I will take it upon myself to resolve
>this.  According to the Mac OSX fseek() manual page:
>
>     The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
>     conform to ANSI X3.159-1989 (``ANSI C'').
>
>     The fseeko() and ftello() functions conform to Version 2 of the
>     Single UNIX Specification (``SUSv2'').
>  
>

I might be veering *slightly* off the topic here, but since I got bitten 
by this recently I thought I would mention it:

On Linux, I found that I needed

#include <asm/fcntl.h>
instead of
#include <fcntl.h>

when using lseek. I had expected defining _FILE_OFFSET_BITS=64 to sort
this out (which it did not).

I think that this will only be an issue if folk want relation files to
be chunked at > 2G (or want to define LET_OS_MANAGE_FILESIZE).

best wishes

Mark
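
For comparison, the conventional arrangement with only the standard headers looks roughly like the sketch below: _FILE_OFFSET_BITS=64 is defined before any include, and lseek() is then asked for an offset beyond 2 GB. Whether this alone was sufficient on the system described above is not something the thread settles. The function name and the /tmp scratch path are assumptions made for this example:

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64    /* must come before every system header */
#endif

#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a scratch file and seek to the 3 GB mark.  Nothing is written,
 * so no disk space is consumed.  Returns the resulting offset, or -1.
 */
static off_t
seek_past_2gb(void)
{
    char  path[] = "/tmp/lfs-demo-XXXXXX";  /* assumed scratch location */
    off_t target;
    off_t result;
    int   fd;

    assert(sizeof(off_t) >= 8);     /* otherwise 3 GB is not representable */
    target = (off_t) 3 << 30;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* file disappears once fd is closed */

    result = lseek(fd, target, SEEK_SET);
    close(fd);
    return result;
}
```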



Re: Large file support available

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I see:
> >      The fseeko() function is identical to fseek(), except it takes an off_t
> >      argument instead of a long.  Likewise, the ftello() function is identical
> >      to ftell(), except it returns an off_t.
> 
> Indeed.  Notice the complete lack of any commitment about the size of
> off_t ...
> 
> > while fsetpos() is:
> >      fsetpos(FILE *stream, const fpos_t *pos);
> 
> ... or the size of fpos_t.
> 
> You might find it illuminating to read this random extract from the
> HPUX 10.20 man pages:
...
> I don't see any reason to believe that fgetpos buys us anything but
> notational inconvenience.  It certainly doesn't buy large file support,
> at least not without the same behind-the-scenes redefinitions needed for
> fseek/fseeko and friends...

Clearly there is the issue that fseek uses long, which isn't enough for
large file support.  On BSD/OS, we have fsetpos, which is the way we do
large file support:

    int    fseek(FILE *stream, long offset, int whence);
    int    fsetpos(FILE *stream, const fpos_t *pos);

My point is that it seems fsetpos is the approved way of accessing large
files, rather than fseeko.  In fact, I don't have fseeko here but I do
have fsetpos, and it does handle large files because my includes have
this:

    typedef off_t fpos_t
    typedef quad_t off_t;

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Large file support available

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> My point is that it seems fsetpos is the approved way of accessing large
> files, rather than fseeko.  In fact, I don't have fseeko here but I do
> have fsetpos, and it does handle large files because my includes have
> this:
>
>     typedef off_t fpos_t
>     typedef quad_t off_t;

Interesting.  In general, you can't rely on fpos_t being an integral type,
which indeed on my machine it isn't.  But for pg_dump we need an integral
type because we do offset arithmetic.

-- 
Peter Eisentraut   peter_e@gmx.net
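
The difference between the two position types can be sketched in C as follows. With off_t the position is an integer, so an offset can be added to it; with fpos_t the portable operations are limited to remembering a position and returning to it. Both helper names are hypothetical and this is not taken from pg_dump:

```c
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE 1     /* makes fseeko/ftello visible on some hosts */
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Skip nbytes forward using integer arithmetic on off_t. */
static off_t
skip_forward(FILE *fp, off_t nbytes)
{
    off_t here;

    assert(fp != NULL);

    here = ftello(fp);
    if (here == (off_t) -1)
        return -1;
    if (fseeko(fp, here + nbytes, SEEK_SET) != 0)
        return -1;
    return here + nbytes;       /* the new position */
}

/* With fpos_t, all we can portably do is save a position and restore it. */
static int
remember_and_restore(FILE *fp)
{
    fpos_t mark;                /* opaque: may be a struct, no arithmetic */

    assert(fp != NULL);

    if (fgetpos(fp, &mark) != 0)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)   /* move somewhere else */
        return -1;
    return fsetpos(fp, &mark);          /* come back to the saved spot */
}
```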



Re: Large file support available

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > My point is that it seems fsetpos is the approved way of accessing large
> > files, rather than fseeko.  In fact, I don't have fseeko here but I do
> > have fsetpos, and it does handle large files because my includes have
> > this:
> >
> >     typedef off_t fpos_t
> >     typedef quad_t off_t;
> 
> Interesting.  In general, you can't rely on fpos_t being an integral type,
> which indeed on my machine it isn't.  But for pg_dump we need an integral
> type because we do offset arithmetic.

Oh, is that why fsetpos is always SEEK_SET and not SEEK_CUR or offset
stuff?  Strange that I don't have fseeko and do have large file support.
BSD/OS has had it for years.  I guess they just do off_t arithmetic, but
that isn't portable.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073