Thread: pg_dump and large files - is this a problem?

pg_dump and large files - is this a problem?

From
Philip Warner
Date:
Is it my imagination, or is there a problem with the way pg_dump uses off_t 
etc. My understanding is that off_t may be 64 bits on systems with 32 bit 
ints. But it looks like pg_dump writes them as 4 byte values in all cases. 
It also reads them as 4 byte values. Does this seem like a problem to 
anybody else?






Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Is it my imagination, or is there a problem with the way pg_dump uses off_t 
> etc. My understanding is that off_t may be 64 bits on systems with 32 bit 
> ints. But it looks like pg_dump writes them as 4 byte values in all cases. 
> It also reads them as 4 byte values. Does this seem like a problem to 
> anybody else?

Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb.  What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?

If there is a problem, seems like we'd better fix it.  Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Philip Warner <pjw@rhyme.com.au> writes:
> > Is it my imagination, or is there a problem with the way pg_dump uses off_t 
> > etc. My understanding is that off_t may be 64 bits on systems with 32 bit 
> > ints. But it looks like pg_dump writes them as 4 byte values in all cases. 
> > It also reads them as 4 byte values. Does this seem like a problem to 
> > anybody else?
> 
> Yes, it does --- the implication is that the custom format, at least,
> can't support dumps > 4Gb.  What exactly is pg_dump writing off_t's
> into files for; maybe there's not really a problem?
> 
> If there is a problem, seems like we'd better fix it.  Perhaps there
> needs to be something in the header to tell the reader the sizeof
> off_t.

BSD/OS has 64-bit off_t's so it does support large files.  Is there
something I can test?



Re: pg_dump and large files - is this a problem?

From
Peter Eisentraut
Date:
Tom Lane writes:

> Yes, it does --- the implication is that the custom format, at least,
> can't support dumps > 4Gb.  What exactly is pg_dump writing off_t's
> into files for; maybe there's not really a problem?

That's kind of what I was wondering, too.

Not that it's an excuse, but I think that large file access through zlib
won't work anyway.  Zlib uses the integer types in fairly random ways.




Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:59 AM 1/10/2002 -0400, Tom Lane wrote:
>If there is a problem, seems like we'd better fix it.  Perhaps there
>needs to be something in the header to tell the reader the sizeof
>off_t.

Yes, and do the peripheral stuff to support old archives etc. We also need 
to be careful about the places where we do file-position-arithmetic - if 
there are any, I can't recall.

I am not sure we need to worry about whether zlib supports large files 
since I am pretty sure we don't use zlib for file IO - we just pass it 
in-memory blocks; so it should work no matter how much data is in the stream.
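
For reference, a minimal sketch of that in-memory pattern using zlib's
one-shot compress2()/uncompress() API (buffer names are illustrative,
not pg_dump's actual code):

    #include <stdio.h>
    #include <zlib.h>

    int
    main(void)
    {
        const char  src[] = "one in-memory block of table data";
        Bytef       comp[256];
        Bytef       back[256];
        uLongf      compLen = sizeof(comp);
        uLongf      backLen = sizeof(back);

        /* zlib only ever sees this buffer, never the output file,
         * so file offsets (and off_t width) are irrelevant to it. */
        if (compress2(comp, &compLen, (const Bytef *) src, sizeof(src),
                      Z_DEFAULT_COMPRESSION) != Z_OK)
            return 1;

        /* Round-trip to show the block is self-contained. */
        if (uncompress(back, &backLen, comp, compLen) != Z_OK)
            return 1;

        printf("%lu -> %lu -> %lu bytes\n",
               (unsigned long) sizeof(src), (unsigned long) compLen,
               (unsigned long) backLen);
        return 0;
    }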





Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:20 AM 1/10/2002 -0400, Bruce Momjian wrote:
>BSD/OS has 64-bit off_t's so it does support large files.  Is there
>something I can test?

Not really; since it saves only the first 32 bits of the 64 bit positions, it 
will do no worse than a version that supports 32 bits only. It might even 
do slightly better. When this is sorted out, we need to verify that:

- large dump files are restorable

- dump files with 32 bit off_t restore properly on systems with 64 bit off_t

- dump files with 64 bit off_t restore properly on systems with 32 bit off_t,
AS LONG AS the offsets are less than 32 bits.

- old dump files restore properly.

- new dump files have a new version number so that old pg_restore will not 
try to restore them.

We probably need to add Read/WriteOffset to pg_backup_archiver.c to read 
and write the appropriately sized value in a dump file, in the same way that 
Read/WriteInt works now.
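
A hypothetical sketch of such a pair, using stdio stand-ins for the
archiver's byte callbacks (the names and the leading width byte are
assumptions, not the actual pg_backup_archiver.c interface):

    #include <stdio.h>
    #include <sys/types.h>

    /*
     * Sketch only.  The width byte written first tells the reader how
     * many offset bytes follow, so 32- and 64-bit dumps stay
     * interchangeable.
     */
    static void
    WriteOffset(FILE *fp, off_t o)
    {
        size_t      i;

        fputc((int) sizeof(off_t), fp);     /* record offset width */
        for (i = 0; i < sizeof(off_t); i++) /* LSB first */
        {
            fputc((int) (o & 0xFF), fp);
            o >>= 8;
        }
    }

    static off_t
    ReadOffset(FILE *fp)
    {
        int         width = fgetc(fp);
        off_t       o = 0;
        int         i;

        /* A real version would fail if width > sizeof(off_t) and the
         * high-order bytes are non-zero. */
        for (i = 0; i < width; i++)
            o |= ((off_t) (fgetc(fp) & 0xFF)) << (i * 8);
        return o;
    }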





Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:
>Yes, and do the peripheral stuff to support old archives etc.

Does silence mean people agree? Does it also mean someone is doing this 
(eg. whoever did the off_t support)? Or does it mean somebody else needs to 
do it?







Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:
>> Yes, and do the peripheral stuff to support old archives etc.

> Does silence mean people agree? Does it also mean someone is doing this 
> (eg. whoever did the off_t support)? Or does it mean somebody else needs to 
> do it?

It needs to get done; AFAIK no one has stepped up to do it.  Do you want
to?
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:
> >Yes, and do the peripheral stuff to support old archives etc.
>
> Does silence mean people agree? Does it also mean someone is doing this
> (eg. whoever did the off_t support)? Or does it mean somebody else needs to
> do it?

Added to open items:

    Fix pg_dump to handle 64-bit off_t offsets for custom format


                               P O S T G R E S Q L

                          7 . 3  O P E N    I T E M S


Current at ftp://momjian.postgresql.org/pub/postgresql/open_items.

Source Code Changes
-------------------
Schema handling - ready? interfaces? client apps?
Drop column handling - ready for all clients, apps?
Fix BeOS, QNX4 ports
Fix AIX large file compile failure of 2002-09-11 (Andreas)
Get bison upgrade on postgresql.org for ecpg only (Marc)
Fix vacuum btree bug (Tom)
Fix client apps for autocommit = off
Change log_min_error_statement to be off by default (Gavin)
Fix return tuple counts/oid/tag for rules, SPI
Add schema dump option to pg_dump
Make SET not start a transaction with autocommit off, document it
Remove GRANT EXECUTE to all /contrib functions?
Change NUMERIC to have 16 digit precision
Handle CREATE CONSTRAINT TRIGGER without FROM in loads from old db's
Fix pg_dump to handle 64-bit off_t offsets for custom format

On Going
--------
Security audit


Documentation Changes
---------------------
Document need to add permissions to loaded functions and languages
Move documentation to gborg for moved projects


7.2.X
-----
CLOG
WAL checkpoint
Linux mktime()

Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:06 AM 2/10/2002 -0400, Tom Lane wrote:
>It needs to get done; AFAIK no one has stepped up to do it.  Do you want
>to?

I'll have a look; my main concern at the moment is that off_t and size_t 
are totally non-committal as to structure; in particular I can probably 
safely assume that they are unsigned, but can I assume that they have the 
same endianness as int etc?

If so, then will it be valid to just read/write each byte in endian order? 
How likely is it that the 64 bit value will actually be implemented as a 
structure like:

off_t { int lo; int hi; }

which effectively ignores endian-ness at the 32 bit scale?






Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:06 AM 2/10/2002 -0400, Tom Lane wrote:
>It needs to get done; AFAIK no one has stepped up to do it.  Do you want
>to?

My limited reading of off_t stuff now suggests that it would be brave to 
assume it is even a simple 64 bit number (or even 3 32 bit numbers). One 
alternative, which I am not terribly fond of, is to have pg_dump write 
multiple files - when we get to 1 or 2GB, we just open another file, and 
record our file positions as a (file number, file position) pair. Low tech, 
but at least we know it would work.

Unless anyone knows of a documented way to get 64 bit uint/int file 
offsets, I don't see we have much choice.





Re: pg_dump and large files - is this a problem?

From
"Mario Weilguni"
Date:
>My limited reading of off_t stuff now suggests that it would be brave to
>assume it is even a simple 64 bit number (or even 3 32 bit numbers). One
>alternative, which I am not terribly fond of, is to have pg_dump write
>multiple files - when we get to 1 or 2GB, we just open another file, and
>record our file positions as a (file number, file position) pair. Low tech,
>but at least we know it would work.
>
>Unless anyone knows of a documented way to get 64 bit uint/int file
>offsets, I don't see we have much choice.

How common is fgetpos64? Linux supports it, but I don't know about other
systems.

http://hpc.uky.edu/cgi-bin/man.cgi?section=all&topic=fgetpos64

Regards,
Mario Weilguni


Re: pg_dump and large files - is this a problem?

From
Giles Lean
Date:
Philip Warner writes:

> My limited reading of off_t stuff now suggests that it would be brave to 
> assume it is even a simple 64 bit number (or even 3 32 bit numbers).

What are you reading??  If you find a platform with 64 bit file
offsets that doesn't support 64 bit integral types I will not just be
surprised but amazed.

> One alternative, which I am not terribly fond of, is to have pg_dump
> write multiple files - when we get to 1 or 2GB, we just open another
> file, and record our file positions as a (file number, file
> position) pair. Low tech, but at least we know it would work.

That does avoid the issue completely, of course, and also avoids
problems where a platform might have large file support but a
particular filesystem might or might not.

> Unless anyone knows of a documented way to get 64 bit uint/int file 
> offsets, I don't see we have much choice.

If you're on a platform that supports large files it will either have
a straightforward 64 bit off_t or else will support the "large files
API" that is common on Unix-like operating systems.

What are you trying to do, exactly?

Regards,

Giles





Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 07:15 AM 4/10/2002 +1000, Giles Lean wrote:

> > My limited reading of off_t stuff now suggests that it would be brave to
> > assume it is even a simple 64 bit number (or even 3 32 bit numbers).
>
>What are you reading??  If you find a platform with 64 bit file
>offsets that doesn't support 64 bit integral types I will not just be
>surprised but amazed.

Yes, but there is no guarantee that off_t is implemented as such, nor would 
we be wise to assume so (most docs say explicitly not to do so).


> > Unless anyone knows of a documented way to get 64 bit uint/int file
> > offsets, I don't see we have much choice.
>
>If you're on a platform that supports large files it will either have
>a straightforward 64 bit off_t or else will support the "large files
>API" that is common on Unix-like operating systems.
>
>What are you trying to do, exactly?

Again yes, but the problem is the same: we need a way of making the *value* 
of an off_t portable (not just assuming it's an int64). In general that 
involves knowing how to turn it into a more universal data type (eg. int64, 
or even a string). Does the large file API have functions for representing 
off_t values that are portable across architectures? And is the API itself 
portable?






Re: pg_dump and large files - is this a problem?

From
Giles Lean
Date:
Philip Warner writes:

> Yes, but there is no guarantee that off_t is implemented as such, nor would 
> we be wise to assume so (most docs say explicitly not to do so).

I suspect you're reading old documents, which is why I asked what you
were referring to.  In the '80s what you are saying would have been
best practice, no question: 64 bit type support was not common.

When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.

> Again yes, but the problem is the same: we need a way of making the *value* 
> of an off_t portable (not just assuming it's a int64). In general that 
> involves knowing how to turn it into a more universal data type (eg. int64, 
> or even a string).

So you need to know the size of off_t, which will be 32 bit or 64 bit,
and then you need routines to convert that to a portable representation.
The canonical solution is XDR, but I'm not sure that you want to bother
with it or if it has been extended universally to support 64 bit types.

If you limit the file sizes to 1GB (your less preferred option, I
know;-) then like the rest of the PostgreSQL code you can safely
assume that off_t fits into 32 bits and have a choice of functions
(XDR or ntohl() etc) to deal with them and ignore 64 bit off_t
issues altogether.
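
A minimal sketch of the 32-bit variant Giles describes, using
htonl()/ntohl() (the function names here are invented for illustration):

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>

    /* Write an offset < 4GB in network (big-endian) byte order. */
    static int
    write_offset32(FILE *fp, uint32_t off)
    {
        uint32_t    net = htonl(off);       /* host -> network order */

        return fwrite(&net, sizeof(net), 1, fp) == 1 ? 0 : -1;
    }

    /* Read it back on any platform, regardless of endianness. */
    static int
    read_offset32(FILE *fp, uint32_t *off)
    {
        uint32_t    net;

        if (fread(&net, sizeof(net), 1, fp) != 1)
            return -1;
        *off = ntohl(net);                  /* network -> host order */
        return 0;
    }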

If you intend pg_dump files to be portable avoiding the use of large
files will be best.  It also avoids issues on platforms such as HP-UX
where large file support is available, but it has to be enabled on a
per-filesystem basis. :-(

> Does the large file API have functions for representing 
> the off_t values that is portable across architectures? And is the API also 
> portable?

The large files API is a way to access large files from 32 bit
processes.  It is reasonably portable, but is a red herring for
what you are wanting to do.  (I'm not convinced I am understanding
what you're trying to do, but I have 'flu which is not helping. :-)

Regards,

Giles



Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Giles Lean <giles@nemeton.com.au> writes:
> When talking of near-current systems with 64 bit off_t you are not
> going to find one without support for 64 bit integral types.

I tend to agree with Giles on this point.  A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice.  Before going far out of our way to allow it, we should first
require some evidence that it's needed on a supported or
likely-to-be-supported platform.

time_t isn't guaranteed to be an integral type either if you read the
oldest docs about it ... but no one believes that in practice ...
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Giles Lean <giles@nemeton.com.au> writes:
> > When talking of near-current systems with 64 bit off_t you are not
> > going to find one without support for 64 bit integral types.
> 
> I tend to agree with Giles on this point.  A non-integral representation
> of off_t is theoretically possible but I don't believe it exists in
> practice.  Before going far out of our way to allow it, we should first
> require some evidence that it's needed on a supported or
> likely-to-be-supported platform.
> 
> time_t isn't guaranteed to be an integral type either if you read the
> oldest docs about it ... but no one believes that in practice ...

I think fpos_t is the non-integral one.  I thought off_t almost always
was integral.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:07 PM 3/10/2002 -0400, Tom Lane wrote:
>A non-integral representation
>of off_t is theoretically possible but I don't believe it exists in
>practice.

Excellent. So I can just read/write the bytes in an appropriate order and 
expect whatever size it is to be a single intXX.

Fine with me, unless anybody voices another opinion in the next day, I will 
proceed. I just have this vague recollection of seeing a header file with a 
more complex structure for off_t. I'm probably dreaming.







Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
I have made the changes to pg_dump and verified that (a) it reads old 
files, (b) it handles 8 byte offsets, and (c) it dumps & seems to restore 
(at least to /dev/null).

I don't have a lot of options for testing it - should I just apply the 
changes and wait for the problems, or can someone offer a big-endian machine 
and/or a 4 byte off_t machine?






Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> I don't have a lot of options for testing it - should I just apply the 
> changes and wait for the problems, or can someone offer a big-endian machine 
> and/or a 4 byte off_t machine?

My HP is big-endian; send in the patch and I'll check it here...
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Peter Eisentraut
Date:
Philip Warner writes:

>
> I have made the changes to pg_dump and verified that (a) it reads old
> files, (b) it handles 8 byte offsets, and (c) it dumps & seems to restore
> (at least to /dev/null).
>
> I don't have a lot of options for testing it - should I just apply the
> changes and wait for the problems, or can someone offer a big-endian machine
> and/or a 4 byte off_t machine?

Any old machine has a 4-byte off_t if you configure with
--disable-largefile.  This could be a neat way to test:  Make two
installations configured different ways and move data back and forth
between them until it changes.  ;-)




Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 12:07 AM 19/10/2002 +0200, Peter Eisentraut wrote:
>Any old machine has a 4-byte off_t if you configure with
>--disable-largefile.

Thanks - done. I just dumped to a custom backup file, then dumped it to 
SQL, and compared each version (V7.2.1, 8 byte & 4 byte offsets), and they 
all looked OK. Also, the 4 byte version reads the 8 byte offset version 
correctly - although I have not checked reading > 4GB files with 4 byte 
offsets, that's not a priority for obvious reasons.

So once Giles gets back to me (Monday), I'll commit the changes.






Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
I have put the latest patch at:
    http://downloads.rhyme.com.au/postgresql/pg_dump/

along with two dump files of the regression DB, one with 4 byte
and the other with 8 byte offsets. I can read/restore each from
the other, so it looks pretty good. Once the endianness is tested,
we should be OK.

Known problems:

- will not cope with > 4GB files and size_t not 64 bit.
- when printing data position, it is assumed that off_t is UINT64
   (we could remove this entirely - it's just for display)
- if seek is not supported, then an intXX is assigned to off_t
   when file offsets are needed. This *should* not cause a problem
   since without seek, the offsets will not be written to the file.

Changes from Prior Version:

- No longer stores or outputs data length
- Assumes result of ftello is correct if it disagrees with internally
   kept tally.
- 'pg_restore -l' now shows sizes of int and offset.





Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Philip Warner wrote:
> 
> I have put the latest patch at:
> 
>      http://downloads.rhyme.com.au/postgresql/pg_dump/
> 
> along with two dump files of the regression DB, one with 4 byte
> and the other with 8 byte offsets. I can read/restore each from
> the other, so it looks pretty good. Once the endianness is tested,
> we should be OK.
> 
> Known problems:
> 
> - will not cope with > 4GB files and size_t not 64 bit.
> - when printing data position, it is assumed that off_t is UINT64
>    (we could remove this entirely - it's just for display)
> - if seek is not supported, then an intXX is assigned to off_t
>    when file offsets are needed. This *should* not cause a problem
>    since without seek, the offsets will not be written to the file.
> 
> Changes from Prior Version:
> 
> - No longer stores or outputs data length
> - Assumes result of ftello is correct if it disagrees with internally
>    kept tally.
> - 'pg_restore -l' now shows sizes of int and offset.
> 
> 



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:18 PM 20/10/2002 -0400, Bruce Momjian wrote:
>I will try to apply it within the next 48 hours.

I'm happy to apply it when necessary; but I wouldn't do it until we've 
heard from someone with a big-endian machine...






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 09:18 PM 20/10/2002 -0400, Bruce Momjian wrote:
> >I will try to apply it within the next 48 hours.
> 
> I'm happy to apply it when necessary; but I wouldn't do it until we've 
> heard from someone with a big-endian machine...

Well, I think Tom was going to try it on his HPUX machine.  However, it
is on the open items list, so we are going to need to get it in there
soon anyway, or yank it all out.  If no big endian people want to test
it, we will have to ship and then I am sure some big-endian testing will
happen.  ;-)



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:50 PM 20/10/2002 -0400, Bruce Momjian wrote:
>Well, I think Tom was going to try it on his HPUX machine.

It might be good if someone who knows a little more than me about
endianness etc has a look at the patch - specifically this bit of code:

#if __BYTE_ORDER == __LITTLE_ENDIAN
        for (off = 0; off < sizeof(off_t); off++)
        {
#else
        for (off = sizeof(off_t) - 1; off >= 0; off--)
        {
#endif
            i = *(char *) (ptr + off);
            (*AH->WriteBytePtr) (AH, i);
        }

It is *intended* to write the data such that the least significant byte
is written first to the file, but the dump Giles put on his FTP site
is not correct - it's written msb->lsb.

There seem to be two possibilities: (a) I am an idiot and there is something
wrong with the code above that I cannot see, or (b) the test:
    #if __BYTE_ORDER == __LITTLE_ENDIAN

is not the right thing to do. Any insights would be appreciated.






Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> It might be good if someone who knows a little more than me about
> endianness etc has a look at the patch - specifically this bit of code:

> #if __BYTE_ORDER == __LITTLE_ENDIAN

Well, the main problem with that is there's no such symbol as
__BYTE_ORDER ...

I'd prefer not to introduce one, either, if we can possibly avoid it.
I know that we have BYTE_ORDER defined in the port header files, but
I think it's quite untrustworthy, since there is no other place in the
main distribution that uses it anymore (AFAICS only contrib/pgcrypto
uses it at all).

The easiest way to write and reassemble an arithmetic value in a
platform-independent order is via shifting.  For instance,
    // write, LSB first
    for (i = 0; i < sizeof(off_t); i++)
    {
        writebyte(val & 0xFF);
        val >>= 8;
    }

    // read, LSB first
    val = 0;
    shift = 0;
    for (i = 0; i < sizeof(off_t); i++)
    {
        val |= (readbyte() << shift);
        shift += 8;
    }

(This assumes readbyte delivers an unsigned byte, else you might need to
mask it with 0xFF before shifting.)
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> then checking the first byte? This should give me the endianness, and makes 
> a non-destructive write (not sure it it's important). Currently the 
> commonly used code does not rely on off_t arithmetic, so if possible I'd 
> like to avoid shift. Does that sound reasonable? Or overly cautious?

I think it's pointless.  Let's assume off_t is not an arithmetic type
but some weird struct dreamed up by a crazed kernel hacker.  What are
the odds that dumping the bytes in it, in either order, will produce
something that's compatible with any other platform?  There could be
padding, or the fields might be in an order that doesn't match the
byte order within the fields, or something else.

The shift method requires *no* directly endian-dependent code,
and I think it will work on any platform where you have any hope of
portability anyway.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:47 AM 21/10/2002 -0400, Tom Lane wrote:
>Well, the main problem with that is there's no such symbol as
>__BYTE_ORDER ...

What about just:
    int i = 256;

then checking the first byte? This should give me the endianness, and makes 
a non-destructive write (not sure if it's important). Currently the 
commonly used code does not rely on off_t arithmetic, so if possible I'd 
like to avoid shift. Does that sound reasonable? Or overly cautious?
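
A sketch of that probe, with one caveat: with i = 256 the first byte of
a 4-byte int is 0x00 under both orderings (256 is 0x00000100), so the
usual idiom tests i = 1 instead:

    #include <stdio.h>

    int
    main(void)
    {
        int         i = 1;
        unsigned char *p = (unsigned char *) &i;

        /*
         * Little-endian machines store the least significant byte
         * first, so the first byte of the int 1 is 1; big-endian
         * machines store it last, so the first byte is 0.
         */
        printf("%s-endian\n", p[0] == 1 ? "little" : "big");
        return 0;
    }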










Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Here is a modified version of Philip's patch that has the changes Tom
suggested: treating off_t as an integral type.  I did light testing on
my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
space to test larger files.  
ftp://candle.pha.pa.us/pub/postgresql/mypatches/pg_dump

Can others test?

---------------------------------------------------------------------------

Tom Lane wrote:
> Philip Warner <pjw@rhyme.com.au> writes:
> > then checking the first byte? This should give me the endianness, and makes 
> > a non-destructive write (not sure if it's important). Currently the 
> > commonly used code does not rely on off_t arithmetic, so if possible I'd 
> > like to avoid shift. Does that sound reasonable? Or overly cautious?
> 
> I think it's pointless.  Let's assume off_t is not an arithmetic type
> but some weird struct dreamed up by a crazed kernel hacker.  What are
> the odds that dumping the bytes in it, in either order, will produce
> something that's compatible with any other platform?  There could be
> padding, or the fields might be in an order that doesn't match the
> byte order within the fields, or something else.
> 
> The shift method requires *no* directly endian-dependent code,
> and I think it will work on any platform where you have any hope of
> portability anyway.
> 
>             regards, tom lane
> 



Re: pg_dump and large files - is this a problem?

From
Larry Rosenman
Date:
On Mon, 2002-10-21 at 20:47, Bruce Momjian wrote:
> 
> Here is a modified version of Philip's patch that has the changes Tom
> suggested;  treating off_t as an integral type.  I did light testing on
> my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
> space to test larger files.  
I can make an account for anyone that wants to play on UnixWare 7.1.3.






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Larry Rosenman wrote:
> On Mon, 2002-10-21 at 20:47, Bruce Momjian wrote:
> > 
> > Here is a modified version of Philip's patch that has the changes Tom
> > suggested;  treating off_t as an integral type.  I did light testing on
> > my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
> > space to test larger files.  
> I can make an account for anyone that wants to play on UnixWare 7.1.3.

If you have 7.3, you can just test this way:

    1) apply the patch
    2) run the regression tests
    3) pg_dump -Fc regression >/tmp/x
    4) pg_restore -Fc </tmp/x

That's all I did and it worked.



Re: pg_dump and large files - is this a problem?

From
Larry Rosenman
Date:
On Mon, 2002-10-21 at 20:52, Bruce Momjian wrote:
> Larry Rosenman wrote:
> > On Mon, 2002-10-21 at 20:47, Bruce Momjian wrote:
> > > 
> > > Here is a modified version of Philip's patch that has the changes Tom
> > > suggested;  treating off_t as an integral type.  I did light testing on
> > > my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
> > > space to test larger files.  
> > I can make an account for anyone that wants to play on UnixWare 7.1.3.
> 
> If you have 7.3, you can just test this way:
I haven't had the time to play with 7.3 (busy on a NUMBER of other
things). 

I'm more than willing to supply resources, just my time is short right
now. 


>     
>     1) apply the patch
>     2) run the regression tests
>     3) pg_dump -Fc regression >/tmp/x
>     4) pg_restore  -Fc  </tmp/x
> 
> That's all I did and it worked.
> 



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:52 PM 21/10/2002 -0400, Bruce Momjian wrote:
>         4) pg_restore  -Fc  </tmp/x

pg_restore  /tmp/x

is enough; it will determine the file type, and by avoiding the pipe, you 
allow it to do seeks, which are not much use here but are useful when you 
only restore one table in a very large backup.





Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 10:16 AM 21/10/2002 -0400, Tom Lane wrote:
>What are
>the odds that dumping the bytes in it, in either order, will produce
>something that's compatible with any other platform?

None, but it will be compatible with itself (the most we can hope for), and 
will work even if shifting is not supported for off_t (how likely is 
that?). I agree shift is definitely the way to go if it works on arbitrary 
data - ie. it does not rely on off_t being an integer. Can I shift a struct?





Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> None, but it will be compatible with itself (the most we can hope for), and 
> will work even if shifting is not supported for off_t (how likely is 
> that?). I agree shift is definitely the way to go if it works on arbitrary 
> data - ie. it does not rely on off_t being an integer. Can I shift a struct?

You can't.  If there are any platforms where in fact off_t isn't an
arithmetic type, then shifting code would break there.  I am not sure
there are any; can anyone provide a counterexample?

It would be simple enough to add a configure test to see whether off_t
is arithmetic (just try to compile "off_t x; x <<= 8;").  How about

    #ifdef OFF_T_IS_ARITHMETIC_TYPE
        // cross-platform compatible
        use shifting method
    #else
        // not cross-platform compatible
        read or write bytes of struct in storage order
    #endif
        regards, tom lane
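
Tom's configure test would presumably boil down to compiling a
translation unit along these lines (a guess at the probe, not an actual
configure excerpt) and defining OFF_T_IS_ARITHMETIC_TYPE on success:

    #include <sys/types.h>

    int
    main(void)
    {
        off_t       x = 1;

        x <<= 8;        /* compiles only if off_t supports shifting */
        return (int) x;
    }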


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Philip Warner <pjw@rhyme.com.au> writes:
> > None, but it will be compatible with itself (the most we can hope for), and 
> > will work even if shifting is not supported for off_t (how likely is 
> > that?). I agree shift is definitely the way to go if it works on arbitrary 
> > data - ie. it does not rely on off_t being an integer. Can I shift a struct?
> 
> You can't.  If there are any platforms where in fact off_t isn't an
> arithmetic type, then shifting code would break there.  I am not sure
> there are any; can anyone provide a counterexample?
> 
> It would be simple enough to add a configure test to see whether off_t
> is arithmetic (just try to compile "off_t x; x <<= 8;").  How about
>     #ifdef OFF_T_IS_ARITHMETIC_TYPE
>         // cross-platform compatible
>         use shifting method
>     #else
>         // not cross-platform compatible
>         read or write bytes of struct in storage order
>     #endif

It is my understanding that off_t is an integral type and fpos_t is
perhaps a struct.  My fgetpos manual page says:
    The fgetpos() and fsetpos() functions are alternate interfaces
    equivalent to ftell() and fseek() (with whence set to SEEK_SET),
    setting and storing the current value of the file offset into or
    from the object referenced by pos.  On some (non-UNIX) systems an
    ``fpos_t'' object may be a complex object and these routines may
    be the only way to portably reposition a text stream.

I poked around and found this Usenet posting:

http://groups.google.com/groups?q=C+off_t+standard+integral&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=E958tG.8tH%40root.co.uk&rnum=1

stating that while off_t must be arithmetic, it doesn't have to be
integral, meaning it could be float or double, which can't be shifted.

However, since we don't know if we support any non-integral off_t
platforms, and because a configure test would require us to have two
code paths for with/without integral off_t, I suggest we apply my
version of Philip's patch and let's see if everyone can compile it
cleanly.  It does have the advantage of being more portable on systems
that do have integral off_t, which I think is most/all of our supported
platforms.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 12:00 PM 22/10/2002 -0400, Bruce Momjian wrote:
>It does have the advantage of being more portable on systems
>that do have integral off_t

I suspect it is no more portable than determining storage order by using 
'int i = 256', then writing in storage order, and has the disadvantage that 
it may break as discussed.

AFAICT, using storage order will not break under any circumstances within 
one OS/architecture (unlike using shift), and will not break any more often 
than using shift in cases where off_t is integral.





Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> However, since we don't know if we support any non-integral off_t
> platforms, and because a configure test would require us to have two
> code paths for with/without integral off_t, I suggest we apply my
> version of Philip's patch and let's see if everyone can compile it
> cleanly.

Actually, it looks to me like configure will spit up if off_t is not
an integral type:
    /* Check that off_t can represent 2**63 - 1 correctly.
       We can't simply define LARGE_OFF_T to be 9223372036854775807,
       since some C++ compilers masquerading as C compilers
       incorrectly reject 9223372036854775807.  */
    #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
    int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                        && LARGE_OFF_T % 2147483647 == 1)
                       ? 1 : -1];

So I think we're wasting our time to debate whether we need to support
non-integral off_t ... let's just apply Bruce's version and wait to
see if anyone has a problem before doing more work.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 12:00 PM 22/10/2002 -0400, Bruce Momjian wrote:
> >It does have the advantage of being more portable on systems
> >that do have integral off_t
> 
> I suspect it is no more portable than determining storage order by using 
> 'int i = 256', then writing in storage order, and has the disadvantage that 
> it may break as discussed.
> 
> AFAICT, using storage order will not break under any circumstances within 
> one OS/architecture (unlike using shift), and will not break any more often 
> than using shift in cases where off_t is integral.

Your version will break more often because we are assuming we can
determine the endian-ness of the OS, _and_ for quad off_t types,
assuming we know how those are stored too.  While we know the
endian-ness for ints, I have no idea if quads are always stored the
same.  By accessing it as an integral type, we make certain it is
output the same way every time for every OS.



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > However, since we don't know if we support any non-integral off_t
> > platforms, and because a configure test would require us to have two
> > code paths for with/without integral off_t, I suggest we apply my
> > version of Philip's patch and let's see if everyone can compile it
> > cleanly.
> 
> Actually, it looks to me like configure will spit up if off_t is not
> an integral type:
> 
>  /* Check that off_t can represent 2**63 - 1 correctly.
>     We can't simply define LARGE_OFF_T to be 9223372036854775807,
>     since some C++ compilers masquerading as C compilers
>     incorrectly reject 9223372036854775807.  */
> #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
>   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
>                && LARGE_OFF_T % 2147483647 == 1)
>               ? 1 : -1];
> 
> So I think we're wasting our time to debate whether we need to support
> non-integral off_t ... let's just apply Bruce's version and wait to
> see if anyone has a problem before doing more work.

I am concerned about one more thing.  On BSD/OS, we have off_t of quad
(8 byte), but we don't have fseeko, so this call looks questionable:
    if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)

In this case, dataPos is off_t (8 bytes), while fseek only accepts long
in that parameter (4 bytes).  When this code is hit, a file > 4 gigs
will seek to the wrong offset, I am afraid.  Also, I don't understand
why the compiler doesn't produce a warning.

I wonder if I should add a conditional test so this code is hit only if
HAVE_FSEEKO is defined.  There is alternative code for all the non-zero
fseeks.

Comments?
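
The silent truncation Bruce describes is ordinary C narrowing
conversion, which compilers accept without complaint by default; a toy
illustration (assuming 64-bit off_t and 32-bit long):

    #include <stdio.h>
    #include <sys/types.h>

    int
    main(void)
    {
        off_t       pos = (off_t) 5 << 30;  /* ~5GB, needs 33 bits */
        long        lpos = pos;             /* what fseek() would get */

        /*
         * On a platform with 32-bit long, lpos wraps silently: the
         * narrowing conversion is well-formed C, which is why no
         * warning is produced.
         */
        printf("off_t %lld became long %ld\n", (long long) pos, lpos);
        return 0;
    }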



Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Your version will break more often because we are assuming we can
> determine the endian-ness of the OS, _and_ for quad off_t types,
> assuming we know that is stored the same too.  While we have ending for
> int's, I have no idea if quads are always stored the same.

There is precedent for problems of that ilk, too, cf PDP_ENDIAN: years
ago someone made double-word-integer software routines and did not
think twice about which word should appear first in storage, with the
consequence that the storage order was neither little-endian nor
big-endian.  (We have exactly the same issue with our CRC routines for
compilers without int64: the two-int32 struct is defined in a way that's
compatible with little-endian storage, and on a big-endian machine it'll
produce a funny storage order.)

Unless someone can point to a supported (or potentially interesting)
platform on which off_t is indeed not integral, I think the shift-based
code is our safest bet.  (The precedent of the off_t checking code in
configure makes me really doubt that there are any platforms with
non-integral off_t.)
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Patch applied with shift <</>> changes by me.  Thanks.

---------------------------------------------------------------------------


Philip Warner wrote:
> 
> I have put the latest patch at:
> 
>      http://downloads.rhyme.com.au/postgresql/pg_dump/
> 
> along with two dump files of the regression DB, one with 4 byte
> and the other with 8 byte offsets. I can read/restore each from
> the other, so it looks pretty good. Once the endianness is tested,
> we should be OK.
> 
> Known problems:
> 
> - will not cope with > 4GB files and size_t not 64 bit.
> - when printing data position, it is assumed that off_t is UINT64
>    (we could remove this entirely - it's just for display)
> - if seek is not supported, then an intXX is assigned to off_t
>    when file offsets are needed. This *should* not cause a problem
>    since without seek, the offsets will not be written to the file.
> 
> Changes from Prior Version:
> 
> - No longer stores or outputs data length
> - Assumes result of ftello is correct if it disagrees with internally
>    kept tally.
> - 'pg_restore -l' now shows sizes of int and offset.
> 
> 
> 
> 



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > So I think we're wasting our time to debate whether we need to support
> > non-integral off_t ... let's just apply Bruce's version and wait to
> > see if anyone has a problem before doing more work.
>
> I am concerned about one more thing.  On BSD/OS, we have off_t of quad
> (8 byte), but we don't have fseeko, so this call looks questionable:
>
>     if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)
>
> In this case, dataPos is off_t (8 bytes), while fseek only accepts long
> in that parameter (4 bytes).  When this code is hit, a file > 4 gigs
> will seek to the wrong offset, I am afraid.  Also, I don't understand
> why the compiler doesn't produce a warning.
>
> I wonder if I should add a conditional test so this code is hit only if
> HAVE_FSEEKO is defined.  There is alternative code for all the non-zero
> fseeks.

Here is a patch that I think fixes the problem I outlined above.  If
there is no fseeko(), it will not call fseek with a non-zero offset
unless sizeof(off_t) <= sizeof(long).

Index: src/bin/pg_dump/pg_backup_custom.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_custom.c,v
retrieving revision 1.22
diff -c -c -r1.22 pg_backup_custom.c
*** src/bin/pg_dump/pg_backup_custom.c    22 Oct 2002 19:15:23 -0000    1.22
--- src/bin/pg_dump/pg_backup_custom.c    22 Oct 2002 21:36:30 -0000
***************
*** 431,437 ****
      if (tctx->dataState == K_OFFSET_NO_DATA)
          return;

!     if (!ctx->hasSeek || tctx->dataState == K_OFFSET_POS_NOT_SET)
      {
          /* Skip over unnecessary blocks until we get the one we want. */

--- 431,441 ----
      if (tctx->dataState == K_OFFSET_NO_DATA)
          return;

!     if (!ctx->hasSeek || tctx->dataState == K_OFFSET_POS_NOT_SET
! #if !defined(HAVE_FSEEKO)
!         || sizeof(off_t) > sizeof(long)
! #endif
!         )
      {
          /* Skip over unnecessary blocks until we get the one we want. */

***************
*** 809,815 ****
           * be ok to just use the existing self-consistent block
           * formatting.
           */
!         if (ctx->hasSeek)
          {
              fseeko(AH->FH, tpos, SEEK_SET);
              WriteToc(AH);
--- 813,823 ----
           * be ok to just use the existing self-consistent block
           * formatting.
           */
!         if (ctx->hasSeek
! #if !defined(HAVE_FSEEKO)
!             && sizeof(off_t) <= sizeof(long)
! #endif
!             )
          {
              fseeko(AH->FH, tpos, SEEK_SET);
              WriteToc(AH);

Re: pg_dump and large files - is this a problem?

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> I am concerned about one more thing.  On BSD/OS, we have off_t of quad
> (8 byte), but we don't have fseeko, so this call looks questionable:
>
>     if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)

Maybe you want to ask your OS provider how the heck this is supposed to
work.  I mean, it's great to have wide types, but what's the point if the
API can't handle them?




Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > I am concerned about one more thing.  On BSD/OS, we have off_t of quad
> > (8 byte), but we don't have fseeko, so this call looks questionable:
> >
> >     if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)
> 
> Maybe you want to ask your OS provider how the heck this is supposed to
> work.  I mean, it's great to have wide types, but what's the point if the
> API can't handle them?

Excellent question.  They do have fsetpos/fgetpos, and I think they
think you are supposed to use those.  However, they don't do seek from
current position, and they don't take an off_t, so I am confused myself.

I did ask on the mailing list and everyone kind of agreed it was a
missing feature.  However, because of the way we call fseeko not knowing
if it is a quad or a long, I think we have to add the checks to prevent
such wild seeks from happening.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 05:37 PM 22/10/2002 -0400, Bruce Momjian wrote:
>!               if (ctx->hasSeek
>! #if !defined(HAVE_FSEEKO)
>!                       && sizeof(off_t) <= sizeof(long)
>! #endif
>!                       )

Just to clarify my understanding:

- HAVE_FSEEKO is tested & defined in configure
- If it is not defined, then all calls to fseeko will magically be 
translated to fseek calls, and use the 'long' parameter type.

Is that right?

If so, why don't we:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

then replace all refs to off_t with FILE_OFFSET, and fseeko with FSEEK.

Existing checks etc. will then refuse to load file offsets with significant 
bytes after the 4th byte, and we will still use fseek/fseeko on OS 
implementations with a broken off_t.






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 05:37 PM 22/10/2002 -0400, Bruce Momjian wrote:
> >!               if (ctx->hasSeek
> >! #if !defined(HAVE_FSEEKO)
> >!                       && sizeof(off_t) <= sizeof(long)
> >! #endif
> >!                       )
> 
> Just to clarify my understanding:
> 
> - HAVE_FSEEKO is tested & defined in configure
> - If it is not defined, then all calls to fseeko will magically be 
> translated to fseek calls, and use the 'long' parameter type.
> 
> Is that right?
> 
> If so, why don't we:
> 
> #if defined(HAVE_FSEEKO)
> #define FILE_OFFSET off_t
> #define FSEEK fseeko
> #else
> #define FILE_OFFSET long
> #define FSEEK fseek
> #endif
> 
> then replace all refs to off_t with FILE_OFFSET, and fseeko with FSEEK.
> 
> Existing checks etc will then refuse to load file offsets with significant 
> bytes after the 4th byte, we will still use fseek/o in broken OS 
> implementations of off_t.

Uh, not exactly.  I have off_t as a quad, and I don't have fseeko, so
the above conditional doesn't work. I want to use off_t, but can't use
fseek().  As it turns out, the code already has options to handle no
fseek, so it seems to work anyway.  I think what you lose may be the
table of contents in the archive, if I am reading the code correctly.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 10:46 PM 22/10/2002 -0400, Bruce Momjian wrote:
>Uh, not exactly.  I have off_t as a quad, and I don't have fseeko, so
>the above conditional doesn't work. I want to use off_t, but can't use
>fseek().

Then when you create dumps, they will be invalid since I assume that ftello 
is also broken in the same way. You need to fix _getFilePos as well. And 
any other place that uses an off_t needs to be looked at very carefully. 
The code was written assuming that if 'hasSeek' was set, then we could 
trust it.

Given that you say you do have support for some kind of 64 bit offset, I 
would be a lot happier with these changes if you did something akin to my 
original suggestion:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#elif defined(HAVE_SOME_OTHER_FSEEK)
#define FILE_OFFSET some_other_offset
#define FSEEK some_other_fseek
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

...assuming you have a non-broken 64 bit fseek/tell pair, then this will 
work in all cases, and make the code a lot less ugly (assuming of course 
the non-broken version can be shifted).






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Sounds messy.  Let me see if I can code up an fseeko/ftello for BSD/OS
and add that to /port.  No reason to hold up beta for that, though.

I wonder if any other platforms have this limitation.  I think we need
to add some type of test for no-fseeko()/ftello() and sizeof(off_t) >
sizeof(long).  This fseeko/ftello/off_t is just too fluid, and the
failure modes too serious.

---------------------------------------------------------------------------

Philip Warner wrote:
> At 10:46 PM 22/10/2002 -0400, Bruce Momjian wrote:
> >Uh, not exactly.  I have off_t as a quad, and I don't have fseeko, so
> >the above conditional doesn't work. I want to use off_t, but can't use
> >fseek().
> 
> Then when you create dumps, they will be invalid since I assume that ftello 
> is also broken in the same way. You need to fix _getFilePos as well. And 
> any other place that uses an off_t needs to be looked at very carefully. 
> The code was written assuming that if 'hasSeek' was set, then we could 
> trust it.
> 
> Given that you say you do have support for some kind of 64 bit offset, I 
> would be a lot happier with these changes if you did something akin to my 
> original suggestion:
> 
> #if defined(HAVE_FSEEKO)
> #define FILE_OFFSET off_t
> #define FSEEK fseeko
> #elif defined(HAVE_SOME_OTHER_FSEEK)
> #define FILE_OFFSET some_other_offset
> #define FSEEK some_other_fseek
> #else
> #define FILE_OFFSET long
> #define FSEEK fseek
> #endif
> 
> ...assuming you have a non-broken 64 bit fseek/tell pair, then this will 
> work in all cases, and make the code a lot less ugly (assuming of course 
> the non-broken version can be shifted).


Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I wonder if any other platforms have this limitation.  I think we need
> to add some type of test for no-fseeko()/ftello() and sizeof(off_t) >
> sizeof(long).  This fseeko/ftello/off_t is just too fluid, and the
> failure modes too serious.

I am wondering why pg_dump has to depend on either fseek or ftell.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 12:32 AM 23/10/2002 -0400, Tom Lane wrote:
>I am wondering why pg_dump has to depend on either fseek or ftell.

It doesn't - it just works better and has more features if they are 
available, much like zlib etc.





Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 12:29 AM 23/10/2002 -0400, Bruce Momjian wrote:
>This fseeko/ftello/off_t is just too fluid, and the
>failure modes too serious.

I agree. Can you think of a better solution than the one I suggested???





Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
OK, you are saying if we don't have fseeko(), there is no reason to use
off_t, and we may as well use long.  What limitations does that impose,
and are the limitations clear to the user?

What has me confused is that I only see two places that use a non-zero
fseeko, and in those cases, there is a non-fseeko code path that does
the same thing, or the call isn't actually required.  Both cases are in
pg_dump/pg_backup_custom.c.  It appears seeking in the file is an
optimization that prevents all the blocks from being read.  That is
fine, but we shouldn't introduce failure cases to do that.

If BSD/OS is the only problem OS, I can deal with that, but I have no
idea if other OS's have the same limitation, and because of the way our
code exists now, we are not even checking to see if there is a problem.

I did some poking around, and on BSD/OS, fgetpos/fsetpos use fpos_t,
which is actually off_t, and interestingly, lseek() uses off_t too. 
Seems only fseek/ftell is limited to long.  I can easily implement
fseeko/ftello using fgetpos/fsetpos, but that is only one OS.
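
For what it's worth, a minimal sketch of such a wrapper (illustrative
only, with hypothetical names, not the committed src/port code; it
assumes fpos_t and off_t are the same integral type, as on BSD/OS):

#include <stdio.h>
#include <sys/types.h>

int
my_fseeko(FILE *stream, off_t offset, int whence)
{
    fpos_t      floc;

    switch (whence)
    {
        case SEEK_SET:
            floc = offset;      /* legal only because fpos_t == off_t here */
            return fsetpos(stream, &floc);
        case SEEK_CUR:
            if (fgetpos(stream, &floc) != 0)
                return -1;
            floc += offset;
            return fsetpos(stream, &floc);
        case SEEK_END:
            /* an offset of 0 always fits in a long, so plain fseek is safe */
            if (fseek(stream, 0L, SEEK_END) != 0 ||
                fgetpos(stream, &floc) != 0)
                return -1;
            floc += offset;
            return fsetpos(stream, &floc);
        default:
            return -1;
    }
}

off_t
my_ftello(FILE *stream)
{
    fpos_t      floc;

    if (fgetpos(stream, &floc) != 0)
        return (off_t) -1;
    return (off_t) floc;        /* again assumes fpos_t == off_t */
}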

One idea would be to patch up BSD/OS in backend/port/bsdi and add a
configure test that actually fails if fseeko doesn't exist _and_
sizeof(off_t) > sizeof(long).  That would at least catch OS's before
they make >2gig backups that can't be restored.
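
And a hedged sketch of that configure test (illustrative only): a probe
program that fails to compile exactly when plain fseek() is unsafe,
i.e. when HAVE_FSEEKO is absent but off_t is wider than long:

#include <sys/types.h>

#ifndef HAVE_FSEEKO
/* a negative array size forces a compile error if off_t overflows long */
typedef char off_t_fits_in_long[sizeof(off_t) <= sizeof(long) ? 1 : -1];
#endif

int
main(void)
{
    return 0;
}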

---------------------------------------------------------------------------

Philip Warner wrote:
> At 10:46 PM 22/10/2002 -0400, Bruce Momjian wrote:
> >Uh, not exactly.  I have off_t as a quad, and I don't have fseeko, so
> >the above conditional doesn't work. I want to use off_t, but can't use
> >fseek().
> 
> Then when you create dumps, they will be invalid since I assume that ftello 
> is also broken in the same way. You need to fix _getFilePos as well. And 
> any other place that uses an off_t needs to be looked at very carefully. 
> The code was written assuming that if 'hasSeek' was set, then we could 
> trust it.
> 
> Given that you say you do have support for some kind of 64 bit offset, I 
> would be a lot happier with these changes if you did something akin to my 
> original suggestion:
> 
> #if defined(HAVE_FSEEKO)
> #define FILE_OFFSET off_t
> #define FSEEK fseeko
> #elif defined(HAVE_SOME_OTHER_FSEEK)
> #define FILE_OFFSET some_other_offset
> #define FSEEK some_other_fseek
> #else
> #define FILE_OFFSET long
> #define FSEEK fseek
> #endif
> 
> ...assuming you have a non-broken 64 bit fseek/tell pair, then this will 
> work in all cases, and make the code a lot less ugly (assuming of course 
> the non-broken version can be shifted).


Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 01:02 AM 23/10/2002 -0400, Bruce Momjian wrote:

>OK, you are saying if we don't have fseeko(), there is no reason to use
>off_t, and we may as well use long.  What limitations does that impose,
>and are the limitations clear to the user.

What I'm saying is that if we have not got fseeko then we should use any 
'seek-class' function that returns a 64 bit value. We have already made the 
assumption that off_t is an integer; the same logic that came to that 
conclusion, applies just as validly to the other seek functions.

Secondly, if there is no 64 bit 'seek-class' function, then we should 
probably use a size_t, but a long would probably be fine too. I am not 
particularly attached to this part; long, int etc etc. Whatever is most 
likely to return an integer and work with whatever function we choose.

As to implications: assuming they are all integers (which as you know I 
don't like), we should have no problems.

If a system does not have any function to access 64 bit file offsets, then 
I'd say they are pretty unlikely to have files > 2GB.







Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 01:02 AM 23/10/2002 -0400, Bruce Momjian wrote:
> 
> >OK, you are saying if we don't have fseeko(), there is no reason to use
> >off_t, and we may as well use long.  What limitations does that impose,
> >and are the limitations clear to the user.
> 
> What I'm saying is that if we have not got fseeko then we should use any 
> 'seek-class' function that returns a 64 bit value. We have already made the 
> assumption that off_t is an integer; the same logic that came to that 
> conclusion, applies just as validly to the other seek functions.

Oh, I see, so try to use fsetpos/fgetpos?  I can write wrappers for
those to look like fseeko/ftello and put them in /port.

> Secondly, if there is no 64 bit 'seek-class' function, then we should 
> probably use a size_t, but a long would probably be fine too. I am not 
> particularly attached to this part; long, int etc etc. Whatever is most 
> likely to return an integer and work with whatever function we choose.
> 
> As to implications: assuming they are all integers (which as you know I 
> don't like), we should have no problems.
> 
> If a system does not have any function to access 64 bit file offsets, then 
> I'd say they are pretty unlikely to have files > 2GB.

OK, my OS can handle 64-bit files, but has only fgetpos/fsetpos, so I
could get that working.  The bigger question is what about OS's that
have 64-bit off_t/files but don't have any seek-type functions.  I did
research to find mine, but what about others that may have other
variants?

I think you are right that we have to not use off_t and use long if we
can't find a proper 64-bit seek function, but what are the failure modes
of doing this?  Exactly what happens for larger files?



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 01:02 AM 23/10/2002 -0400, Bruce Momjian wrote:
> 
> >OK, you are saying if we don't have fseeko(), there is no reason to use
> >off_t, and we may as well use long.  What limitations does that impose,
> >and are the limitations clear to the user.
> 
> What I'm saying is that if we have not got fseeko then we should use any 
> 'seek-class' function that returns a 64 bit value. We have already made the 
> assumption that off_t is an integer; the same logic that came to that 
> conclusion, applies just as validly to the other seek functions.
> 
> Secondly, if there is no 64 bit 'seek-class' function, then we should 
> probably use a size_t, but a long would probably be fine too. I am not 
> particularly attached to this part; long, int etc etc. Whatever is most 
> likely to return an integer and work with whatever function we choose.
> 
> As to implications: assuming they are all integers (which as you know I 
> don't like), we should have no problems.
> 
> If a system does not have any function to access 64 bit file offsets, then 
> I'd say they are pretty unlikely to have files > 2GB.

Let me see if I can be clearer.  With the off_t shift test, if that fails,
we will find out right away, at compile time.  I think that is acceptable.

What I am concerned about are cases that fail at runtime, specifically
during a restore of a >2gig file.  In my reading of the code, those
failures will be silent or will produce unusual error messages.  I don't
think we can ship code that has strange failure modes for data restore.

Now, if someone knows those failure cases, I would love to hear about
it.  If not, I will dig into the code today and find out where they are.



Re: pg_dump and large files - is this a problem?

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> I think you are right that we have to not use off_t and use long if we
> can't find a proper 64-bit seek function, but what are the failure modes
> of doing this?  Exactly what happens for larger files?

First we need to decide what we want to happen and after that think about
how to implement it.  Given sizeof(off_t) > sizeof(long) and no fseeko(),
we have the following options:

1. Disable access to large files.

2. Seek in some other way.

What's it gonna be?




Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > I think you are right that we have to not use off_t and use long if we
> > can't find a proper 64-bit seek function, but what are the failure modes
> > of doing this?  Exactly what happens for larger files?
> 
> First we need to decide what we want to happen and after that think about
> how to implement it.  Given sizeof(off_t) > sizeof(long) and no fseeko(),
> we have the following options:
> 
> 1. Disable access to large files.
> 
> 2. Seek in some other way.
> 
> What's it gonna be?

OK, well BSD/OS now works, but I wonder if there are any other quad
off_t OS's out there without fseeko.

How would we disable access to large files?  Do we fstat the file and
see if it is too large?   I suppose we are looking for cases where the
file system has large files, but fseeko doesn't allow us to access them.
Should we leave this issue alone and wait to find another OS with this
problem, and we can then rejigger fseeko.c to handle that OS too?

Looking at the pg_dump code, it seems the fseeks are optional in there
anyway because it already has code to read the file sequentially rather
than use fseek, and the TOC case in pg_backup_custom.c says that is
optional too.



Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> First we need to decide what we want to happen and after that think about
> how to implement it.  Given sizeof(off_t) > sizeof(long) and no fseeko(),
> we have the following options:

It seems obvious to me that there are no platforms that offer
sizeof(off_t) > sizeof(long) but have no API for doing seeks with off_t.
That would be just plain silly.  IMHO it's acceptable for us to fail at
configure time if we can't figure out how to seek.

The question is *which* seek APIs we need to support.  Are there any
besides fseeko() and fgetpos()?
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> How would we disable access to large files?

I think configure should fail if it can't find a way to seek.
Workaround for anyone in that situation is configure --disable-largefile.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > First we need to decide what we want to happen and after that think about
> > how to implement it.  Given sizeof(off_t) > sizeof(long) and no fseeko(),
> > we have the following options:
> 
> It seems obvious to me that there are no platforms that offer
> sizeof(off_t) > sizeof(long) but have no API for doing seeks with off_t.
> That would be just plain silly.  IMHO it's acceptable for us to fail at
> configure time if we can't figure out how to seek.

I would certainly be happy failing at configure time, so we know at the
start what is broken, rather than failures during restore.

> The question is *which* seek APIs we need to support.  Are there any
> besides fseeko() and fgetpos()?

What I have added is BSD/OS specific because only on BSD/OS do I know
fpos_t and off_t are the same type.  If we come up with other platforms,
we will have to deal with it then.



Re: pg_dump and large files - is this a problem?

From
Giles Lean
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> OK, well BSD/OS now works, but I wonder if there are any other quad
> off_t OS's out there without fseeko.

NetBSD prior to 1.6, released September 14, 2002. (Source: CVS logs.)

OpenBSD prior to 2.7, released June 15, 2000.  (Source: release notes.)

FreeBSD has had fseeko() for some time, but I'm not sure which release
introduced it -- perhaps 3.2.0, released May, 1999. (Source: CVS logs.)

Regards,

Giles






Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 05:50 PM 23/10/2002 -0400, Bruce Momjian wrote:
>Looking at the pg_dump code, it seems the fseeks are optional in there
>anyway because it already has code to read the file sequentially rather

But there are features that are not available if it can't seek: e.g. it will 
not restore in a different order to that in which it was written; it will 
not dump data offsets in the TOC, so dump files cannot be restored in 
alternate orders; and restore times will be large for a single table (it 
potentially has to read the entire file).





Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 05:50 PM 23/10/2002 -0400, Bruce Momjian wrote:
> >Looking at the pg_dump code, it seems the fseeks are optional in there
> >anyway because it already has code to read the file sequentially rather
> 
> But there are features that are not available if it can't seek: eg. it will 
> not restore in a different order to that in which it was written; it will 
> not dump data offsets in the TOC so dump files can not be restored in 
> alternate orders; restore times will be large for a single table (it has to 
> read the entire file potentially).

OK, that helps.  We just got a list of 2 other OS's without fseeko and
with large file support.  Any NetBSD before August 2002 has that
problem.  We are going to need to either get fseeko workarounds for
those, or disable those features in a meaningful way.



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Giles Lean wrote:
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > OK, well BSD/OS now works, but I wonder if there are any other quad
> > off_t OS's out there without fseeko.
> 
> NetBSD prior to 1.6, released September 14, 2002. (Source: CVS logs.)

OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 10:42 AM 23/10/2002 -0400, Bruce Momjian wrote:
>What I am concerned about are cases that fail at runtime, specifically
>during a restore of a >2gig file.

Please give an example that would still apply, assuming we get a
seek/tell pair that works with whatever we use as an offset.

If you are concerned about reading a dump file with 8 byte offsets on a 
machine with 4 byte off_t, that case and its permutations are already covered.





Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 10:42 AM 23/10/2002 -0400, Bruce Momjian wrote:
> >What I am concerned about are cases that fail at runtime, specifically
> >during a restore of a >2gig file.
> 
> Please give an example that would still apply assuming we get a working 
> seek/tell pair that works with whatever we use as an offset?

If we get this, everything is fine.  I have done that for BSD/OS today. 
I may need to do the same for NetBSD/OpenBSD too.

> If you are concerned about reading a dump file with 8 byte offsets on a 
> machine with 4 byte off_t, that case and its permutations are already covered.

No, I know that is covered because it will report a proper error message
on the restore on the 4-byte off_t machine.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:50 PM 23/10/2002 +0200, Peter Eisentraut wrote:

>1. Disable access to large files.
>
>2. Seek in some other way.

This gets my vote, but I would like to see a clean implementation (not huge 
quantities of ifdefs every time we call fseek); either we write our own 
fseek as Bruce seems to be suggesting, or we have a single header file that 
defines the FSEEK/FTELL/OFF_T to point to the 'right' functions, where 
'right' is defined as 'most likely to generate an integer and which makes 
use of the largest number of bytes'.

The way the code is currently written it does not matter if this is a 16 or 
3 byte value - so long as it is an integer.






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 11:50 PM 23/10/2002 +0200, Peter Eisentraut wrote:
> 
> >1. Disable access to large files.
> >
> >2. Seek in some other way.
> 
> This gets my vote, but I would like to see a clean implementation (not huge 
> quantities of ifdefs every time we call fseek); either we write our own 
> fseek as Bruce seems to be suggesting, or we have a single header file that 
> defines the FSEEK/FTELL/OFF_T to point to the 'right' functions, where 
> 'right' is defined as 'most likely to generate an integer and which makes 
> use of the largest number of bytes'.

We have to write another function because fsetpos doesn't do SEEK_CUR so
you have to implement it with more complex code.  It isn't a drop-in
replacement.

> The way the code is currently written it does not matter if this is a 16 or 
> 3 byte value - so long as it is an integer.

Right.  What we are assuming now is that off_t can be seeked using
whatever we defined for fseeko, which is incorrect on one OS, and now
I hear on more than one.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:36 PM 23/10/2002 -0400, Bruce Momjian wrote:
>We are going to need to either get fseeko workarounds for
>those, or disable those features in a meaningful way.

????? If we have not got a 64 bit seek function of any kind, then use a 32 
bit seek - the features don't need to be disabled. AFAICT, this is a 
non-issue: no 64 bit seek means no large files.

I'm not sure we should even worry about it, but if you are genuinely 
concerned that we have no 64 bit seek call but we do have files > 4GB, 
and you really want to disable seek, then just modify the code that sets 
'hasSeek' - don't screw around with every seek call. But only clear it 
if the file is > 4GB.
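
For illustration, a minimal sketch of clearing 'hasSeek' that way (the
helper and its parameter types are hypothetical; only ctx->hasSeek and
AH->FH come from pg_backup_custom.c, and LONG_MAX stands in for "too big
for a plain long fseek", which is the real limit rather than a literal
4GB):

#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

static void
maybe_clear_hasSeek(ArchiveHandle *AH, lclContext *ctx)
{
#ifndef HAVE_FSEEKO
    struct stat st;

    /* clear hasSeek only when a long-based fseek() cannot address the
     * whole archive; smaller files keep all the seek-based features */
    if (fstat(fileno(AH->FH), &st) == 0 &&
        st.st_size > (off_t) LONG_MAX)
        ctx->hasSeek = FALSE;
#endif
}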









Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:41 PM 23/10/2002 -0400, Bruce Momjian wrote:
>If we get this, everything is fine.  I have done that for BSD/OS today.
>I may need to do the same for NetBSD/OpenBSD too.

What did you do to achieve this?





Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 09:41 PM 23/10/2002 -0400, Bruce Momjian wrote:
> >If we get this, everything is fine.  I have done that for BSD/OS today.
> >I may need to do the same for NetBSD/OpenBSD too.
> 
> What did you do to achieve this?

See src/port/fseeko.c in current CVS, with some configure.in glue.



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:45 PM 23/10/2002 -0400, Bruce Momjian wrote:
>We have to write another function because fsetpos doesn't do SEEK_CUR so
>you have to implement it with more complex code.  It isn't a drop in
>place thing.

The only code that uses SEEK_CUR is the code to check if seek is available 
- I am very happy to change that to SEEK_SET - I can't even recall why I 
used SEEK_CUR. The code that does the real seeks uses SEEK_SET.






Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:55 AM 24/10/2002 +1000, Philip Warner wrote:

>The only code that uses SEEK_CUR is the code to check if seek is available 
>- I am ver happy to change that to SEEK_SET - I can't even recall why I 
>used SEEK_CUR. The code that does the real seeks uses SEEK_SET.

Come to think of it:
    ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);

should be replaced by:

#ifdef HAS_FSEEK[O]
     ctx->hasSeek = TRUE;
#else
     ctx->hasSeek = FALSE;
#endif

Since we're now checking for it in configure, we should remove the checks 
from the pg_dump code.







Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 09:45 PM 23/10/2002 -0400, Bruce Momjian wrote:
> >We have to write another function because fsetpos doesn't do SEEK_CUR so
> >you have to implement it with more complex code.  It isn't a drop in
> >place thing.
> 
> The only code that uses SEEK_CUR is the code to check if seek is available 
> - I am very happy to change that to SEEK_SET - I can't even recall why I 
> used SEEK_CUR. The code that does the real seeks uses SEEK_SET.

There are other problems.  fgetpos() expects a pointer to an fpos_t,
while ftello just returns off_t, so you need a local variable in the
function to pass to fgetpos(), and then return that from the function.

It is much cleaner to just duplicate the entire API so you don't have
any limitations or failure cases.



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Well, that certainly changes the functionality of the code.  I thought
that fseeko test was done so that things that couldn't be seeked on were
detected.  Not sure what isn't seek-able, maybe named pipes.  I thought
it was testing that so I didn't touch that variable.

This was my original thought, that we have non-fseeko code in place. 
Can we just trigger the non-fseeko code on HAVE_FSEEKO?  The code would
be something like:

    if (sizeof(long) >= sizeof(off_t))
        ctx->hasSeek = TRUE;
    else
#ifdef HAVE_FSEEKO
        ctx->hasSeek = TRUE;
#else
        ctx->hasSeek = FALSE;
#endif

---------------------------------------------------------------------------

Philip Warner wrote:
> At 11:55 AM 24/10/2002 +1000, Philip Warner wrote:
> 
> >The only code that uses SEEK_CUR is the code to check if seek is available 
> >- I am very happy to change that to SEEK_SET - I can't even recall why I 
> >used SEEK_CUR. The code that does the real seeks uses SEEK_SET.
> 
> Come to think of it:
> 
>      ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);
> 
> should be replaced by:
> 
> #ifdef HAS_FSEEK[O]
>      ctx->hasSeek = TRUE;
> #else
>      ctx->hasSeek = FALSE;
> #endif
> 
> Since we're now checking for it in configure, we should remove the checks 
> from the pg_dump code.


Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 10:08 PM 23/10/2002 -0400, Bruce Momjian wrote:
>Well, that certainly changes the functionality of the code.  I thought
>that fseeko test was done so that things that couldn't be seeked on were
>detected.

You are quite correct. It should read:
         #ifdef HAVE_FSEEKO
              ctx->hasSeek = fseeko(...,SEEK_SET);
         #else
              ctx->hasSeek = FALSE;
         #endif

Pipes are the main case for which we are checking.





Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 10:03 PM 23/10/2002 -0400, Bruce Momjian wrote:
>It is much cleaner to just duplicate the entire API so you don't have
>any limitations or failure cases.

We may still end up using macros in pg_dump to cope with cases where off_t 
& fseeko are not defined - if there are any. I presume we would then just 
revert to calling fseek/ftell etc.







Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 10:03 PM 23/10/2002 -0400, Bruce Momjian wrote:
> >It is much cleaner to just duplicate the entire API so you don't have
> >any limitations or failure cases.
> 
> We may still end up using macros in pg_dump to cope with cases where off_t 
> & fseeko are not defined - if there are any. I presume we would then just 
> revert to calling fseek/ftell etc.

Well, we have fseeko falling back to fseek already, so that is working
fine.  I don't think we will find any OS's without off_t.  We just need
a little smarts.  Let me see if I can work on it now.



Re: pg_dump and large files - is this a problem?

From
Giles Lean
Date:
> OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?

Yes:
   int   fgetpos(FILE *stream, fpos_t *pos);
   int   fsetpos(FILE *stream, const fpos_t *pos);

Per comments in <stdio.h> fpos_t is the same format as off_t, and
off_t and fpos_t have been 64 bit since 1994.
   http://cvsweb.netbsd.org/bsdweb.cgi/basesrc/include/stdio.h

Regards,

Giles






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Looks like I have some more work to do.  Thanks.

---------------------------------------------------------------------------

Giles Lean wrote:
> 
> > OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?
> 
> Yes:
> 
>     int
>     fgetpos(FILE *stream, fpos_t *pos);
> 
>     int
>     fsetpos(FILE *stream, const fpos_t *pos);
> 
> Per comments in <stdio.h> fpos_t is the same format as off_t, and
> off_t and fpos_t have been 64 bit since 1994.
> 
>     http://cvsweb.netbsd.org/bsdweb.cgi/basesrc/include/stdio.h
> 
> Regards,
> 
> Giles


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
OK, NetBSD added.  

Any other OS's need this?  Is it safe for me to code something that
assumes fpos_t and off_t are identical?  I can't think of a good way to
test if two data types are identical.  I don't think sizeof is enough.

---------------------------------------------------------------------------

Giles Lean wrote:
> 
> > OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?
> 
> Yes:
> 
>     int
>     fgetpos(FILE *stream, fpos_t *pos);
> 
>     int
>     fsetpos(FILE *stream, const fpos_t *pos);
> 
> Per comments in <stdio.h> fpos_t is the same format as off_t, and
> off_t and fpos_t have been 64 bit since 1994.
> 
>     http://cvsweb.netbsd.org/bsdweb.cgi/basesrc/include/stdio.h
> 
> Regards,
> 
> Giles


Re: pg_dump and large files - is this a problem?

From
"Zeugswetter Andreas SB SD"
Date:
> The question is *which* seek APIs we need to support.  Are there any
> besides fseeko() and fgetpos()?

On AIX we have
int fseeko64 (FILE* Stream, off64_t Offset, int Whence);
which is intended for large file access for programs that do NOT
#define _LARGE_FILES

It is functionality that is available if _LARGE_FILE_API is defined,
which is the default if _LARGE_FILES is not defined.

That would have been my preferred way of handling large files on AIX
in the two/three? places that need it (pg_dump/restore, psql and backend COPY).
This would have had the advantage that off_t is not 64 bit in all other places
where it is actually not needed, no ?
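
For illustration, a hedged sketch of that approach (pgoff_t, pg_fseeko
and pg_ftello are hypothetical names; it assumes the matching ftello64()
is available alongside fseeko64() under _LARGE_FILE_API):

#if defined(_AIX) && !defined(_LARGE_FILES)
/* use the explicit 64-bit API without widening off_t everywhere */
typedef off64_t pgoff_t;
#define pg_fseeko(fp, offset, whence)  fseeko64((fp), (offset), (whence))
#define pg_ftello(fp)                  ftello64(fp)
#else
typedef off_t pgoff_t;
#define pg_fseeko(fp, offset, whence)  fseeko((fp), (offset), (whence))
#define pg_ftello(fp)                  ftello(fp)
#endif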

Andreas


Re: pg_dump and large files - is this a problem?

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> OK, NetBSD added.
>
> Any other OS's need this?  Is it safe for me to code something that
> assumes fpos_t and off_t are identical?  I can't think of a good way to
> test if two data types are identical.  I don't think sizeof is enough.

No, you can't assume that fpos_t and off_t are identical.

But you can simulate a long fseeko() by calling fseek() multiple times, so
it should be possible to write a replacement that works on all systems.
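
A rough sketch of that idea (hypothetical helper, illustrative only; it
emulates SEEK_SET by hopping in long-sized steps, and it provides no
ftello() replacement):

#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

static int
chunked_fseek_set(FILE *stream, off_t offset)
{
    long        hop;

    if (offset < 0)
        return -1;

    /* first hop: an absolute seek, as far as a long can express */
    hop = (offset > (off_t) LONG_MAX) ? LONG_MAX : (long) offset;
    if (fseek(stream, hop, SEEK_SET) != 0)
        return -1;
    offset -= hop;

    /* remaining hops: relative seeks, each within long range */
    while (offset > 0)
    {
        hop = (offset > (off_t) LONG_MAX) ? LONG_MAX : (long) offset;
        if (fseek(stream, hop, SEEK_CUR) != 0)
            return -1;
        offset -= hop;
    }
    return 0;
}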




Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > OK, NetBSD added.
> >
> > Any other OS's need this?  Is it safe for me to code something that
> > assumes fpos_t and off_t are identical?  I can't think of a good way to
> > test if two data types are identical.  I don't think sizeof is enough.
> 
> No, you can't assume that fpos_t and off_t are identical.

I was wondering --- if fpos_t and off_t have identical sizeof, and fpos_t
can do shifts (<< or >>), that means fpos_t is also integral like off_t.
Can I then assume they are the same?
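
For what it's worth, a probe along those lines might look like this
(illustrative only; it can show that fpos_t is integral and the same
width as off_t, but per Peter's answer it still cannot prove the two
types are interchangeable):

#include <stdio.h>
#include <sys/types.h>

int
fpos_probe(void)
{
    fpos_t      fpos = 0;   /* fails to compile if fpos_t is a struct */
    off_t       off;

    fpos = fpos << 1;       /* fails to compile if fpos_t is not integral */
    off = (off_t) fpos;
    return sizeof(fpos_t) == sizeof(off_t) && off == 0;
}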

> But you can simulate a long fseeko() by calling fseek() multiple times, so
> it should be possible to write a replacement that works on all systems.

Yes, but I can't simulate ftello, so I then can't do SEEK_CUR, and if I
can't duplicate the entire API, I don't want to try.



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> At 10:08 PM 23/10/2002 -0400, Bruce Momjian wrote:
> >Well, that certainly changes the functionality of the code.  I thought
> >that fseeko test was done so that things that couldn't be seeked on were
> >detected.
>
> You are quite correct. It should read:
>
>          #ifdef HAVE_FSEEKO
>               ctx->hasSeek = fseeko(...,SEEK_SET);
>          #else
>               ctx->hasSeek = FALSE;
>          #endif
>
> pipes are the main case for which we are checking.

OK, I have applied the following patch to set hasSeek only if
fseek/fseeko is reliable.  This takes care of the random failure case
for large files.  Now I need to see if I can get the custom fseeko
working for more platforms.

Index: src/bin/pg_dump/common.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/common.c,v
retrieving revision 1.71
diff -c -c -r1.71 common.c
*** src/bin/pg_dump/common.c    9 Oct 2002 16:20:25 -0000    1.71
--- src/bin/pg_dump/common.c    25 Oct 2002 01:30:51 -0000
***************
*** 290,296 ****
           * attr with the same name, then only dump it if:
           *
           * - it is NOT NULL and zero parents are NOT NULL
!          *   OR
           * - it has a default value AND the default value does not match
           *   all parent default values, or no parents specify a default.
           *
--- 290,296 ----
           * attr with the same name, then only dump it if:
           *
           * - it is NOT NULL and zero parents are NOT NULL
!          *   OR
           * - it has a default value AND the default value does not match
           *   all parent default values, or no parents specify a default.
           *
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.59
diff -c -c -r1.59 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c    22 Oct 2002 19:15:23 -0000    1.59
--- src/bin/pg_dump/pg_backup_archiver.c    25 Oct 2002 01:30:57 -0000
***************
*** 2338,2343 ****
--- 2338,2369 ----
  }


+ /*
+  * checkSeek
+  *      check to see if fseek can be performed.
+  */
+
+ bool
+ checkSeek(FILE *fp)
+ {
+
+     if (fseek(fp, 0, SEEK_CUR) != 0)
+         return false;
+     else if (sizeof(off_t) > sizeof(long))
+     /*
+      *    At this point, off_t is too large for long, so we return
+      *    based on whether an off_t version of fseek is available.
+      */
+ #ifdef HAVE_FSEEKO
+         return true;
+ #else
+         return false;
+ #endif
+     else
+         return true;
+ }
+
+
  static void
  _SortToc(ArchiveHandle *AH, TocSortCompareFn fn)
  {
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.48
diff -c -c -r1.48 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h    22 Oct 2002 19:15:23 -0000    1.48
--- src/bin/pg_dump/pg_backup_archiver.h    25 Oct 2002 01:30:58 -0000
***************
*** 27,32 ****
--- 27,33 ----

  #include "postgres_fe.h"

+ #include <stdio.h>
  #include <time.h>
  #include <errno.h>

***************
*** 284,289 ****
--- 285,291 ----
  extern void WriteDataChunks(ArchiveHandle *AH);

  extern int    TocIDRequired(ArchiveHandle *AH, int id, RestoreOptions *ropt);
+ extern bool checkSeek(FILE *fp);

  /*
   * Mandatory routines for each supported format
Index: src/bin/pg_dump/pg_backup_custom.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_custom.c,v
retrieving revision 1.22
diff -c -c -r1.22 pg_backup_custom.c
*** src/bin/pg_dump/pg_backup_custom.c    22 Oct 2002 19:15:23 -0000    1.22
--- src/bin/pg_dump/pg_backup_custom.c    25 Oct 2002 01:31:01 -0000
***************
*** 179,185 ****
          if (!AH->FH)
              die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));

!         ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);
      }
      else
      {
--- 179,185 ----
          if (!AH->FH)
              die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));

!         ctx->hasSeek = checkSeek(AH->FH);
      }
      else
      {
***************
*** 190,196 ****
          if (!AH->FH)
              die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));

!         ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);

          ReadHead(AH);
          ReadToc(AH);
--- 190,196 ----
          if (!AH->FH)
              die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));

!         ctx->hasSeek = checkSeek(AH->FH);

          ReadHead(AH);
          ReadToc(AH);
Index: src/bin/pg_dump/pg_backup_files.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_files.c,v
retrieving revision 1.20
diff -c -c -r1.20 pg_backup_files.c
*** src/bin/pg_dump/pg_backup_files.c    22 Oct 2002 19:15:23 -0000    1.20
--- src/bin/pg_dump/pg_backup_files.c    25 Oct 2002 01:31:01 -0000
***************
*** 129,135 ****
          if (AH->FH == NULL)
              die_horribly(NULL, modulename, "could not open output file: %s\n", strerror(errno));

!         ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);

          if (AH->compression < 0 || AH->compression > 9)
              AH->compression = Z_DEFAULT_COMPRESSION;
--- 129,135 ----
          if (AH->FH == NULL)
              die_horribly(NULL, modulename, "could not open output file: %s\n", strerror(errno));

!         ctx->hasSeek = checkSeek(AH->FH);

          if (AH->compression < 0 || AH->compression > 9)
              AH->compression = Z_DEFAULT_COMPRESSION;
***************
*** 147,153 ****
          if (AH->FH == NULL)
              die_horribly(NULL, modulename, "could not open input file: %s\n", strerror(errno));

!         ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);

          ReadHead(AH);
          ReadToc(AH);
--- 147,153 ----
          if (AH->FH == NULL)
              die_horribly(NULL, modulename, "could not open input file: %s\n", strerror(errno));

!         ctx->hasSeek = checkSeek(AH->FH);

          ReadHead(AH);
          ReadToc(AH);
Index: src/bin/pg_dump/pg_backup_tar.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_tar.c,v
retrieving revision 1.31
diff -c -c -r1.31 pg_backup_tar.c
*** src/bin/pg_dump/pg_backup_tar.c    22 Oct 2002 19:15:23 -0000    1.31
--- src/bin/pg_dump/pg_backup_tar.c    25 Oct 2002 01:31:04 -0000
***************
*** 190,196 ****
           */
          /* setvbuf(ctx->tarFH, NULL, _IONBF, 0); */

!         ctx->hasSeek = (fseeko(ctx->tarFH, 0, SEEK_CUR) == 0);

          if (AH->compression < 0 || AH->compression > 9)
              AH->compression = Z_DEFAULT_COMPRESSION;
--- 190,196 ----
           */
          /* setvbuf(ctx->tarFH, NULL, _IONBF, 0); */

!         ctx->hasSeek = checkSeek(ctx->tarFH);

          if (AH->compression < 0 || AH->compression > 9)
              AH->compression = Z_DEFAULT_COMPRESSION;
***************
*** 227,233 ****

          ctx->tarFHpos = 0;

!         ctx->hasSeek = (fseeko(ctx->tarFH, 0, SEEK_CUR) == 0);

          /*
           * Forcibly unmark the header as read since we use the lookahead
--- 227,233 ----

          ctx->tarFHpos = 0;

!         ctx->hasSeek = checkSeek(ctx->tarFH);

          /*
           * Forcibly unmark the header as read since we use the lookahead
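
For orientation, here is a minimal sketch of the checkSeek() helper this
patch introduces, reconstructed from the discussion that follows (so the
details are an assumption, not the committed code): use fseeko() only where
configure found it, and report false otherwise rather than falling back to
a 32-bit fseek().

    #include "postgres_fe.h"    /* supplies bool */
    #include <stdio.h>

    bool
    checkSeek(FILE *fp)
    {
    #ifdef HAVE_FSEEKO
        /*
         * The check runs right after the file is opened, so seeking to
         * offset 0 with SEEK_SET is harmless -- and it still fails on
         * pipes, the main case being checked for.
         */
        return fseeko(fp, 0, SEEK_SET) == 0;
    #else
        return false;
    #endif
    }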

Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Zeugswetter Andreas SB SD wrote:
> 
> > The question is *which* seek APIs we need to support.  Are there any
> > besides fseeko() and fgetpos()?
> 
> On AIX we have 
> int fseeko64 (FILE* Stream, off64_t Offset, int Whence);
> which is intended for large file access for programs that do NOT
> #define _LARGE_FILES
> 
> It is functionality that is available if _LARGE_FILE_API is defined,
> which is the default if _LARGE_FILES is not defined.
> 
> That would have been my preferred way of handling large files on AIX
> in the two/three? places that need it (pg_dump/restore, psql and backend COPY).
> This would have had the advantage that off_t is not 64 bit in all other places
> where it is actually not needed, no ?

OK, I am focusing on AIX now.  I don't think we can go down the road of
saying where large file support is needed or not needed.  I think for
each platform either we support large files or we don't.  Is there a way
to have off_t be 64 bits everywhere, and if there is, why wouldn't we just
enable that rather than poking around figuring out where it is needed?
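
For reference, the usual way to get a 64-bit off_t everywhere is the
standard large-file macros, which must be set before any system header is
read; in practice configure would define them (e.g. via autoconf's
AC_SYS_LARGEFILE) rather than code doing it by hand as in this sketch:

    /* Sketch: requesting a 64-bit off_t platform-wide. */
    #define _LARGE_FILES 1          /* AIX */
    #define _FILE_OFFSET_BITS 64    /* Linux/glibc, Solaris, ... */

    #include <sys/types.h>          /* off_t is now 64 bits where supported */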

Also, I have the open item:
Fix AIX + Large File + Flex problem

Is there an AIX problem with Flex?



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
The patch will not work. Please reread my quoted email.

At 09:32 PM 24/10/2002 -0400, Bruce Momjian wrote:
>Philip Warner wrote:
> >
> > You are quite correct. It should read:
> >
> >          #ifdef HAVE_FSEEKO
> >               ctx->hasSeek = fseeko(...,SEEK_SET);
> >          #else
> >               ctx->hasSeek = FALSE;
> >          #endif
> >
> > pipes are the main case for which we are checking.
>
>OK, I have applied the following patch to set hasSeek only if
>fseek/fseeko is reliable.






Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
You are going to have to be more specific than that.

---------------------------------------------------------------------------

Philip Warner wrote:
> 
> The patch will not work. Please reread my quoted email.
> 
> At 09:32 PM 24/10/2002 -0400, Bruce Momjian wrote:
> >Philip Warner wrote:
> > >
> > > You are quite correct. It should read:
> > >
> > >          #ifdef HAVE_FSEEKO
> > >               ctx->hasSeek = fseeko(...,SEEK_SET);
> > >          #else
> > >               ctx->hasSeek = FALSE;
> > >          #endif
> > >
> > > pipes are the main case for which we are checking.
> >
> >OK, I have applied the following patch to set hasSeek only if
> >fseek/fseeko is reliable.


Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:56 PM 24/10/2002 -0400, Bruce Momjian wrote:
> > > > You are quite correct. It should read:
> > > >
> > > >          #ifdef HAVE_FSEEKO
> > > >               ctx->hasSeek = fseeko(...,SEEK_SET);
                                    ^^^^^^^^^^^^^^^^^^^^^^


> > > >          #else
> > > >               ctx->hasSeek = FALSE;
> > > >          #endif




Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 09:38 PM 24/10/2002 -0400, Bruce Momjian wrote:
>OK, I am focusing on AIX now.  I don't think we can go down the road of
>saying where large file support is needed or not needed.  I think for
>each platform either we support large files or we don't.

Rather than having a different patch file for each platform and refusing to 
code fseek/tell because we can't do SEEK_CUR, why not check for FSEEKO64 
and revert to a simple solution:

#ifdef HAVE_FSEEKO64
#define FSEEK fseeko64
#define FTELL ftello64
#define FILE_OFFSET off64_t
#else
#ifdef HAVE_FSEEKO
#define FSEEK fseeko
#define FTELL ftello
#define FILE_OFFSET off_t
#else
#if HAVE_FSEEK_BETTER_THAN_32_BIT
#define FSEEK FSEEK_BETTER_THAN_32_BIT
#define FTELL FTELL_BETTER_THAN_32_BIT
#define FILE_OFFSET FILE_OFFSET_BETTER_THAN_32_BIT
#else
#if sizeof(off_t) > sizeof(long)
#define IGNORE_FSEEK
#else
#define FSEEK fseek
#define FTELL ftell
#define FILE_OFFSET long
#end if...

Then use a correct checkSeek which also checks IGNORE_FSEEK.

AFAICT, this *will* do the job on all systems discussed. And we can 
certainly skip the HAVE_FSEEK_BETTER_THAN_32_BIT bit; coding a trivial 
seek/tell pair on top of fsetpos/fgetpos is easy, even as a macro.
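
Written out in full (with the BETTER_THAN_32_BIT branch skipped, as
suggested), the ladder might look like the sketch below. One substitution
is unavoidable: cpp cannot evaluate sizeof() (as comes up in the replies),
so the size test is shown using hypothetical configure-supplied macros in
the style of autoconf's AC_CHECK_SIZEOF:

    #if defined(HAVE_FSEEKO64)
    #define FSEEK fseeko64
    #define FTELL ftello64
    #define FILE_OFFSET off64_t
    #elif defined(HAVE_FSEEKO)
    #define FSEEK fseeko
    #define FTELL ftello
    #define FILE_OFFSET off_t
    #elif SIZEOF_OFF_T > SIZEOF_LONG    /* configure-supplied sizes */
    #define IGNORE_FSEEK
    #else
    #define FSEEK fseek
    #define FTELL ftell
    #define FILE_OFFSET long
    #endif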






Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
I just reread the patch; is it valid to assume fseek and fseeko have the 
same failure modes? Or does the call to 'fseek' actually call fseeko?





Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
OK, finally figured it out. I had used fseek instead of fseeko.

---------------------------------------------------------------------------

Philip Warner wrote:
> At 09:56 PM 24/10/2002 -0400, Bruce Momjian wrote:
> > > > > You are quite correct. It should read:
> > > > >
> > > > >          #ifdef HAVE_FSEEKO
> > > > >               ctx->hasSeek = fseeko(...,SEEK_SET);
> 
>                                      ^^^^^^^^^^^^^^^^^^^^^^
> 
> 
> > > > >          #else
> > > > >               ctx->hasSeek = FALSE;
> > > > >          #endif


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> 
> I just reread the patch; is it valid to assume fseek and fseeko have the 
> same failure modes? Or does the call to 'fseek' actually call fseeko?

The fseek was a typo.  It should have been fseeko as you suggested.
CVS updated.

Your idea of using SEEK_SET is good, except that I am concerned the
checkSeek call will move the file pointer.  Is that OK?  It doesn't seem
appropriate.



Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Philip Warner wrote:
> Rather than having a different patch file for each platform and refusing to 
> code fseek/tell because we can't do SEEK_CUR, why not check for FSEEKO64 
> and revert to a simple solution:
> 
> #ifdef HAVE_FSEEKO64
> #define FSEEK fseeko64
> #define FTELL ftello64
> #define FILE_OFFSET off64_t

We can do this, but it risks making the code pretty ugly.
Also, it is not immediately clear when an off_t is something to be passed
to fseek and when it holds a file offset that will never be seeked.  I am
concerned about perhaps making things worse than they are now.

> #else
> #ifdef HAVE_FSEEKO
> #define FSEEK fseeko
> #define FTELL ftello
> #define FILE_OFFSET off_t
> #else
> #if HAVE_FSEEK_BETTER_THAN_32_BIT
> #define FSEEK FSEEK_BETTER_THAN_32_BIT
> #define FTELL FTELL_BETTER_THAN_32_BIT
> #define FILE_OFFSET FILE_OFFSET_BETTER_THAN_32_BIT
> #else
> #if sizeof(off_t) > sizeof(long)

Can't do sizeof() tests in cpp, which is where the #if is processed.

> #define IGNORE_FSEEK
> #else
> #define FSEEK fseek
> #define FTELL ftell
> #define FILE_OFFSET long
> #end if...
> 
> Then use a correct checkSeek which also checks IGNORE_FSEEK.
> 
> AFAICT, this *will* do the job on all systems discussed. And we can 
> certainly skip the HAVE_FSEEK_BETTER_THAN_32_BIT bit, but coding a trivial 
> seek/tell pair for fsetpos/fgetpos is easy, even in a macro.

I don't think we can assume that off_t can be passed to fsetpos/fgetpos
unless we know the platform supports it, unless people think fpos_t
being integral and the same size as off_t is enough.

Also, I don't think these can be done as macros; perhaps
fseeko(...,SEEK_SET) can, but not the others, and not ftello.  See
port/fseeko.c for the reason.
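
For reference, a rough sketch of the kind of emulation port/fseeko.c has
to do (reconstructed from memory, so treat the details as an assumption;
it only works where fpos_t is an integral type, as on BSD/OS). The
SEEK_CUR case needs a temporary and two calls, which is what keeps these
from being simple expression macros:

    #include <stdio.h>
    #include <sys/types.h>

    int
    fseeko_sketch(FILE *stream, off_t offset, int whence)
    {
        fpos_t      floc;

        switch (whence)
        {
            case SEEK_SET:
                floc = offset;      /* assumes integral fpos_t */
                return fsetpos(stream, &floc);
            case SEEK_CUR:
                /* the case an expression macro cannot express */
                if (fgetpos(stream, &floc) != 0)
                    return -1;
                floc += offset;
                return fsetpos(stream, &floc);
            default:
                return -1;          /* SEEK_END omitted in this sketch */
        }
    }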



Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 12:07 AM 25/10/2002 -0400, Bruce Momjian wrote:
>I don't think we can assume that off_t can be passed to fsetpos/fgetpos
>unless we know the platform supports it, unless people think fpos_t
>being integral and the same size as off_t is enough.

We don't need to. We would #define FILE_OFFSET as fpos_t in that case.

>Also, I don't think these can be done a macro, perhaps
>fseeko(...,SEEK_SET), but not the others, and not ftello.  See
>port/fseeko.c for the reason.

My understanding was that you could open a block and declare variables in 
a macro; just use a local for the temporary storage of the in/out args. If this 
is the only thing stopping you from adopting this approach, then I am very happy 
to try to code the macros properly.

However, I do get the impression that there is more resistance to the idea 
than just this.
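
For what it's worth, here is a sketch of the kind of macro Philip means
(hypothetical name, and assuming an integral fpos_t, the point questioned
above): a do { } while (0) block can hold the locals, though a tell-style
macro then has to deliver its result through an argument rather than as an
expression value:

    /* FILE_OFFSET as in the ladder sketched earlier. */
    #define FTELL_INTO(fp, result) \
        do { \
            fpos_t  _pos_; \
            if (fgetpos((fp), &_pos_) == 0) \
                (result) = (FILE_OFFSET) _pos_; \
            else \
                (result) = -1; \
        } while (0)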






Re: pg_dump and large files - is this a problem?

From
Philip Warner
Date:
At 11:51 PM 24/10/2002 -0400, Bruce Momjian wrote:
>Your idea of using SEEK_SET is good, except I was concerned that the
>checkSeek call will move the file pointer.  Is that OK?  It doesn't seem
>appropriate.

The call is made just after the file is opened (or it should be!), so 
seeking to offset 0 with SEEK_SET will not be a problem.






Re: pg_dump and large files - is this a problem?

From
"Zeugswetter Andreas SB SD"
Date:
> > > The question is *which* seek APIs we need to support.  Are there any
> > > besides fseeko() and fgetpos()?
> >
> > On AIX we have
> > int fseeko64 (FILE* Stream, off64_t Offset, int Whence);
> > which is intended for large file access for programs that do NOT
> > #define _LARGE_FILES
> >
> > It is functionality that is available if _LARGE_FILE_API is defined,
> > which is the default if _LARGE_FILES is not defined.
> >
> > That would have been my preferred way of handling large files on AIX
> > in the two/three? places that need it (pg_dump/restore, psql and backend COPY).
> > This would have had the advantage that off_t is not 64 bit in all other places
> > where it is actually not needed, no ?
>
> OK, I am focusing on AIX now.  I don't think we can go down the road of
> saying where large file support is needed or not needed.  I think for
> each platform either we support large files or we don't.  Is there a way
> to have off_t be 64 bits everywhere, and if it is, why wouldn't we just
> enable that rather than poke around figuring out where it is needed?

If _LARGE_FILES is defined, off_t is 64 bits on AIX (and fseeko works).
The problem with flex is that the generated C file does #include <unistd.h>
before we #include "postgres.h".
In that situation _LARGE_FILES is not defined when unistd.h is read, so
unistd.h chooses to define _LARGE_FILE_API; the two are not compatible.

If a general 64-bit off_t is no performance problem, we should focus
on fixing the #include <unistd.h> issue and forget what I wanted/hinted at.
Peter E. has a patch for this in his pipeline. I can give it a second try
tomorrow.

Sorry for the late answer; I am very pressed for time at the moment :-(
Andreas
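
Schematically, the failure Andreas describes boils down to this include
ordering (a reconstruction, not actual file contents):

    /* What a flex-generated .c file effectively does on AIX: */
    #include <unistd.h>     /* _LARGE_FILES not defined yet, so the header
                             * falls back to defining _LARGE_FILE_API */
    #include "postgres.h"   /* defines _LARGE_FILES and includes unistd.h
                             * again; the two modes conflict and the
                             * compile fails */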


Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> The problem with flex is, that the generated c file does #include <unistd.h>
> before we #include "postgres.h".
> In this situation _LARGE_FILES is not defined for unistd.h and unistd.h
> chooses to define _LARGE_FILE_API, those two are not compatible.

Yeah.  AFAICS the only way around this is to avoid doing any I/O
operations in the flex-generated files.  Fortunately, that's not much
of a restriction.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
"Zeugswetter Andreas SB SD"
Date:
> > The problem with flex is, that the generated c file does #include <unistd.h>
> > before we #include "postgres.h".
> > In this situation _LARGE_FILES is not defined for unistd.h and unistd.h
> > chooses to define _LARGE_FILE_API, those two are not compatible.
>
> Yeah.  AFAICS the only way around this is to avoid doing any I/O
> operations in the flex-generated files.  Fortunately, that's not much
> of a restriction.

Unfortunately I do not think that is sufficient, since the problem is already
at the #include level: the compiler barfs on the second #include <unistd.h>,
which comes from postgres.h.

Andreas


Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> Yeah.  AFAICS the only way around this is to avoid doing any I/O
>> operations in the flex-generated files.  Fortunately, that's not much
>> of a restriction.

> Unfortunately I do not think that is sufficient, since the problem is already
> at the #include level. The compiler barfs on the second #include <unistd.h>
> from postgres.h

AIX is too stupid to wrap unistd.h in an "#ifndef" to protect against
double inclusion?  I suppose we could do that for them...
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
"Zeugswetter Andreas SB SD"
Date:
> >> Yeah.  AFAICS the only way around this is to avoid doing any I/O
> >> operations in the flex-generated files.  Fortunately,
> that's not much
> >> of a restriction.
>
> > Unfortunately I do not think that is sufficient, since the problem is already
> > at the #include level. The compiler barfs on the second #include <unistd.h>
> > from postgres.h
>
> AIX is too stupid to wrap unistd.h in an "#ifndef" to protect against
> double inclusion?  I suppose we could do that for them...

I guess that is exactly what is not wanted, since it would hide the actual
problem, namely that _LARGE_FILE_API gets defined (off_t --> 32 bits).
Thus I think IBM left unistd.h unprotected on purpose.

Andreas


Re: pg_dump and large files - is this a problem?

From
Peter Eisentraut
Date:
Zeugswetter Andreas SB SD writes:

> > AIX is too stupid to wrap unistd.h in an "#ifndef" to protect against
> > double inclusion?  I suppose we could do that for them...
>
> I guess that is exactly not wanted, since that would hide the actual
> problem, namely that _LARGE_FILE_API gets defined (off_t --> 32bit).
> Thus I think IBM did not protect unistd.h on purpose.

I think the problem is more accurately described thus:  Flex-generated
files include <stdio.h> before "postgres.h" due to the way flex lays out the
code in its output.  stdio.h does something which prevents switching to
the large file model later on in postgres.h.  (This manifests itself in
unistd.h, but unistd.h itself is not the problem per se.)

The proposed fix was to include the flex output in some other file (such
as the corresponding grammar file) rather than to compile it separately.
The patch just needs to be tried out.
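
Schematically, the proposed layout (file names hypothetical) has the
grammar file pull the scanner in after "postgres.h" rather than letting
the build compile it separately:

    /* grammar file: */
    #include "postgres.h"   /* sets _LARGE_FILES before any system header */

    /* ... grammar rules and actions ... */

    #include "scanner.c"    /* flex output, compiled here so it sees the
                             * large-file model already selected */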

-- 
Peter Eisentraut   peter_e@gmx.net



Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The proposed fix was to include the flex output in some other file (such
> as the corresponding grammar file) rather than to compile it separately.

Seems like a reasonable solution.  Can you make that happen in the next
day or two?  If not, I'll take a whack at it ...
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I think the problem is more accurately described thus:  Flex generated
> files include <stdio.h> before "postgres.h" due to the way it lays out the
> code in the output.  stdio.h does something which prevents switching to
> the large file model later on in postgres.h.  (This manifests itself in
> unistd.h, but unistd.h itself is not the problem per se.)

> The proposed fix was to include the flex output in some other file (such
> as the corresponding grammar file) rather than to compile it separately.

I have made this change.  CVS tip should compile cleanly now on machines
where this is an issue.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
"Zeugswetter Andreas SB SD"
Date:
Tom Lane writes:
> > I think the problem is more accurately described thus:  Flex generated
> > files include <stdio.h> before "postgres.h" due to the way it lays out the
> > code in the output.  stdio.h does something which prevents switching to
> > the large file model later on in postgres.h.  (This manifests itself in
> > unistd.h, but unistd.h itself is not the problem per se.)
>
> > The proposed fix was to include the flex output in some other file (such
> > as the corresponding grammar file) rather than to compile it separately.
>
> I have made this change.  CVS tip should compile cleanly now on machines
> where this is an issue.

Hmm, sorry for the late response, but I was away for the (long) weekend :-(
I think your patch might be the source of Christopher's build problem
("Compile problem on FreeBSD/Alpha").

Peter already had a patch, which I tested, modified a little, and sent back
to him for inclusion in CVS.

I will attach his patch with my small fixes for cross-reference.
The issue is that you need to remove the #include "bootstrap_tokens.h"
line from the lex file.

Andreas

Attachment

Re: pg_dump and large files - is this a problem?

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> The issue is that you need to remove the #include "bootstrap_tokens.h"
> line from the lex file.

Good point; I'm surprised gcc doesn't spit up on that.  I've made that
mod and also added the inclusion-order correction in pqsignal.c.
        regards, tom lane


Re: pg_dump and large files - is this a problem?

From
Bruce Momjian
Date:
Does this resolve our AIX compile problem?

---------------------------------------------------------------------------

Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > The issue is that you need to remove the #include "bootstrap_tokens.h"
> > line from the lex file.
> 
> Good point; I'm surprised gcc doesn't spit up on that.  I've made that
> mod and also added the inclusion-order-correction in pqsignal.c.
> 
>             regards, tom lane
> 
