Thread: pg_basebackup failed to back up large file

pg_basebackup failed to back up large file

From: Fujii Masao
Hi,

I got a bug report of pg_basebackup off-list that it causes an error
when there is a large file (e.g., 4GB) in the database cluster. It's easy
to reproduce this problem.

$ dd if=/dev/zero of=$PGDATA/test bs=1G count=4
$ pg_basebackup -D hoge -c fast
pg_basebackup: invalid tar block header size: 32768

2014-06-03 22:56:50 JST data LOG:  could not send data to client: Broken pipe
2014-06-03 22:56:50 JST data ERROR:  base backup could not send data,
aborting backup
2014-06-03 22:56:50 JST data FATAL:  connection to client lost

The cause of this problem is that pg_basebackup uses an integer to
store the size of the file to receive from the server, and an integer
overflow can happen when the file is very large. I think that
pg_basebackup should be able to handle even such a large file properly
because it can exist in the database cluster; for example,
the server log file under $PGDATA/pg_log can be that large.
The attached patch changes pg_basebackup so that it uses uint64 to store
the file size and doesn't cause an integer overflow.
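
To illustrate the overflow (a standalone sketch, not part of the patch,
and modulo the sscanf portability caveats): a tar header stores the
member size as an 11-digit octal string at offset 124, and 4GB no
longer fits in 32 bits once parsed.

#include <stdio.h>

int
main(void)
{
    /* 4GB is 040000000000 octal -- 11 digits, as stored in the
     * tar header's size field */
    const char *size_field = "40000000000";
    unsigned int as_int32 = 0;          /* roughly what pg_basebackup used */
    unsigned long long as_int64 = 0;    /* what the patch moves to */

    sscanf(size_field, "%11o", &as_int32);    /* wraps to 0 on common platforms */
    sscanf(size_field, "%11llo", &as_int64);  /* reads 4294967296 */
    printf("32-bit: %u, 64-bit: %llu\n", as_int32, as_int64);
    return 0;
}

With the size misread after the wraparound, pg_basebackup's tar parsing
goes off the rails on the file's data blocks, which is presumably where
the "invalid tar block header size" message above comes from.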

Thoughts?

Regards,

--
Fujii Masao


Re: pg_basebackup failed to back up large file

From: Andres Freund
Hi

On 2014-06-03 23:19:37 +0900, Fujii Masao wrote:
> I got a bug report of pg_basebackup off-list that it causes an error
> when there is a large file (e.g., 4GB) in the database cluster. It's easy
> to reproduce this problem.
> 
> The cause of this problem is that pg_basebackup uses an integer to
> store the size of the file to receive from the server, and an integer
> overflow can happen when the file is very large. I think that
> pg_basebackup should be able to handle even such a large file properly
> because it can exist in the database cluster; for example,
> the server log file under $PGDATA/pg_log can be that large.
> The attached patch changes pg_basebackup so that it uses uint64 to store
> the file size and doesn't cause an integer overflow.

Right, makes sense.

> ---
>  src/bin/pg_basebackup/pg_basebackup.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
> index b119fc0..959f0c0 100644
> --- a/src/bin/pg_basebackup/pg_basebackup.c
> +++ b/src/bin/pg_basebackup/pg_basebackup.c
> @@ -1150,7 +1150,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
>  {
>      char        current_path[MAXPGPATH];
>      char        filename[MAXPGPATH];
> -    int            current_len_left;
> +    uint64        current_len_left;

Any reason you changed the signedness here instead of just using an
int64? Did you review all current users?

>      int            current_padding = 0;
>      bool        basetablespace = PQgetisnull(res, rownum, 0);
>      char       *copybuf = NULL;
> @@ -1216,7 +1216,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
>              }
>              totaldone += 512;
>  
> -            if (sscanf(copybuf + 124, "%11o", &current_len_left) != 1)
> +            if (sscanf(copybuf + 124, "%11lo", &current_len_left) != 1)

That's probably not going to work on 32-bit platforms or Windows, where
you might need to use ll instead of l as a prefix. Also notice that
apparently (cf. 9d7ded0f4277f5c0063eca8e871a34e2355a8371) sscanf can't
reliably be used for 64-bit input :(. That pretty much sucks...
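
A hand-rolled octal parser would sidestep both the format-prefix
problem and the sscanf 64-bit issue. A rough sketch (parse_tar_size is
a made-up name, and real code would also need to cope with space/NUL
padding in the field):

#include <stdbool.h>
#include <stdint.h>

/* Parse an 11-digit octal tar size field into a uint64.
 * Returns false if a non-octal character is found. */
static bool
parse_tar_size(const char *field, uint64_t *result)
{
    uint64_t    val = 0;
    int         i;

    for (i = 0; i < 11; i++)
    {
        if (field[i] < '0' || field[i] > '7')
            return false;
        val = (val << 3) | (uint64_t) (field[i] - '0');
    }
    *result = val;
    return true;
}

That would replace the sscanf call as something like
parse_tar_size(copybuf + 124, &current_len_left).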

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_basebackup failed to back up large file

From: Tom Lane
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-03 23:19:37 +0900, Fujii Masao wrote:
>> -            if (sscanf(copybuf + 124, "%11o", &current_len_left) != 1)
>> +            if (sscanf(copybuf + 124, "%11lo", &current_len_left) != 1)

> That's probably not going to work on 32-bit platforms or Windows, where
> you might need to use ll instead of l as a prefix. Also notice that
> apparently (cf. 9d7ded0f4277f5c0063eca8e871a34e2355a8371) sscanf can't
> reliably be used for 64-bit input :(. That pretty much sucks...

There's a far bigger problem there, which is if we're afraid that
current_len_left might exceed 4GB then what is it exactly that guarantees
it'll fit in an 11-digit field?
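
(For reference: 11 octal digits give at most 8^11 - 1 = 2^33 - 1 =
8,589,934,591 bytes, i.e. just under 8GB.)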
        regards, tom lane



Re: pg_basebackup failed to back up large file

From: Magnus Hagander
On Tue, Jun 3, 2014 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-03 23:19:37 +0900, Fujii Masao wrote:
> >> -                    if (sscanf(copybuf + 124, "%11o", &current_len_left) != 1)
> >> +                    if (sscanf(copybuf + 124, "%11lo", &current_len_left) != 1)
>
> > That's probably not going to work on 32-bit platforms or Windows, where
> > you might need to use ll instead of l as a prefix. Also notice that
> > apparently (cf. 9d7ded0f4277f5c0063eca8e871a34e2355a8371) sscanf can't
> > reliably be used for 64-bit input :(. That pretty much sucks...
>
> There's a far bigger problem there, which is if we're afraid that
> current_len_left might exceed 4GB then what is it exactly that guarantees
> it'll fit in an 11-digit field?

Well, we only ever write 11 digits in there; the %11o is just how we read it back. But print_val() on the server side should probably have an overflow check there, which it doesn't; it's going to write some strange values in there if it overflows.
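
Something like this, perhaps (a sketch only -- print_val_checked is a
made-up name, and the real print_val's signature may differ):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write val into exactly len digits of base 'base', and report
 * failure instead of silently truncating when it doesn't fit. */
static bool
print_val_checked(char *s, uint64_t val, int base, size_t len)
{
    size_t      i;

    for (i = len; i > 0; i--)
    {
        int         digit = (int) (val % base);

        s[i - 1] = '0' + digit;
        val = val / base;
    }
    return val == 0;    /* leftover value means the field overflowed */
}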

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_basebackup failed to back up large file

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jun 3, 2014 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's a far bigger problem there, which is if we're afraid that
>> current_len_left might exceed 4GB then what is it exactly that guarantees
>> it'll fit in an 11-digit field?

> Well, we only ever write 11 digits in there; the %11o is just how we read
> it back. But print_val() on the server side should probably have an
> overflow check there, which it doesn't; it's going to write some strange
> values in there if it overflows.

My point is that having backups crash on an overflow doesn't really seem
acceptable.  IMO we need to reconsider the basebackup protocol and make
sure we don't *need* to put values over 4GB into this field.  Where's the
requirement coming from anyway --- surely all files in PGDATA ought to be
1GB max?
        regards, tom lane



Re: pg_basebackup failed to back up large file

From: Andres Freund
On 2014-06-03 11:04:58 -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jun 3, 2014 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> There's a far bigger problem there, which is if we're afraid that
> >> current_len_left might exceed 4GB then what is it exactly that guarantees
> >> it'll fit in an 11-digit field?
> 
> > Well, we only ever write 11 digits in there; the %11o is just how we read
> > it back. But print_val() on the server side should probably have an
> > overflow check there, which it doesn't; it's going to write some strange
> > values in there if it overflows.
> 
> My point is that having backups crash on an overflow doesn't really seem
> acceptable.  IMO we need to reconsider the basebackup protocol and make
> sure we don't *need* to put values over 4GB into this field.  Where's the
> requirement coming from anyway --- surely all files in PGDATA ought to be
> 1GB max?

Fujii's example was logfiles in pg_log. But we allow changing the
segment size via a configure flag, so we should either support that or
remove the ability to change the segment size...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_basebackup failed to back up large file

From: Magnus Hagander
On Tue, Jun 3, 2014 at 5:12 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-03 11:04:58 -0400, Tom Lane wrote:
> > Magnus Hagander <magnus@hagander.net> writes:
> > > On Tue, Jun 3, 2014 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> There's a far bigger problem there, which is if we're afraid that
> > >> current_len_left might exceed 4GB then what is it exactly that guarantees
> > >> it'll fit in an 11-digit field?
> >
> > > Well, we only ever write 11 digits in there; the %11o is just how we read
> > > it back. But print_val() on the server side should probably have an
> > > overflow check there, which it doesn't; it's going to write some strange
> > > values in there if it overflows.
> >
> > My point is that having backups crash on an overflow doesn't really seem
> > acceptable.  IMO we need to reconsider the basebackup protocol and make
> > sure we don't *need* to put values over 4GB into this field.  Where's the
> > requirement coming from anyway --- surely all files in PGDATA ought to be
> > 1GB max?
>
> Fujii's example was logfiles in pg_log. But we allow changing the
> segment size via a configure flag, so we should either support that or
> remove the ability to change the segment size...


With Fujii's fix, the new limit is 8GB, per the tar standard.

To get around that we could use the "star extension", or whatever it's actually called (implemented by both GNU and BSD tar), to maintain compatibility.
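
For reference, that encoding flags the 12-byte size field as binary and
stores the value big-endian. A sketch, from my reading of the GNU tar
format docs (encode_base256_size is an illustrative name):

#include <stdint.h>
#include <string.h>

/* GNU/star "base-256" size encoding: set the high bit of the first
 * byte, store the value big-endian in the remaining bytes.  Readers
 * that don't know the extension see an invalid octal field rather
 * than a silently wrong size. */
static void
encode_base256_size(char *field, uint64_t size)     /* field[12] */
{
    int         i;

    memset(field, 0, 12);
    field[0] = (char) 0x80;
    for (i = 11; i > 0; i--)
    {
        field[i] = (char) (size & 0xFF);
        size >>= 8;
    }
}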

Of course, a quick fix would be to just reject files >8GB - that may be what we have to do for a backpatchable fix.

IIRC we discussed this at the original time and said it would not be needed because such files shouldn't be there - clearly we forgot to consider logfiles...


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_basebackup failed to back up large file

From: Tom Lane
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-03 11:04:58 -0400, Tom Lane wrote:
>> My point is that having backups crash on an overflow doesn't really seem
>> acceptable.  IMO we need to reconsider the basebackup protocol and make
>> sure we don't *need* to put values over 4GB into this field.  Where's the
>> requirement coming from anyway --- surely all files in PGDATA ought to be
>> 1GB max?

> Fujii's example was logfiles in pg_log. But we allow changing the
> segment size via a configure flag, so we should either support that or
> remove the ability to change the segment size...

What we had better do, IMO, is fix things so that we don't have a filesize
limit in the basebackup format.  After a bit of googling, I found out that
recent POSIX specs for tar format include "extended headers" that among
other things support member files of unlimited size [1].  Rather than
fooling with partial fixes, we should make the basebackup logic use an
extended header when the file size is over INT_MAX.
        regards, tom lane

[1] http://pubs.opengroup.org/onlinepubs/9699919799/
see "pax" under shells & utilities



Re: pg_basebackup failed to back up large file

From: Magnus Hagander
On Tue, Jun 3, 2014 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-03 11:04:58 -0400, Tom Lane wrote:
> >> My point is that having backups crash on an overflow doesn't really seem
> >> acceptable.  IMO we need to reconsider the basebackup protocol and make
> >> sure we don't *need* to put values over 4GB into this field.  Where's the
> >> requirement coming from anyway --- surely all files in PGDATA ought to be
> >> 1GB max?
>
> > Fujii's example was logfiles in pg_log. But we allow changing the
> > segment size via a configure flag, so we should either support that or
> > remove the ability to change the segment size...
>
> What we had better do, IMO, is fix things so that we don't have a filesize
> limit in the basebackup format.  After a bit of googling, I found out that
> recent POSIX specs for tar format include "extended headers" that among
> other things support member files of unlimited size [1].  Rather than
> fooling with partial fixes, we should make the basebackup logic use an
> extended header when the file size is over INT_MAX.

Yeah, pax seems to be the way to go. It's at least supported by GNU tar - is it also supported on, say, BSD or other popular platforms? (The size extension in the general ustar format seems to be, so it would be a shame if this one is less portable.)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_basebackup failed to back up large file

From: Andres Freund
On 2014-06-03 11:42:49 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-03 11:04:58 -0400, Tom Lane wrote:
> >> My point is that having backups crash on an overflow doesn't really seem
> >> acceptable.  IMO we need to reconsider the basebackup protocol and make
> >> sure we don't *need* to put values over 4GB into this field.  Where's the
> >> requirement coming from anyway --- surely all files in PGDATA ought to be
> >> 1GB max?
> 
> > Fujii's example was logfiles in pg_log. But we allow changing the
> > segment size via a configure flag, so we should either support that or
> > remove the ability to change the segment size...
> 
> What we had better do, IMO, is fix things so that we don't have a filesize
> limit in the basebackup format.

Agreed. I am just saying that we either need to support that case *or*
remove configurations where such large files are generated. But the
former is clearly preferable since other files can cause large files to
exist as well.

> After a bit of googling, I found out that
> recent POSIX specs for tar format include "extended headers" that among
> other things support member files of unlimited size [1].  Rather than
> fooling with partial fixes, we should make the basebackup logic use an
> extended header when the file size is over INT_MAX.

That sounds neat enough. I guess we'd still add code to error out on
larger files for <= 9.4?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_basebackup failed to back up large file

From: Andres Freund
On 2014-06-03 17:57:52 +0200, Magnus Hagander wrote:
> On Tue, Jun 3, 2014 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What we had better do, IMO, is fix things so that we don't have a filesize
> > limit in the basebackup format.  After a bit of googling, I found out that
> > recent POSIX specs for tar format include "extended headers" that among
> > other things support member files of unlimited size [1].  Rather than
> > fooling with partial fixes, we should make the basebackup logic use an
> > extended header when the file size is over INT_MAX.

> Yeah, pax seems to be the way to go. It's at least supported by GNU tar -
> is it also supported on, say, BSD or other popular platforms? (The size
> extension in the general ustar format seems to be, so it would be a shame
> if this one is less portable.)

PG's tar.c already uses the ustar format, and the referenced extension is
an extension to ustar as far as I understand it. So at least tarballs
with files < 8GB would continue to be readable with all currently
working implementations.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_basebackup failed to back up large file

From: Magnus Hagander
<p dir="ltr"><br /> On Jun 3, 2014 6:17 PM, "Andres Freund" <<a
href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>>wrote:<br /> ><br /> > On 2014-06-03 17:57:52
+0200,Magnus Hagander wrote:<br /> > > On Tue, Jun 3, 2014 at 5:42 PM, Tom Lane <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br /> > > > What we had better do, IMO, is fix
thingsso that we don't have a filesize<br /> > > > limit in the basebackup format.  After a bit of googling, I
foundout that<br /> > > > recent POSIX specs for tar format include "extended headers" that among<br /> >
>> other things support member files of unlimited size [1].  Rather than<br /> > > > fooling with
partialfixes, we should make the basebackup logic use an<br /> > > > extended header when the file size is
overINT_MAX.<br /> ><br /> > > Yeah, pax seems to be the way to go. It's at least supported by GNU tar -<br />
>> is it also supported on say BSD, or other popular platforms? (The size<br /> > > extension in the
generalustar format seems to be, so it would be a shame<br /> > > if this one is less portable)<br /> ><br />
>PG's tar.c already uses the ustar format and the referenced extension is<br /> > an extension to ustar as far as
Iunderstand it. So at least tarballs<br /> > with files < 8GB would still continue to be readable with all
currently<br/> > working implementations. <p dir="ltr">Yeah, that is a clear advantage of that method. Didn't read
upon pax format backwards compatibility, does it have some trick to achieve something similar? <p dir="ltr">/Magnus  

Re: pg_basebackup failed to back up large file

From: Andres Freund
On 2014-06-03 18:23:07 +0200, Magnus Hagander wrote:
> On Jun 3, 2014 6:17 PM, "Andres Freund" <andres@2ndquadrant.com> wrote:
> > PG's tar.c already uses the ustar format, and the referenced extension is
> > an extension to ustar as far as I understand it. So at least tarballs
> > with files < 8GB would continue to be readable with all currently
> > working implementations.
>
> Yeah, that is a clear advantage of that method. Didn't read up on pax
> format backwards compatibility - does it have some trick to achieve
> something similar?

It just introduces a new file type 'x' that's only used when extended
features are needed. That file type then contains the extended header.
So the normal ustar header is used for small files, and if more is
needed an *additional* extended header is added.
Check:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_01
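
Each record in that extended header is "<len> <keyword>=<value>\n",
where <len> is the decimal length of the whole record, its own digits
included. Generating one would look roughly like this (pax_record is an
illustrative name, not existing code):

#include <stdio.h>
#include <string.h>

/* Build one pax extended-header record.  The length field counts
 * its own digits, so iterate to a fixed point. */
static int
pax_record(char *buf, size_t bufsize,
           const char *keyword, const char *value)
{
    int         rest = (int) (strlen(keyword) + strlen(value) + 3);
    int         total = rest;
    int         prev;

    do
    {
        prev = total;
        total = rest + snprintf(NULL, 0, "%d", prev);
    } while (total != prev);

    return snprintf(buf, bufsize, "%d %s=%s\n", total, keyword, value);
}

E.g. pax_record(buf, sizeof(buf), "size", "8589934592") produces
"19 size=8589934592\n", and the 'x' header's own size field then holds
the total length of all such records.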

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_basebackup failed to back up large file

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> Yeah, that is a clear advantage of that method. Didn't read up on pax
> format backwards compatibility - does it have some trick to achieve
> something similar?

I didn't read the fine print but it sounded like the extended header
would look like a separate file entry to a non-aware tar implementation,
which would write it out as a file and then get totally confused when
the length specified in the overlength file's entry didn't match the
amount of data following.  So it's a nice solution for some properties
but doesn't fail-soft for file length.  Not clear that there's any way
to achieve that though.

Another thought is we could make pg_basebackup simply skip any files that
exceed RELSEG_SIZE, on the principle that you don't really need/want
enormous log files to get copied anyhow.  We'd still need the pax
extension if the user had configured large RELSEG_SIZE, but having a
compatible tar could be documented as a requirement of doing that.
        regards, tom lane



Re: pg_basebackup failed to back up large file

From: Magnus Hagander
On Tue, Jun 3, 2014 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Yeah, that is a clear advantage of that method. Didn't read up on pax
> > format backwards compatibility - does it have some trick to achieve
> > something similar?
>
> I didn't read the fine print but it sounded like the extended header
> would look like a separate file entry to a non-aware tar implementation,
> which would write it out as a file and then get totally confused when
> the length specified in the overlength file's entry didn't match the
> amount of data following.  So it's a nice solution for some properties
> but doesn't fail-soft for file length.  Not clear that there's any way
> to achieve that though.

Well, there is no way to make it fail completely soft on the client side,
I think. But at least we should make sure that our implementation doesn't
do something really bad if you were to upgrade your server but not the
client. And make sure that we test with at least GNU and BSD tar to see
that they don't break things badly either.

> Another thought is we could make pg_basebackup simply skip any files that
> exceed RELSEG_SIZE, on the principle that you don't really need/want
> enormous log files to get copied anyhow.  We'd still need the pax
> extension if the user had configured large RELSEG_SIZE, but having a
> compatible tar could be documented as a requirement of doing that.

I think going all the way to pax is the proper long-term thing to do, at least if we can confirm it works in the main tar implementations.

For a backpatchable fix, that seems more reasonable. It doesn't work today, and we just need to document that it doesn't with larger RELSEG_SIZE. And then fix it properly for the future.

Not sure we want to go change the file format even for 9.4 at this time - it seems we're quite overdue for that - so I think that's a 9.5 feature.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_basebackup failed to back up large file

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jun 3, 2014 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another thought is we could make pg_basebackup simply skip any files that
>> exceed RELSEG_SIZE, on the principle that you don't really need/want
>> enormous log files to get copied anyhow.  We'd still need the pax
>> extension if the user had configured large RELSEG_SIZE, but having a
>> compatible tar could be documented as a requirement of doing that.

> I think going all the way to pax is the proper long-term thing to do, at
> least if we can confirm it works in the main tar implementations.

> For a backpatchable fix, that seems more reasonable. It doesn't work
> today, and we just need to document that it doesn't with larger
> RELSEG_SIZE. And then fix it properly for the future.

Agreed, this would be a reasonable quick fix that we could replace in
9.5 or later.
        regards, tom lane



Re: pg_basebackup failed to back up large file

From: Magnus Hagander
On Wednesday, June 4, 2014, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jun 3, 2014 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Another thought is we could make pg_basebackup simply skip any files that
> >> exceed RELSEG_SIZE, on the principle that you don't really need/want
> >> enormous log files to get copied anyhow.  We'd still need the pax
> >> extension if the user had configured large RELSEG_SIZE, but having a
> >> compatible tar could be documented as a requirement of doing that.
>
> > I think going all the way to pax is the proper long-term thing to do, at
> > least if we can confirm it works in the main tar implementations.
>
> > For a backpatchable fix, that seems more reasonable. It doesn't work
> > today, and we just need to document that it doesn't with larger
> > RELSEG_SIZE. And then fix it properly for the future.
>
> Agreed, this would be a reasonable quick fix that we could replace in
> 9.5 or later.


Fujii, are you going to be able to work on this with the now expanded scope? :)



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_basebackup failed to back up large file

From: Bruce Momjian
On Mon, Jun  9, 2014 at 11:17:41AM +0200, Magnus Hagander wrote:
> On Wednesday, June 4, 2014, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>     Magnus Hagander <magnus@hagander.net> writes:
>     > On Tue, Jun 3, 2014 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>     >> Another thought is we could make pg_basebackup simply skip any files
>     >> that exceed RELSEG_SIZE, on the principle that you don't really
>     >> need/want enormous log files to get copied anyhow.  We'd still need
>     >> the pax extension if the user had configured large RELSEG_SIZE, but
>     >> having a compatible tar could be documented as a requirement of
>     >> doing that.
>
>     > I think going all the way to pax is the proper long-term thing to do,
>     > at least if we can confirm it works in the main tar implementations.
>
>     > For a backpatchable fix, that seems more reasonable. It doesn't work
>     > today, and we just need to document that it doesn't with larger
>     > RELSEG_SIZE. And then fix it properly for the future.
>
>     Agreed, this would be a reasonable quick fix that we could replace in
>     9.5 or later.
>
> Fujii, are you going to be able to work on this with the now expanded
> scope? :)

Is this a TODO or doc item?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +