Thread: Stuck trying to backup large database - best practice?

Stuck trying to backup large database - best practice?

From
Antony Gelberg
Date:
Hi,

We have a postgres 9.3.x box, with 1.3TB free space, and our database of around 1.8TB.  Unfortunately, we're struggling to back it up.

When we try a compressed backup with the following command:

pg_basebackup -D "$BACKUP_PATH/$TIMESTAMP" -Ft -Z9 -P -U "$DBUSER" -w

we get this error:

pg_basebackup: could not get transaction log end position from server: ERROR: requested WAL segment 0000000400002B9F000000B4 has already been removed

This attempted backup reached 430GB before failing.

We were advised on IRC to try -Xs, but that only works with a plain (uncompressed) backup, and as you'll note from above, we don't have enough disk space for this.

Is there anything else we can do apart from get a bigger disk (not trivial at the moment)?  Any best practice?

I suspect that setting up WAL archiving and / or playing with the wal_keep_segments setting might help, but as you can probably gather, I'd like to be sure that I'm doing something sane before I dive in.

Happy to give more detail if required.

Antony

Re: Stuck trying to backup large database - best practice?

From
Adrian Klaver
Date:
On 01/12/2015 07:20 AM, Antony Gelberg wrote:
> Hi,
>
> We have a postgres 9.3.x box, with 1.3TB free space, and our database of
> around 1.8TB.  Unfortunately, we're struggling to back it up.
>
> When we try a compressed backup with the following command:
>
> pg_basebackup -D "$BACKUP_PATH/$TIMESTAMP" -Ft -Z9 -P -U "$DBUSER" -w
>
> we get this error:
>
> pg_basebackup: could not get transaction log end position from server:
> ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
> removed
>
> This attempted backup reached 430GB before failing.

It fails because the WAL file it needs has been removed from under it.

>
> We were advised on IRC to try -Xs, but that only works with a plain
> (uncompressed) backup, and as you'll note from above, we don't have
> enough disk space for this.
>
> Is there anything else we can do apart from get a bigger disk (not
> trivial at the moment)?  Any best practice?

What is the purpose of the backup?

In other words, do you really want the data and the WALs together, or do you
just want the data?

>
> I suspect that setting up WAL archiving and / or playing with the
> wal_keep_segments setting might help, but as you can probably gather,
> I'd like to be sure that I'm doing something sane before I dive in.
>
> Happy to give more detail if required.
>
> Antony


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Stuck trying to backup large database - best practice?

From
Antony Gelberg
Date:
<some snippage>

On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
>
> On 01/12/2015 07:20 AM, Antony Gelberg wrote:
>>
>> pg_basebackup: could not get transaction log end position from server:
>> ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
>> removed
>>
>> This attempted backup reached 430GB before failing.
>
>
> It fails because the WAL file it needs has been removed from under it.
>

Okay.  We simply understood that it took too long.  Clearly we have a
lot to learn about WAL and its intricacies.

>> We were advised on IRC to try -Xs, but that only works with a plain
>> (uncompressed) backup, and as you'll note from above, we don't have
>> enough disk space for this.
>>
>> Is there anything else we can do apart from get a bigger disk (not
>> trivial at the moment)?  Any best practice?
>
> What is the purpose of the backup?
>
> In other words, do you really want the data and the WALs together, or do you
> just want the data?

No, we just want to be able to restore our data at a later point.  (As
a secondary point, it's not that clear to me why it would be useful
to have both; I'd be interested in some insight.)

Antony


Re: Stuck trying to backup large database - best practice?

From
Adrian Klaver
Date:
On 01/12/2015 08:10 AM, Antony Gelberg wrote:
> <some snippage>
>
> On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>>
>> On 01/12/2015 07:20 AM, Antony Gelberg wrote:
>>>
>>> pg_basebackup: could not get transaction log end position from server:
>>> ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
>>> removed
>>>
>>> This attempted backup reached 430GB before failing.
>>
>>
>> It fails because the WAL file it needs has been removed from under it.
>>
>
> Okay.  We simply understood that it took too long.  Clearly we have a
> lot to learn about WAL and its intricacies.

See here:

http://www.postgresql.org/docs/9.4/interactive/wal.html

>
>>> We were advised on IRC to try -Xs, but that only works with a plain
>>> (uncompressed) backup, and as you'll note from above, we don't have
>>> enough disk space for this.
>>>
>>> Is there anything else we can do apart from get a bigger disk (not
>>> trivial at the moment)?  Any best practice?
>>
>> What is the purpose of the backup?
>>
>> In other words, do you really want the data and the WALs together, or do you
>> just want the data?
>
> No, we just want to be able to restore our data at a later point.  (As
> a secondary point, it's not that clear to me why it would be useful
> to have both; I'd be interested in some insight.)

Seems you may be better served by pg_dump:

http://www.postgresql.org/docs/9.4/interactive/app-pgdump.html

pg_basebackup has additional features which in your case are creating
issues. pg_dump, on the other hand, is pretty much a straightforward data
dump, and if you use -Fc you get compression.
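
For example, something along these lines, reusing the variables from your
script (the database name is a placeholder):

pg_dump -Fc -Z9 -U "$DBUSER" -w -f "$BACKUP_PATH/$TIMESTAMP.dump" yourdb

The custom format is compressed as it is written, and you restore it later
with pg_restore.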

Something I failed to ask in my previous post: how are you determining
the size of the database?

In addition, are you talking about a single database or the Postgres
database cluster?

>
> Antony
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Stuck trying to backup large database - best practice?

From
Antony Gelberg
Date:
On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
> On 01/12/2015 08:10 AM, Antony Gelberg wrote:
>>
>> <some snippage>
>>
>> On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
>> <adrian.klaver@aklaver.com> wrote:
>>>
>>>
>>> On 01/12/2015 07:20 AM, Antony Gelberg wrote:
>>>>
>>>>
>>>> pg_basebackup: could not get transaction log end position from server:
>>>> ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
>>>> removed
>>>>
>>>> This attempted backup reached 430GB before failing.
>>>
>>>
>>>
>>> It fails because the WAL file it needs has been removed from under it.
>>>
>>
>> Okay.  We simply understood that it took too long.  Clearly we have a
>> lot to learn about WAL and its intricacies.
>
>
> See here:
>
> http://www.postgresql.org/docs/9.4/interactive/wal.html
>

Of course we read the docs before asking here, but really learning
about a subject comes with time.  :)

>>
>>>> We were advised on IRC to try -Xs, but that only works with a plain
>>>> (uncompressed) backup, and as you'll note from above, we don't have
>>>> enough disk space for this.
>>>>
>>>> Is there anything else we can do apart from get a bigger disk (not
>>>> trivial at the moment)?  Any best practice?
>>>
>>>
>>> What is the purpose of the backup?
>>>
>>> In other words, do you really want the data and the WALs together, or do you
>>> just want the data?
>>
>>
>> No, we just want to be able to restore our data at a later point.  (As
>> a secondary point, it's not that clear to me why it would be useful
>> to have both; I'd be interested in some insight.)
>
>
> Seems you may be better served by pg_dump:
>
> http://www.postgresql.org/docs/9.4/interactive/app-pgdump.html
>
> pg_basebackup has additional features which in your case are creating
> issues. pg_dump on the other hand is pretty much a straight forward data
> dump and if you use -Fc you get compression.

So I should clarify - we want to be able to get back to the same point
as we would once the WAL was applied.  If we were to use pg_dump,
would we lose out in any way?  Appreciate insight as to how
pg_basebackup is scuppering things.

> Something I failed to ask in my previous post, how are you determining the
> size of the database?

It's a managed server - the hosting company told us it was 1.8TB.  I
just ran the query at

http://stackoverflow.com/questions/2596624/how-do-you-find-the-disk-size-of-a-postgres-postgresql-table-and-its-indexes,
and I don't have the total, but I'd say the actual table data is less,
nearer 1TB at a quick glance.

> In addition are you talking about a single database or the Postgres database
> cluster?
>

We only have one database in the cluster, so it's the same thing.

Antony


Re: Stuck trying to backup large database - best practice?

From
Adrian Klaver
Date:
On 01/12/2015 08:40 AM, Antony Gelberg wrote:
> On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>> On 01/12/2015 08:10 AM, Antony Gelberg wrote:
>>>
>>> <some snippage>
>>>
>>> On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
>>> <adrian.klaver@aklaver.com> wrote:
>>>>
>>>>
>>>> On 01/12/2015 07:20 AM, Antony Gelberg wrote:
>>>>>
>>>>>
>>>>> pg_basebackup: could not get transaction log end position from server:
>>>>> ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
>>>>> removed
>>>>>
>>>>> This attempted backup reached 430GB before failing.
>>>>
>>>>
>>>>
>>>> It fails because the WAL file it needs has been removed from under it.
>>>>
>>>
>>> Okay.  We simply understood that it took too long.  Clearly we have a
>>> lot to learn about WAL and its intricacies.
>>
>>
>> See here:
>>
>> http://www.postgresql.org/docs/9.4/interactive/wal.html
>>
>
> Of course we read the docs before asking here, but really learning
> about a subject comes with time.  :)
>
>>>
>>>>> We were advised on IRC to try -Xs, but that only works with a plain
>>>>> (uncompressed) backup, and as you'll note from above, we don't have
>>>>> enough disk space for this.
>>>>>
>>>>> Is there anything else we can do apart from get a bigger disk (not
>>>>> trivial at the moment)?  Any best practice?
>>>>
>>>>
>>>> What is the purpose of the backup?
>>>>
>>>> In other words, do you really want the data and the WALs together, or do you
>>>> just want the data?
>>>
>>>
>>> No, we just want to be able to restore our data at a later point.  (As
>>> a secondary point, it's not that clear to me why it would be useful
>>> to have both; I'd be interested in some insight.)
>>
>>
>> Seems you may be better served by pg_dump:
>>
>> http://www.postgresql.org/docs/9.4/interactive/app-pgdump.html
>>
>> pg_basebackup has additional features which in your case are creating
>> issues. pg_dump on the other hand is pretty much a straight forward data
>> dump and if you use -Fc you get compression.
>
> So I should clarify - we want to be able to get back to the same point
> as we would once the WAL was applied.  If we were to use pg_dump,
> would we lose out in any way?

pg_dump does not save WALs, so it would not work for that purpose.

> Appreciate insight as to how
> pg_basebackup is scuppering things.

From the original post it is not entirely clear whether you are using the
-X or -x options. The command you show does not have them, but you
mention -Xs. In any case it seems wal_keep_segments will need to be
bumped up to keep WAL segments around that are being recycled during the
backup process. How much will depend on how fast Postgres is
using/recycling log segments.  Looking at the turnover in the pg_xlog
directory would be a start.
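
For example (paths and the value are placeholders; each WAL segment is 16MB
by default, so size the setting to cover the whole backup window):

# rough look at how quickly segments are being recycled
ls -lt "$PGDATA/pg_xlog" | head

# in postgresql.conf, followed by a reload
wal_keep_segments = 1024    # keeps roughly 16GB of extra WAL on the master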

>
>> Something I failed to ask in my previous post, how are you determining the
>> size of the database?
>
> It's a managed server - the hosting company told us it was 1.8TB.  I
> just ran the query at
>
http://stackoverflow.com/questions/2596624/how-do-you-find-the-disk-size-of-a-postgres-postgresql-table-and-its-indexes,
> and I don't have the total, but I'd say the actual table data is less,
> nearer 1TB at a quick glance.
>
>> In addition are you talking about a single database or the Postgres database
>> cluster?
>>
>
> We only have one database in the cluster, so it's the same thing.
>
> Antony
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Stuck trying to backup large database - best practice?

From
Bosco Rama
Date:
On 01/12/15 08:40, Antony Gelberg wrote:
>> In addition are you talking about a single database or the Postgres database
>> cluster?
>
> We only have one database in the cluster, so it's the same thing.

Not exactly.  For example, with pg_dump you would also need to dump the
'global' context separately.  Or you could use pg_dumpall for both the
context and the database, with all of that then piped into gzip or
bzip2 (since pg_dumpall only outputs SQL text).
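
Roughly (the user name is a placeholder):

pg_dumpall -g -U postgres | gzip > globals.sql.gz      # roles, tablespaces, etc.
pg_dumpall -U postgres | gzip > everything.sql.gz      # or the whole cluster as plain SQL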

HTH,
Bosco.


Re: Stuck trying to backup large database - best practice?

From
Antony Gelberg
Date:
On Mon, Jan 12, 2015 at 7:08 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
>
> On 01/12/2015 08:40 AM, Antony Gelberg wrote:
>>
>> On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
>> <adrian.klaver@aklaver.com> wrote:
>>>
>>> On 01/12/2015 08:10 AM, Antony Gelberg wrote:
>>>>
>>>> On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
>>>> <adrian.klaver@aklaver.com> wrote:
>>> pg_basebackup has additional features which in your case are creating
>>> issues. pg_dump on the other hand is pretty much a straight forward data
>>> dump and if you use -Fc you get compression.
>>
>>
>> So I should clarify - we want to be able to get back to the same point
>> as we would once the WAL was applied.  If we were to use pg_dump,
>> would we lose out in any way?
>
>
> pg_dump does not save WALs, so it would not work for that purpose.
>
>  Appreciate insight as to how
>>
>> pg_basebackup is scuppering things.
>
>
> From the original post it is not entirely clear whether you are using the -X or -x options. The command you show does
> not have them, but you mention -Xs. In any case it seems wal_keep_segments will need to be bumped up to keep WAL
> segments around that are being recycled during the backup process. How much will depend on how fast Postgres is
> using/recycling log segments.  Looking at the turnover in the pg_xlog directory would be a start.

The original script used -xs, but that didn't make sense, so we used
-Xs in the end, but then we cancelled the backup as we assumed that we
wouldn't have enough space for it uncompressed.  Did we miss
something?

I think your suggestion of looking in pg_xlog and tweaking
wal_keep_segments is interesting, we'll take a look, and I'll report
back with findings.

Thanks for your very detailed help.

Antony


Re: Stuck trying to backup large database - best practice?

From
Joseph Kregloh
Date:
I apologize if it has already been suggested. I already deleted the previous emails in this chain. 

Have you looked into Barman? My current database is just a tad over 1TB. I have one master, two slaves, and another machine running Barman. The slaves are there for redundancy purposes. Master fails, a slave gets promoted. The backups are all done by Barman. This allows for PITR. I do not run any backup software on the database server, but I do on the Barman server. In Barman I keep a 7 day retention policy, then I have Bacula backing that up with a 1 month retention policy. So theoretically I could do a PITR up to a month in the past.
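
For a rough idea, a per-server section in barman.conf looks something like
this (host and user names are placeholders), with the master shipping WAL to
the Barman host via archive_command:

[main]
description = "Main production database"
ssh_command = ssh postgres@dbhost
conninfo = host=dbhost user=postgres
retention_policy = RECOVERY WINDOW OF 7 DAYS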

Thanks,
-Joseph Kregloh

On Mon, Jan 12, 2015 at 5:16 PM, Antony Gelberg <antony.gelberg@gmail.com> wrote:
On Mon, Jan 12, 2015 at 7:08 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
>
> On 01/12/2015 08:40 AM, Antony Gelberg wrote:
>>
>> On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
>> <adrian.klaver@aklaver.com> wrote:
>>>
>>> On 01/12/2015 08:10 AM, Antony Gelberg wrote:
>>>>
>>>> On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
>>>> <adrian.klaver@aklaver.com> wrote:
>>> pg_basebackup has additional features which in your case are creating
>>> issues. pg_dump on the other hand is pretty much a straight forward data
>>> dump and if you use -Fc you get compression.
>>
>>
>> So I should clarify - we want to be able to get back to the same point
>> as we would once the WAL was applied.  If we were to use pg_dump,
>> would we lose out in any way?
>
>
> pg_dump does not save WALs, so it would not work for that purpose.
>
>  Appreciate insight as to how
>>
>> pg_basebackup is scuppering things.
>
>
> From the original post it is not entirely clear whether you are using the -X or -x options. The command you show does not have them, but you mention -Xs. In any case it seems wal_keep_segments will need to be bumped up to keep WAL segments around that are being recycled during the backup process. How much will depend on how fast Postgres is using/recycling log segments.  Looking at the turnover in the pg_xlog directory would be a start.

The original script used -xs, but that didn't make sense, so we used
-Xs in the end, but then we cancelled the backup as we assumed that we
wouldn't have enough space for it uncompressed.  Did we miss
something?

I think your suggestion of looking in pg_xlog and tweaking
wal_keep_segments is interesting, we'll take a look, and I'll report
back with findings.

Thanks for your very detailed help.

Antony



Re: Stuck trying to backup large database - best practice?

From
Adrian Klaver
Date:
On 01/12/2015 02:16 PM, Antony Gelberg wrote:
> On Mon, Jan 12, 2015 at 7:08 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>>
>> On 01/12/2015 08:40 AM, Antony Gelberg wrote:
>>>
>>> On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
>>> <adrian.klaver@aklaver.com> wrote:
>>>>
>>>> On 01/12/2015 08:10 AM, Antony Gelberg wrote:
>>>>>
>>>>> On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
>>>>> <adrian.klaver@aklaver.com> wrote:
>>>> pg_basebackup has additional features which in your case are creating
>>>> issues. pg_dump on the other hand is pretty much a straight forward data
>>>> dump and if you use -Fc you get compression.
>>>
>>>
>>> So I should clarify - we want to be able to get back to the same point
>>> as we would once the WAL was applied.  If we were to use pg_dump,
>>> would we lose out in any way?
>>
>>
>> pg_dump does not save WALs, so it would not work for that purpose.
>>
>>   Appreciate insight as to how
>>>
>>> pg_basebackup is scuppering things.
>>
>>
>> From the original post it is not entirely clear whether you are using the -X or -x options. The command you show does
>> not have them, but you mention -Xs. In any case it seems wal_keep_segments will need to be bumped up to keep WAL
>> segments around that are being recycled during the backup process. How much will depend on how fast Postgres is
>> using/recycling log segments.  Looking at the turnover in the pg_xlog directory would be a start.
>
> The original script used -xs, but that didn't make sense, so we used
> -Xs in the end, but then we cancelled the backup as we assumed that we
> wouldn't have enough space for it uncompressed.  Did we miss
> something?

Not so much missed as not fully understood. :) When you use either -x
or -X you are telling pg_basebackup that you care that the WAL files in
the backup directory are up to date to the point the backup completed. If
you use -Xf, which is the same as -x, then you are saying: wait till the
rest of the backup is finished, then collect and copy over all the relevant
WAL files. This is where wal_keep_segments comes into play. It needs to
be set high enough that the relevant WAL files in place at the beginning of
the backup are still there when the backup completes, in order to have a
complete set. If you use -Xs, then a parallel process is started to copy
over the WAL files while the other data files are being copied over.
Though as the docs say:

http://www.postgresql.org/docs/9.4/interactive/app-pgbasebackup.html

"As long as the client can keep up with transaction log received, using
this mode requires no extra transaction logs to be saved on the master."

So it is possible for the client to fall behind and have a WAL file be
recycled before it can be transferred. If you are experiencing this, then
again wal_keep_segments is a way of forcing Postgres to keep WAL files
around. The basic concept is that by default WAL files are recycled when
they fall out of scope on the primary, so you have to 'catch' them
before they do or force them to hang around.

Compression is a separate operation and applies only in the tar format
case and should not be affected by the -x(X) options.

If it were me I would start looking at another 'machine' to offload the
backup to. Otherwise you will be looking at increasingly convoluted
methods of getting two bodies to occupy one space.
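
To make the two modes concrete (same variables as the original script; on
9.3 the stream mode only works with the plain format):

# fetch WAL at the end; wal_keep_segments must cover the whole backup
pg_basebackup -D "$BACKUP_PATH/$TIMESTAMP" -Ft -Z9 -Xf -P -U "$DBUSER" -w

# stream WAL in parallel, but plain (uncompressed) format only
pg_basebackup -D "$BACKUP_PATH/$TIMESTAMP" -Fp -Xs -P -U "$DBUSER" -w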

>
> I think your suggestion of looking in pg_xlog and tweaking
> wal_keep_segments is interesting, we'll take a look, and I'll report
> back with findings.
>
> Thanks for your very detailed help.
>
> Antony
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Stuck trying to backup large database - best practice?

From
Sameer Kumar
Date:

On Tue, Jan 13, 2015 at 7:04 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 01/12/2015 02:16 PM, Antony Gelberg wrote:
On Mon, Jan 12, 2015 at 7:08 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 01/12/2015 08:40 AM, Antony Gelberg wrote:

On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 01/12/2015 08:10 AM, Antony Gelberg wrote:

On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
pg_basebackup has additional features which in your case are creating
issues. pg_dump on the other hand is pretty much a straight forward data
dump and if you use -Fc you get compression.


So I should clarify - we want to be able to get back to the same point
as we would once the WAL was applied.  If we were to use pg_dump,
would we lose out in any way?


pg_dump does not save WALs, so it would not work for that purpose.

  Appreciate insight as to how

pg_basebackup is scuppering things.


From the original post it is not entirely clear whether you are using the -X or -x options. The command you show does not have them, but you mention -Xs. In any case it seems wal_keep_segments will need to be bumped up to keep WAL segments around that are being recycled during the backup process. How much will depend on how fast Postgres is using/recycling log segments.  Looking at the turnover in the pg_xlog directory would be a start.

The original script used -xs, but that didn't make sense, so we used
-Xs in the end, but then we cancelled the backup as we assumed that we
wouldn't have enough space for it uncompressed.  Did we miss
something?

Not so much missed as not fully understood. :) When you use either -x or -X you are telling pg_basebackup that you care that the WAL files in the backup directory are up to date to the point the backup completed. If you use -Xf, which is the same as -x, then you are saying: wait till the rest of the backup is finished, then collect and copy over all the relevant WAL files. This is where wal_keep_segments comes into play. It needs to be set high enough that the relevant WAL files in place at the beginning of the backup are still there when the backup completes, in order to have a complete set. If you use -Xs, then a parallel process is started to copy over the WAL files while the other data files are being copied over. Though as the docs say:

http://www.postgresql.org/docs/9.4/interactive/app-pgbasebackup.html

"As long as the client can keep up with transaction log received, using this mode requires no extra transaction logs to be saved on the master."

Wouldn't streaming the WAL files (in a parallel process, which is what the
wal-stream option in pg_basebackup does) during the backup resolve the
original issue he faced with WAL being removed?

So it is possible for the client to fall behind and have a WAL file be recycled before it can be transferred. If you are experiencing this, then again wal_keep_segments is a way of forcing Postgres to keep WAL files around. The basic concept is that by default WAL files are recycled when they fall out of scope on the primary, so you have to 'catch' them before they do or force them to hang around.

Compression is a separate operation and applies only in the tar format case and should not be affected by the -x(X) options.

But I guess if you take the backup on a 2nd machine, the WAL or data files are not compressed before being sent over the network.
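
For example, running the client from the backup machine keeps only the
compressed result on disk there, even though the rows cross the wire
uncompressed (host and database names are placeholders):

pg_dump -h dbhost -U "$DBUSER" -Fc -Z9 -f /backup/yourdb.dump yourdb

If network volume is the concern, tunnelling the connection through ssh -C
would add compression in transit.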

 
If it were me I would start looking at another 'machine' to offload the backup to. Otherwise you will be looking at increasingly convoluted methods of getting two bodies to occupy one space.
+1

I agree...



Best Regards,

Sameer Kumar | Database Consultant

ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

M: +65 8110 0350  T: +65 6438 3504 | www.ashnik.com
