Thread: Backup PostgreSQL from RDS straight to S3

Backup PostgreSQL from RDS straight to S3

From: Anthony DeBarros
Date: 2019-09-18
Hi, folks -- I'm a longtime PostgreSQL user but a bit of a noob when it comes to maintenance. Question:

I'm running PostgreSQL 11 on Amazon RDS. Also have an EC2 box running Ubuntu that runs some Python scripts that collect data into PostgreSQL.

We're doing the standard RDS backups. However, I'd also like to do a nightly pg_dump and store that file on S3. Is there a way to go straight from RDS to S3 without having to download the dump file onto the EC2 box at any point? I want to avoid a scenario where the DB file fills the EC2 box's disk.

Thanks for any and all advice!

Anthony


Re: Backup PostgreSQL from RDS straight to S3

From: Adrian Klaver
Date: 2019-09-18
On 9/18/19 11:32 AM, Anthony DeBarros wrote:
> Hi, folks -- I'm a longtime PostgreSQL user but a bit of a noob when it 
> comes to maintenance. Question:
> 
> I'm running PostgreSQL 11 on Amazon RDS. Also have an EC2 box running 
> Ubuntu that runs some Python scripts that collect data into PostgreSQL.
> 
> We're doing the standard RDS backups. However, I'd also like to do a 
> nightly pg_dump and store that file on S3. Is there a way to go straight 
> from RDS to S3 without having to download the dump file onto the EC2 box 
> at any point? I want to avoid a scenario where the DB file fills the EC2 
> box's disk.

?:
https://dba.stackexchange.com/questions/221454/best-way-to-pg-dump-postgresql-on-rds-to-s3
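
In short: pg_dump writes to stdout by default, and "aws s3 cp"
accepts "-" to read from stdin, so the dump can stream straight to
S3 without touching local disk. A minimal sketch (host, user,
database and bucket names are placeholders):

    pg_dump -h mydb.xxxx.us-east-1.rds.amazonaws.com -U myuser mydb \
        | gzip | aws s3 cp - s3://my-bucket/mydb.sql.gz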

> 
> Thanks for any and all advice!
> 
> Anthony
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Backup PostgreSQL from RDS straight to S3

From: Anthony DeBarros
Date: 2019-09-18
Great, thanks. The question I have, which at first glance isn't covered there, is whether those instructions will at any point bring the dump file onto the EC2 box, either in memory or temp file storage, on its way to S3? I don't know enough about how Linux handles data piped from one command to the next to know whether that's standard OP or not.

On Wed, Sep 18, 2019 at 2:36 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 9/18/19 11:32 AM, Anthony DeBarros wrote:
> Hi, folks -- I'm a longtime PostgreSQL user but a bit of a noob when it
> comes to maintenance. Question:
>
> I'm running PostgreSQL 11 on Amazon RDS. Also have an EC2 box running
> Ubuntu that runs some Python scripts that collect data into PostgreSQL.
>
> We're doing the standard RDS backups. However, I'd also like to do a
> nightly pg_dump and store that file on S3. Is there a way to go straight
> from RDS to S3 without having to download the dump file onto the EC2 box
> at any point? I want to avoid a scenario where the DB file fills the EC2
> box's disk.

?:
https://dba.stackexchange.com/questions/221454/best-way-to-pg-dump-postgresql-on-rds-to-s3

>
> Thanks for any and all advice!
>
> Anthony
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Backup PostgreSQL from RDS straight to S3

From: Adrian Klaver
Date:
On 9/18/19 12:04 PM, Anthony DeBarros wrote:
> Great, thanks. The question I have, which at first glance isn't covered 
> there, is whether those instructions will at any point bring the dump 
> file onto the EC2 box, either in memory or temp file storage, on its way 
> to S3? I don't know enough about how Linux handles data piped from one 
> command to the next to know whether that's standard OP or not.

Good introduction:

https://ryanstutorials.net/linuxtutorial/piping.php

The piping section is further down the page.

Basically, | connects the output of one command directly to the
input of another; the data passes through an in-kernel buffer and is
never staged in a temp file. Run a test case in one terminal and top
in another to see the effect on memory.
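
For example:

    # push 1 GB through a pipe; the data moves through a small
    # in-kernel buffer (roughly 64 KB on Linux) and never touches disk
    dd if=/dev/zero bs=1M count=1024 | gzip > /dev/null

While that runs, top in the second terminal shows flat memory use
and df shows no disk space being consumed.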

> 
> On Wed, Sep 18, 2019 at 2:36 PM Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
> 
>     On 9/18/19 11:32 AM, Anthony DeBarros wrote:
>      > Hi, folks -- I'm a longtime PostgreSQL user but a bit of a noob
>     when it
>      > comes to maintenance. Question:
>      >
>      > I'm running PostgreSQL 11 on Amazon RDS. Also have an EC2 box
>     running
>      > Ubuntu that runs some Python scripts that collect data into
>     PostgreSQL.
>      >
>      > We're doing the standard RDS backups. However, I'd also like to do a
>      > nightly pg_dump and store that file on S3. Is there a way to go
>     straight
>      > from RDS to S3 without having to download the dump file onto the
>     EC2 box
>      > at any point? I want to avoid a scenario where the DB file fills
>     the EC2
>      > box's disk.
> 
>     ?:
>     https://dba.stackexchange.com/questions/221454/best-way-to-pg-dump-postgresql-on-rds-to-s3
> 
>      >
>      > Thanks for any and all advice!
>      >
>      > Anthony
>      >
>      >
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Backup PostgreSQL from RDS straight to S3

From: Steven Lembark
Date:
s3fs, available on Linux, allows mounting an S3 bucket directly as a
local filesystem (a mount sketch follows the script below). At that
point something like:

  pg_dump ... | gzip -9 -c > /mnt/s3-mount-point/$basename.pg_dump.gz;

will do the deed nicely. If your S3 volume is something like
your_name_here.com/pg_dump then you could parallelize it by dumping
separate databases into paths based on the date and database name:

    tstamp=$(date +%Y.%m.%d-%H.%M.%S);

    gzip='/bin/gzip -9 -v';
    dump='/opt/postgres/bin/pg_dump -blah -blah -blah';

    # the timestamped directory must exist before redirecting into it
    mkdir -p "/mnt/pg-backups/$tstamp";

    for i in your database list
    do
        echo "Dump: '$i'";
        $dump "$i" | $gzip > "/mnt/pg-backups/$tstamp/$i.dump.gz" &
    done

    # at this point however many databases are dumping...

    wait;

    echo "Goodnight.";

If you prefer to keep only a few database backups (e.g., a rolling
weekly history) then use the day-of-week for the tstamp; if you want
some other number of slots then $(( $(date +%s) / 86400 % $num_backups ))
will do (leap seconds notwithstanding).
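
Concretely, the two rotation schemes are just different choices of
tstamp:

    # seven rolling slots, one per weekday: Mon, Tue, ...
    tstamp=$(date +%a);

    # or exactly $num_backups rolling slots via days-since-epoch
    num_backups=3;
    tstamp=$(( $(date +%s) / 86400 % num_backups ));

Each night's dump then overwrites the slot from one cycle earlier.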

Check rates to see which AWS location is cheapest for the storage
and the processing to gzip the content. Also weigh the CPU charges
for zipping against the cost of storing the data -- for smaller,
more repetitive content it may be cheaper in the long run to use
"gzip --fast" than to pay the extra CPU charges for "gzip --best".
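
A quick way to measure that tradeoff on your own data -- sample.dump
here is a placeholder for one of your dump files:

    # compare CPU time and output size at the two extremes
    time gzip --fast -c sample.dump > fast.gz
    time gzip --best -c sample.dump > best.gz
    ls -lh fast.gz best.gz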

-- 
Steven Lembark                                        3646 Flora Place
Workhorse Computing                                St. Louis, MO 63110
lembark@wrkhors.com                                    +1 888 359 3508