Re: block-level incremental backup - Mailing list pgsql-hackers
From | Adam Brusselback |
---|---|
Subject | Re: block-level incremental backup |
Date | |
Msg-id | CAMjNa7duujQMCLK2c0NbJdS+eFqWaGzvSvYCj9aEQx5dRoMESA@mail.gmail.com Whole thread Raw |
In response to | Re: block-level incremental backup (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
List | pgsql-hackers |
I hope it's alright to throw in my $0.02 as a user. I've been following this (and the other thread on reading WAL to find modified blocks, prefaulting, whatever else) since the start with great excitement and would love to see the built-in backup capabilities in Postgres greatly improved. I know this is not completely on-topic for just incremental backups, so I apologize in advance. It just seemed like the most apt place to chime in.
Just to preface where I am coming from, I have been using pgBackRest for the past couple years and used wal-e prior to that. I am not a big *nix user other than all my servers, do all my development on Windows / use primarily Java. The command line is not where I feel most comfortable despite my best efforts over the last 5-6 years. Prior to Postgres, I used SQL Server for quite a few years at previous companies but was more a junior / intermediate skill set back then. I just wanted to put that out there so you can see where my bias's are.
With all that said, I would not be comfortable using pg_basebackup as my main backup tool simply because I’d have to cobble together numerous tools to get backups stored in a safe (not on the same server) location, I’d have to manage expiring backups and the WAL which is no longer needed, along with the rest of the stuff that makes these backup management tools useful.
The command line scares me, and even if I was able to get all that working, I would not feel warm and fuzzy I didn’t mess something up horribly and I may hit an edge case which destroys backups, silently corrupts data, etc.
I love that there are tools that manage all of it; backups, wal archiving, remote storage, integrate with cloud storage (S3 and the like), manages the retention of these backups with all their dependencies for me, and has all the restore options necessary built in as well.
Block level incremental backup would be amazing for my use case. I have small updates / deletes that happen to data all over some of my largest tables. With pgBackRest, since the diff/incremental backups are at the file level, I can have a single update / delete which touched a random spot in a table and now requires that whole 1gb file to be backed up again. That said, even if pg_basebackup was the only tool that did incremental block level backup tomorrow, I still wouldn’t start using it directly. I went into the issues I’d have to deal with if I used pg_basebackup above, and incremental backups without a management tool make me think using it correctly would be much harder.
I know this thread is just about incremental backup, and that pretty much everything in core is built up from small features into larger more complex ones. I understand that and am not trying to dump on any efforts, I am super excited to see work being done in this area! I just wanted to share my perspective on how crucial good backup management is to me (and I’m sure a few others may share my sentiment considering how popular all the external tools are).
I would never put a system in production unless I have some backup management in place. If core builds a backup management tool which uses pg_basebackup as building blocks for its solution…awesome! That may be something I’d use. If pg_basebackup can be improved so it can be used as the basis most external backup management tools can build on top of, that’s also great. All the external tools which practically every Postgres company have built show that it’s obviously a need for a lot of users. Core will never solve every single problem for all users, I know that. It would just be great to see some of the fundamental features of backup management baked into core in an extensible way.
With that, there could be a recommended way to set up backups (full/incremental, parallel, compressed), point in time recovery, backup retention, and perform restores (to a point in time, on a replica server, etc) with just the tooling within core with a nice and simple user interface, and great performance.
If those features core supports in the internal tooling are built in an extensible way (as has been discussed), there could be much less duplication of work implementing the same base features over and over for each external tool. Those companies can focus on more value-added features to their own products that core would never support, or on improving the tooling/performance/features core provides.
Well, this is way longer and a lot less coherent than I was hoping, so I apologize for that. Hopefully my stream of thoughts made a little bit of sense to someone.
-Adam
pgsql-hackers by date: