Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program
Date
Msg-id CABUevEwzCGY+rZo3rm89F9V5ED+g0WtEfCP0R=VVUK_gUf3iPw@mail.gmail.com
In response to [HACKERS] pg_basebackup: Allow use of arbitrary compression program  (Michael Harris <harmic@gmail.com>)
Responses Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program  (Michael Harris <harmic@gmail.com>)
List pgsql-hackers
On Fri, Apr 7, 2017 at 4:04 AM, Michael Harris <harmic@gmail.com> wrote:
Hello,

Back in pg 9.2, we hacked a copy of pg_basebackup to add a command
line option which would allow the user to specify an arbitrary
external program (potentially including arguments) to be used to
compress the tar backup.

Our motivation was to be able to use pigz (parallel gzip
implementation) to speed up the compression. It also allows using
tools like bzip2, xz, etc instead of the inbuilt zlib.

I never ended up submitting that upstream, but now it looks like I
will have to repeat the exercise for 9.6, so I was wondering if such a
feature would be welcomed.

I found one or two references to people asking for this, eg:
https://www.commandprompt.com/blog/a_pg_basebackup_wish_list/

To do it properly would require:

1) Adding a command-line option as follows:

  -C, --compressprog=PROG
                         Use supplied program for compression

2) The current logic either uses zlib if compiled in, or offers no
compression at all, controlled by a series of #ifdef/#endif. I would
prefer that the user can either use zlib or an external program
without having to recompile, so I would remove the #ifdefs and replace
them with run time branching.

I'm not sure how that would work, or why it would be needed. The reasonable behavior would be: if zlib is available at build time, the choices are "no compression", "zlib compression", or "external compression". If zlib was not available at build time, the choices are "no compression" or "external compression".

Or maybe I'm misunderstanding what you're saying?
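
To make the distinction concrete, here is a minimal sketch of how the three choices could be resolved, with zlib availability still decided at build time and the external program chosen at run time. HAVE_LIBZ is the macro PostgreSQL's configure already defines when zlib is found; the enum, the function name choose_compression, and its parameters are hypothetical and not taken from pg_basebackup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; not taken from the pg_basebackup source. */
typedef enum
{
    COMPRESS_NONE,
    COMPRESS_ZLIB,
    COMPRESS_EXTERNAL
} CompressMethod;

/*
 * Decide which compression method to use.
 *
 * zlib_level    - requested zlib level, 0 meaning "not requested"
 * external_prog - external compression command, or NULL if not requested
 *
 * Returns false if the request cannot be satisfied: both options were
 * given, or zlib was requested in a build without zlib support.
 */
static bool
choose_compression(int zlib_level, const char *external_prog,
                   CompressMethod *method)
{
    if (external_prog != NULL && zlib_level > 0)
        return false;           /* conflicting options */

    if (external_prog != NULL)
    {
        *method = COMPRESS_EXTERNAL;    /* always available */
        return true;
    }

    if (zlib_level > 0)
    {
#ifdef HAVE_LIBZ
        *method = COMPRESS_ZLIB;        /* only in builds with zlib */
        return true;
#else
        return false;                   /* zlib not compiled in */
#endif
    }

    *method = COMPRESS_NONE;
    return true;
}
```

With this shape the existing #ifdef around the zlib code stays in place, and only the external path needs a run-time branch.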

 
3) When opening the output file, if the -C option was used, use popen
to open a child process and write to that.
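
As a rough illustration of step 3, the sketch below pipes a buffer through an external command using popen() and checks the child's exit status with pclose(). The helper name write_through_compressor and the example command are assumptions made for illustration; the real code would stream tar data in chunks and would also need to deal with SIGPIPE if the child exits early.

```c
#define _POSIX_C_SOURCE 200809L     /* for popen()/pclose() */
#include <stdio.h>

/*
 * Hypothetical helper: feed `len` bytes of `data` to the shell command
 * `compress_cmd` on its standard input, for example
 * "pigz -c > base.tar.gz". Returns 0 on success, -1 on failure.
 */
static int
write_through_compressor(const char *compress_cmd,
                         const char *data, size_t len)
{
    int     status;
    FILE   *pipe_fp = popen(compress_cmd, "w");

    if (pipe_fp == NULL)
        return -1;              /* fork or pipe creation failed */

    if (fwrite(data, 1, len, pipe_fp) != len)
    {
        pclose(pipe_fp);
        return -1;              /* short write to the child */
    }

    /*
     * pclose() flushes the stream, closes the pipe so the child sees EOF,
     * then waits for the child and returns its wait status.
     */
    status = pclose(pipe_fp);
    return (status == 0) ? 0 : -1;
}
```

Because popen() runs the command through the shell, the program string can carry arguments and redirections, which matches the "potentially including arguments" requirement.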

My questions are:
- Has anything like this already been discussed?

I think it has, but not in detail.

 
- Would this be a welcome contribution?

Yes, I definitely think this would be useful.

 
- Can anyone see any problems with the above approach?

One thing to consider is the work done recently to ensure that the output is properly synchronized when written to disk. I don't think it's reasonable to expect that from an external compression program, but if it can be made optional, that would be good. At the very least, be careful not to break the existing behavior.
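
One possible way to keep syncing optional for the external case is sketched below: once the compression program has exited, reopen the file it produced and fsync() it. This assumes pg_basebackup knows the output path, which an arbitrary command does not guarantee; the function name sync_output_file and the do_sync flag are hypothetical. A complete solution would also need to fsync the containing directory.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a file written by another process to disk.
 * Call it only after the compression program has exited. When do_sync
 * is false the function does nothing, so syncing stays optional.
 * Returns 0 on success (or when skipped), -1 on failure.
 */
static int
sync_output_file(const char *path, bool do_sync)
{
    int     fd;

    if (!do_sync)
        return 0;

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;              /* file missing or not accessible */

    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    return close(fd);
}
```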
