Thread: pg_receivexlog completion command

pg_receivexlog completion command

From: Magnus Hagander
Date: 2014-11-02 14:26:04 +0100
I had a discussion with a few people recently about a hack I wrote for
pg_receivexlog at some point, but never ended up submitting, and in
cleaning that up realized I had an open item on it.

The idea is to add a switch to pg_receivexlog (in this case, -a, but
that can always be bikeshedded, of course) that acts somewhat like
archive_command on the backend. The idea is to have pg_receivexlog
fire off an external command at the end of each segment - for example
a command to gzip the file, or to archive it off into a Magic Cloud
(TM) or something like that.

You can do this now by looping around, waiting for files without the
.partial suffix, but that's kind of ugly.

My current hack just fires off this command using system(). That will
block the pg_receivexlog process, obviously. The question is, if we
want this, what kind of behaviour would we want here? One option is to
do just that, which should be safe enough for something like gzip but
might cause trouble if the external command blocks on the network, for
example. Another option would be to just fork() and run it in the
background, which could in theory lead to an unlimited number of
processes if they all hang. Or of course we could have a background
process that queues them up - much like we do in the main backend,
which is definitely more complicated.

Thoughts on the best way to deal with that?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/



Re: pg_receivexlog completion command

From: Andres Freund
Date: 2014-11-02 14:31 +0100
On 2014-11-02 14:26:04 +0100, Magnus Hagander wrote:
> I had a discussion with a few people recently about a hack I wrote for
> pg_receivexlog at some point, but never ended up submitting, and in
> cleaning that up realized I had an open item on it.
> 
> The idea is to add a switch to pg_receivexlog (in this case, -a, but
> that can always be bikeshedded, of course) that acts somewhat like
> archive_command on the backend. The idea is to have pg_receivexlog
> fire off an external command at the end of each segment - for example
> a command to gzip the file, or to archive it off into a Magic Cloud
> (TM) or something like that.

I can see that being useful.

> My current hack just fires off this command using system(). That will
> block the pg_receivexlog process, obviously. The question is, if we
> want this, what kind of behaviour would we want here? One option is to
> do just that, which should be safe enough for something like gzip but
> might cause trouble if the external command blocks on the network, for
> example. Another option would be to just fork() and run it in the
> background, which could in theory lead to an unlimited number of
> processes if they all hang. Or of course we could have a background
> process that queues them up - much like we do in the main backend,
> which is definitely more complicated.

How about a middle ground: fork the command off, but wait() on it
before you start the next command?

This will need some persistent state about the command's success -
similar to the current archive status stuff. Given retries and
everything, it might end up being easier to have a separate process.

Greetings,

Andres Freund

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_receivexlog completion command

From: Magnus Hagander
Date: 2014-11-02 14:33:32 +0100
On Sun, Nov 2, 2014 at 2:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-11-02 14:26:04 +0100, Magnus Hagander wrote:
>> I had a discussion with a few people recently about a hack I wrote for
>> pg_receivexlog at some point, but never ended up submitting, and in
>> cleaning that up realized I had an open item on it.
>>
>> The idea is to add a switch to pg_receivexlog (in this case, -a, but
>> that can always be bikeshedded, of course) that acts somewhat like
>> archive_command on the backend. The idea is to have pg_receivexlog
>> fire off an external command at the end of each segment - for example
>> a command to gzip the file, or to archive it off into a Magic Cloud
>> (TM) or something like that.
>
> I can see that to be useful.
>
>> My current hack just fires off this command using system(). That will
>> block the pg_receivexlog process, obviously. The question is, if we
>> want this, what kind of behaviour would we want here? One option is to
>> do just that, which should be safe enough for something like gzip but
>> might cause trouble if the external command blocks on network for
>> example. Another option would be to just fork() and run it in the
>> background, which could in theory lead to unlimited number of
>> processes if they all hang. Or of course we could have a background
>> process that queues them up - much like we do in the main backend,
>> which is definitely more complicated.
>
> How about a middle ground: fork the command off, but wait() on it
> before you start the next command?
>
> This will need some persistent state about the command's success -
> similar to the current archive status stuff. Given retries and
> everything, it might end up being easier to have a separate process.

That is mostly what I meant with my third option, the "background
process". But I guess we can do the actual queueing in the main
process, of course. But yeah, it comes down to whether we want to deal
with retries and such at all, or just leave that up to the external
command. We could for example say that if you specify -a, we just stop
doing the rename() in pg_receivexlog and *instead* run the archive
command, making it that command's responsibility to move the file from
its .partial name. That might make things simpler.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/



Re: pg_receivexlog completion command

From: Andres Freund
Date:
On 2014-11-02 14:33:32 +0100, Magnus Hagander wrote:
> On Sun, Nov 2, 2014 at 2:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > This will need some persistent state about the command's success -
> > similar to the current archive status stuff. Given retries and
> > everything, it might end up being easier to have a separate process.
> 
> That is mostly what I meant with my third option, the "background
> process". But I guess we can do the actual queueing in the main
> process, of course. But yeah, it comes down to whether we want to deal
> with retries and such at all, or just leave that up to the external
> command. We could for example say that if you specify -a, we just stop
> doing the rename() in pg_receivexlog and *instead* run the archive
> command, making it that command's responsibility to move the file from
> its .partial name. That might make things simpler.

I don't think that's good enough. Unless I'm missing something, you
really can't reliably deal with pg_receivexlog being stopped at
arbitrary moments that way. I also think that moving that much into
the command will nail down implementation details that we really don't
want to expose.

Greetings,

Andres Freund

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_receivexlog completion command

From: Peter Eisentraut
Date:
On 11/2/14 8:26 AM, Magnus Hagander wrote:
> The idea is to have pg_receivexlog
> fire off an external command at the end of each segment - for example
> a command to gzip the file, or to archive it off into a Magic Cloud
> (TM) or something like that.

A simple facility to allow gzipping after the file is complete might be
OK, but the cloud use case is probably too abstract to be useful.  I'd
rather write my own consumer for that, or go back to archive_command,
which has the queuing logic built in already.