Thread: pg_streamrecv for 9.1?

pg_streamrecv for 9.1?

From
Magnus Hagander
Date:
Would people be interested in putting pg_streamrecv
(http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
9.1? I think it would make sense to do so.

It could/should then also become the default tool for doing
base-backup-over-libpq, assuming me or Heikki (or somebody else)
finishes off the patch for that before 9.1. We need a tool for that of
some kind if we add the functionality, after all...

What do people think - is there interest in that, or is it better off
being an outside tool?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_streamrecv for 9.1?

From
Robert Haas
Date:
On Wed, Dec 29, 2010 at 5:47 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Would people be interested in putting pg_streamrecv
> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
> 9.1? I think it would make sense to do so.
>
> It could/should then also become the default tool for doing
> base-backup-over-libpq, assuming me or Heikki (or somebody else)
> finishes off the patch for that before 9.1. We need a tool for that of
> some kind if we add the functionality, after all...
>
> What do people think - is there interest in that, or is it better off
> being an outside tool?

+1 for including it.  If it's reasonably mature, +1 for bin rather than contrib.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_streamrecv for 9.1?

From
Euler Taveira de Oliveira
Date:
On 29-12-2010 07:47, Magnus Hagander wrote:
> Would people be interested in putting pg_streamrecv
> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
> 9.1? I think it would make sense to do so.
>
+1 but...

> It could/should then also become the default tool for doing
> base-backup-over-libpq, assuming me or Heikki (or somebody else)
> finishes off the patch for that before 9.1.
>
I think that the base backup feature is more important than simple streaming 
chunks of the WAL (SR already does this). Talking about the base backup over 
libpq, it is something we should implement to fulfill people's desire that 
claim an easy replication setup.

IIRC, Dimitri already coded a base backup over libpq tool [1] but it is 
written in Python.


[1] https://github.com/dimitri/pg_basebackup/


--   Euler Taveira de Oliveira  http://www.timbira.com/


Re: pg_streamrecv for 9.1?

From
Magnus Hagander
Date:
On Wed, Dec 29, 2010 at 13:03, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> On 29-12-2010 07:47, Magnus Hagander wrote:
>>
>> Would people be interested in putting pg_streamrecv
>> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
>> 9.1? I think it would make sense to do so.
>>
> +1 but...
>
>> It could/should then also become the default tool for doing
>> base-backup-over-libpq, assuming me or Heikki (or somebody else)
>> finishes off the patch for that before 9.1.
>>
> I think that the base backup feature is more important than simple streaming
> chunks of the WAL (SR already does this). Talking about the base backup over
> libpq, it is something we should implement to fulfill people's desire that
> claim an easy replication setup.

Yes, definitely. But that also needs server side support.


> IIRC, Dimitri already coded a base backup over libpq tool [1] but it is
> written in Python.

Yeah, the WIP patch Heikki posted is similar, except it uses tar
format and is implemented natively in the backend with no need for
pl/pythonu to be installed.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_streamrecv for 9.1?

From
David Fetter
Date:
On Wed, Dec 29, 2010 at 11:47:53AM +0100, Magnus Hagander wrote:
> Would people be interested in putting pg_streamrecv
> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
> 9.1? I think it would make sense to do so.

+1 for bin/

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: pg_streamrecv for 9.1?

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Wed, Dec 29, 2010 at 11:47:53AM +0100, Magnus Hagander wrote:
>> Would people be interested in putting pg_streamrecv
>> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
>> 9.1? I think it would make sense to do so.

> +1 for bin/

Is it really stable enough for bin/?  My impression of the state of
affairs is that there is nothing whatsoever about replication that
is really stable yet.
        regards, tom lane


Re: pg_streamrecv for 9.1?

From
Robert Haas
Date:
On Dec 29, 2010, at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is it really stable enough for bin/?  My impression of the state of
> affairs is that there is nothing whatsoever about replication that
> is really stable yet.

Well, that's not stopping us from shipping a core feature called
"replication".  I'll defer to others on how mature pg_streamrecv is, but if
it's no worse than replication in general I think putting it in bin/ is the
right thing to do.

...Robert

Re: pg_streamrecv for 9.1?

From
Gurjeet Singh
Date:
On Wed, Dec 29, 2010 at 1:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Dec 29, 2010, at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Is it really stable enough for bin/?  My impression of the state of
> > affairs is that there is nothing whatsoever about replication that
> > is really stable yet.
>
> Well, that's not stopping us from shipping a core feature called
> "replication".  I'll defer to others on how mature pg_streamrecv is, but if
> it's no worse than replication in general I think putting it in bin/ is the
> right thing to do.

As the README says that is not self-contained (for no fault of its own) and
one should typically set archive_command to guarantee zero WAL loss.

<quote>
TODO: Document some ways of setting up an archive_command that works well
together with pg_streamrecv.
</quote>

    I think implementing just that TODO might make it a candidate.

    I have neither used it nor read the code, but if it works as advertised
then it is definitely a +1 from me; no preference of bin/ or contrib/, since
the community will have to maintain it anyway.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: pg_streamrecv for 9.1?

From
Dimitri Fontaine
Date:
Magnus Hagander <magnus@hagander.net> writes:
>>> Would people be interested in putting pg_streamrecv
>>> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
>>> 9.1? I think it would make sense to do so.

+1 for having that in core, only available for the roles WITH
REPLICATION I suppose?

>> I think that the base backup feature is more important than simple streaming
>> chunks of the WAL (SR already does this). Talking about the base backup over
>> libpq, it is something we should implement to fulfill people's desire that
>> claim an easy replication setup.
>
> Yes, definitely. But that also needs server side support.

Yeah, but it's already in core for 9.1, we have pg_read_binary_file()
there. We could propose a contrib module for previous version
implementing the function in C, that should be pretty easy to code.
 The only reason I didn't do that for pg_basebackup is that I wanted a
 self-contained python script, so that offering a public git repo is
 all I needed as far as distributing the tool goes.
 

> Yeah, the WIP patch Heikki posted is similar, except it uses tar
> format and is implemented natively in the backend with no need for
> pl/pythonu to be installed.

As of HEAD the dependency on pl/whatever is easily removed.

The included C tool would need to have a parallel option from the get-go
if at all possible, but if you look at the pg_basebackup prototype, it
would be good to drop the wrong pg_xlog support in there and rely on a
proper archiving setup on the master.

Do you want to work on internal archive and restore commands over libpq
in the same effort too?  I think this tool should be either a one time
client or a daemon with support for:
- running a base backup when receiving a signal
- continuous WAL streaming from a master
- accepting standby connections and streaming to them
- one-time libpq "streaming" of a WAL file, either way
 

Maybe we don't need to daemonize the tool from the get-go, but if you're
going parallel for the base-backup case you're almost there, aren't you?
Also having internal commands for archive and restore commands that rely
on this daemon running would be great too.

I'd offer more help if it wasn't for finishing the extension patches,
I'm currently working on 'alter extension ... upgrade', including how to
upgrade from pre-9.1 extensions.  But if that flies quicker than I want,
count me in for more than only user specs :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: pg_streamrecv for 9.1?

From
Magnus Hagander
Date:
On Wed, Dec 29, 2010 at 22:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>>>> Would people be interested in putting pg_streamrecv
>>>> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
>>>> 9.1? I think it would make sense to do so.
>
> +1 for having that in core, only available for the roles WITH
> REPLICATION I suppose?

Yes.

Well, anybody who wants can run it, but they need those permissions on
the server to make it work. pg_streamrecv is entirely a client app.


>>> I think that the base backup feature is more important than simple streaming
>>> chunks of the WAL (SR already does this). Talking about the base backup over
>>> libpq, it is something we should implement to fulfill people's desire that
>>> claim an easy replication setup.
>>
>> Yes, definitely. But that also needs server side support.
>
> Yeah, but it's already in core for 9.1, we have pg_read_binary_file()
> there. We could propose a contrib module for previous version
> implementing the function in C, that should be pretty easy to code.

Oh. I didn't actually think about that one. So yeah, we could use that
- making it easy to code. However, I wonder how much less efficient it
would be than being able to stream the base backup. It's going to be a
*lot* more roundtrips across the network, and we're also going to
open/close the files a lot more.
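
To make the roundtrip concern concrete, here is a minimal sketch of how a
client would have to slice each file if it fetched a base backup through
pg_read_binary_file(filename, offset, length), one query per chunk. The
helper names and the chunk size are assumptions made for illustration only;
they are not part of pg_streamrecv or of any posted patch, and the SQL string
is just what such a client could send through its driver.

```python
# pg_read_binary_file(filename, offset, length) is the 9.1 function discussed
# above. CHUNK_SQL is only an illustration of the query a client could issue;
# the helper names below are hypothetical.
CHUNK_SQL = "SELECT pg_read_binary_file(%s, %s, %s)"


def plan_chunk_reads(file_size, chunk_size):
    """Return the (offset, length) pairs needed to read one file in chunks.

    Each pair corresponds to one query, i.e. one network roundtrip.
    In this simple model an empty file needs no read at all.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    reads = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        reads.append((offset, length))
        offset += length
    return reads


def count_roundtrips(file_sizes, chunk_size):
    """Total number of queries needed to fetch every file, one chunk per query."""
    return sum(len(plan_chunk_reads(size, chunk_size)) for size in file_sizes)
```

With a 1 MB chunk size, a single 1 GB relation segment alone already costs
1024 queries, and the server opens and closes the file for each of them,
whereas a streamed backup sends the same data in a single pass.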

Also, I haven't tested it, but a quick look at the code makes me
wonder how it will actually work with tablespaces - it seems to only
allow files under PGDATA? That could of course be changed..


>  The only reason I didn't do that for pg_basebackup is that I wanted a
>  self-contained python script, so that offering a public git repo is
>  all I needed as far as distributing the tool goes.

Right, there's an advantage with that when it comes to being able to
work on old versions.


>> Yeah, the WIP patch Heikki posted is similar, except it uses tar
>> format and is implemented natively in the backend with no need for
>> pl/pythonu to be installed.
>
> As of HEAD the dependency on pl/whatever is easily removed.
>
> The included C tool would need to have a parallel option from the get-go
> if at all possible, but if you look at the pg_basebackup prototype, it
> would be good to drop the wrong pg_xlog support in there and rely on a
> proper archiving setup on the master.
>
> Do you want to work on internal archive and restore commands over libpq
> in the same effort too?  I think this tool should be either a one time
> client or a daemon with support for:

Definitely a one-time client. If you want it to be a daemon, you write
a small wrapper that makes it one :)
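
Such a wrapper could be as small as the following sketch. It restarts an
arbitrary command whenever it exits with an error; the function name, the
restart limit and the delay are assumptions for illustration, and no
pg_streamrecv command-line options are assumed (the command is passed in as
a plain argument list).

```python
import subprocess
import time


def supervise(command, max_restarts=None, delay_seconds=5.0):
    """Run `command` repeatedly, restarting it whenever it fails.

    Returns the number of times the command was started. A clean (zero)
    exit ends the loop; with max_restarts=None a failing command is
    restarted indefinitely.
    """
    starts = 0
    while True:
        starts += 1
        returncode = subprocess.call(command)
        if returncode == 0:
            return starts  # clean exit: stop supervising
        if max_restarts is not None and starts > max_restarts:
            return starts  # give up after the allowed number of restarts
        time.sleep(delay_seconds)  # back off before the next attempt
```

Calling supervise() with the streaming client's argument list keeps the
client itself a one-time tool while still giving daemon-like behaviour to
those who want it.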


>  - running a base backup when receiving a signal
>  - continuous WAL streaming from a master

Yes.

>  - accepting standby connections and streaming to them

I see that as a separate tool, I think. But still a useful one, sure.

>  - one-time libpq "streaming" of a WAL file, either way

Hmm. That might be interesting, yes.


> Maybe we don't need to daemonize the tool from the get-go, but if you're
> going parallel for the base-backup case you're almost there, aren't you?
> Also having internal commands for archive and restore commands that rely
> on this daemon running would be great too.

I don't want anything *relying* on this tool. I want to keep the
current way where you can choose whatever you prefer - I just want us
to ship a good default tool.


> I'd offer more help if it wasn't for finishing the extension patches,

:-) Yeah, focus on that, please - don't want to get it stalled.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_streamrecv for 9.1?

From
Magnus Hagander
Date:
On Wed, Dec 29, 2010 at 20:19, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
> On Wed, Dec 29, 2010 at 1:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Dec 29, 2010, at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Is it really stable enough for bin/?  My impression of the state of
>> > affairs is that there is nothing whatsoever about replication that
>> > is really stable yet.
>>
>> Well, that's not stopping us from shipping a core feature called
>> "replication".  I'll defer to others on how mature pg_streamrecv is, but if
>> it's no worse than replication in general I think putting it in bin/ is the
>> right thing to do.
>
> As the README says that is not self-contained (for no fault of its own) and
> one should typically set archive_command to guarantee zero WAL loss.

Yes. Though you can combine it fine with wal_keep_segments if you
think that's safe - but archive_command is push and this tool is pull,
so if your backup server goes down for a while, pg_streamrecv will get
a gap and fail. Whereas if you configure an archive_command, it will
queue up the log on the master if it stops working, up to the point of
shutting it down because of out-of-disk. Which you *want*, if you want
to be really sure about the backups.
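
As a back-of-the-envelope sketch of the wal_keep_segments trade-off: the
function below estimates how long the receiving side can be down before the
master may have recycled segments it still needs. It assumes the default
16 MB segment size and a steady WAL write rate, both simplifications, and the
function name is made up for this example.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (assumed)


def tolerated_outage_seconds(wal_keep_segments, wal_bytes_per_second,
                             segment_bytes=WAL_SEGMENT_BYTES):
    """Rough upper bound on receiver downtime before a gap becomes possible.

    Assumes WAL is generated at a constant rate; real workloads are bursty,
    so treat the result as an optimistic estimate.
    """
    if wal_bytes_per_second <= 0:
        raise ValueError("wal_bytes_per_second must be positive")
    return wal_keep_segments * segment_bytes / wal_bytes_per_second
```

For example, wal_keep_segments = 64 on a master writing 1 MB of WAL per second
covers roughly 1024 seconds of downtime. An archive_command has no such
window, since the master keeps queueing segments until its disk fills up.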


> <quote>
> TODO: Document some ways of setting up an archive_command that works well
> together with pg_streamrecv.
> </quote>
>
>     I think implementing just that TODO might make it a candidate.

Well, yes, that's obviously a requirement.

>     I have neither used it nor read the code, but if it works as advertised
> then it is definitely a +1 from me; no preference of bin/ or contrib/, since
> the community will have to maintain it anyway.

It's not that much code, but some more eyes on it would always be good!


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_streamrecv for 9.1?

From
Magnus Hagander
Date:
On Wed, Dec 29, 2010 at 19:42, Robert Haas <robertmhaas@gmail.com> wrote:
> On Dec 29, 2010, at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is it really stable enough for bin/?  My impression of the state of
>> affairs is that there is nothing whatsoever about replication that
>> is really stable yet.
>
> Well, that's not stopping us from shipping a core feature called
> "replication".  I'll defer to others on how mature pg_streamrecv is, but if
> it's no worse than replication in general I think putting it in bin/ is the
> right thing to do.

It has had fewer eyes on it, which puts it worse off than general
replication. OTOH, it's a lot simpler code, which puts it better.

Either way, as long as it gets those eyes before release if we put it
in, it shouldn't be worse off than general replication.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_streamrecv for 9.1?

From
Aidan Van Dyk
Date:
On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander <magnus@hagander.net> wrote:

>> As the README says that is not self-contained (for no fault of its own) and
>> one should typically set archive_command to guarantee zero WAL loss.
>
> Yes. Though you can combine it fine with wal_keep_segments if you
> think that's safe - but archive_command is push and this tool is pull,
> so if your backup server goes down for a while, pg_streamrecv will get
> a gap and fail. Whereas if you configure an archive_command, it will
> queue up the log on the master if it stops working, up to the point of
> shutting it down because of out-of-disk. Which you *want*, if you want
> to be really sure about the backups.

I was thinking I'd like to use pg_streamrecv to "make" my archive, and
the archive script on the master would just "verify" the archive has
that complete segment.

This gets you an archive synced as it's made (as long as streamrecv
is running), and my "verify" archive command would make sure that if
for some reason, the backup archive went "down", the wal segments
would be blocked on the master until it's up again and current.

a.



--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: pg_streamrecv for 9.1?

From
Magnus Hagander
Date:
On Thu, Dec 30, 2010 at 13:30, Aidan Van Dyk <aidan@highrise.ca> wrote:
> On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
>>> As the README says that is not self-contained (for no fault of its own) and
>>> one should typically set archive_command to guarantee zero WAL loss.
>>
>> Yes. Though you can combine it fine with wal_keep_segments if you
>> think that's safe - but archive_command is push and this tool is pull,
>> so if your backup server goes down for a while, pg_streamrecv will get
>> a gap and fail. Whereas if you configure an archive_command, it will
>> queue up the log on the master if it stops working, up to the point of
>> shutting it down because of out-of-disk. Which you *want*, if you want
>> to be really sure about the backups.
>
> I was thinking I'd like to use pg_streamrecv to "make" my archive, and
> the archive script on the master would just "verify" the archive has
> that complete segment.
>
> This gets you an archive synced as it's made (as long as streamrecv
> is running), and my "verify" archive command would make sure that if
> for some reason, the backup archive went "down", the wal segments
> would be blocked on the master until it's up again and current.

That's exactly the method I was envisioning, and in fact that I am
using in a couple of cases - just haven't documented it properly :)

Since pg_streamrecv only moves a segment into the correct archive
location when it's completed, the archive_command only needs to check
if the file *exists* - if it does, it's transferred, if not, it
returns an error to make sure the wal segments don't get cleaned out.
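
A minimal sketch of such an existence-checking archive_command follows. It
assumes the archive directory filled by pg_streamrecv is visible from the
master (for example over a network mount); the script name, the paths and
the function names are hypothetical.

```python
import os
import sys


def segment_is_archived(archive_dir, segment_name):
    """True if the completed segment file is present in the archive directory."""
    return os.path.isfile(os.path.join(archive_dir, segment_name))


def main(argv):
    """Return 0 if the segment is archived, non-zero otherwise.

    A non-zero exit status tells the master that archiving failed, so it
    keeps the segment and retries later.
    """
    if len(argv) != 3:
        print("usage: verify_archive.py ARCHIVE_DIR SEGMENT_NAME",
              file=sys.stderr)
        return 2
    archive_dir, segment_name = argv[1], argv[2]
    return 0 if segment_is_archived(archive_dir, segment_name) else 1
```

Saved as a script that ends with sys.exit(main(sys.argv)) under the usual
__main__ guard, this could be wired up as
archive_command = 'python3 /path/to/verify_archive.py /path/to/archive %f',
where %f is the segment file name supplied by the server.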

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_streamrecv for 9.1?

From
Stefan Kaltenbrunner
Date:
On 12/29/2010 07:42 PM, Robert Haas wrote:
> On Dec 29, 2010, at 1:01 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Is it really stable enough for bin/?  My impression of the state of
>> affairs is that there is nothing whatsoever about replication that
>> is really stable yet.
>
> Well, that's not stopping us from shipping a core feature called
> "replication".  I'll defer to others on how mature pg_streamrecv is, but if
> it's no worse than replication in general I think putting it in bin/ is the
> right thing to do.
 

well I have not looked at how good pg_streamrecv really is but we
desperately need to fix the basic usability issues in our current
replication implementation and pg_streamrecv seems to be a useful tool
to help with some.

All the people I talked to about SR were surprised how complex and
fragile the initial setup procedure is - what is missing is a simple
and reliable tool to do a base backup over libpq and also a simple way
to have that tool tell the master "keep the wal segments I need for
starting the standby". I do realize we need to keep the ability to do
the basebackup out-of-line but for 99% of the users it is too complex,
scary and failure-prone (I know nobody who got the procedure right the
first time - which is a strong hint that we need to work on that).



Stefan