Re: pg_rewind in contrib - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: pg_rewind in contrib
Date
Msg-id 548FFD5D.80703@vmware.com
In response to Re: pg_rewind in contrib  (Satoshi Nagayasu <snaga@uptime.jp>)
Responses Re: pg_rewind in contrib  (Satoshi Nagayasu <snaga@uptime.jp>)
List pgsql-hackers
On 12/16/2014 11:23 AM, Satoshi Nagayasu wrote:
> Hi,
>
> On 2014/12/12 23:13, Heikki Linnakangas wrote:
>   > Hi,
>   >
>   > I'd like to include pg_rewind in contrib. I originally wrote it as an
>   > external project so that I could quickly get it working with the
>   > existing versions, and because I didn't feel it was quite ready for
>   > production use yet. Now, with the WAL format changes in master, it is a
>   > lot more maintainable than before. Many bugs have been fixed since the
>   > first prototypes, and I think it's fairly robust now.
>   >
>   > I propose that we include pg_rewind in contrib/ now. Attached is a patch
>   > for that. It just includes the latest sources from the current pg_rewind
>   > repository at https://github.com/vmware/pg_rewind. It is released under
>   > the PostgreSQL license.
>   >
>   > For those who are not familiar with pg_rewind, it's a tool that allows
>   > repurposing an old master server as a new standby server, after
>   > promotion, even if the old master was not shut down cleanly. That's a
>   > very often requested feature.
>
> I'm trying out pg_rewind with a first, simple scenario.
> My test script is here:
>
> https://github.com/snaga/pg_rewind_test/blob/master/pg_rewind_test.sh
>
> At the least, I think the file descriptor "srcfd" should be closed
> before exiting copy_file_range(). I got a "can't open file" error with
> "too many open files" while running pg_rewind.
>
> ------------------------------------------------
> diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
> index bea1b09..5a8cc8e 100644
> --- a/contrib/pg_rewind/copy_fetch.c
> +++ b/contrib/pg_rewind/copy_fetch.c
> @@ -280,6 +280,8 @@ copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
>                   write_file_range(buf, begin, readlen);
>                   begin += readlen;
>           }
> +
> +       close(srcfd);
>    }
>
>    /*
> ------------------------------------------------

Yep, good catch. I pushed a fix to the pg_rewind repository on GitHub.
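
For anyone who wants to see the leak pattern outside pg_rewind, here is a minimal standalone sketch. It is not the pg_rewind code: copy_range() and its parameters are hypothetical names, and it writes to a second descriptor with pwrite() instead of calling write_file_range(). The only point it tries to illustrate is that the source descriptor gets closed on every path out of the function.

```c
#define _XOPEN_SOURCE 700       /* for pread()/pwrite() under strict modes */

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not pg_rewind's copy_file_range(): copy bytes
 * [begin, end) of the file at srcpath to the same offsets of dstfd.
 * Returns 0 on success, -1 on failure.
 */
static int
copy_range(const char *srcpath, int dstfd, off_t begin, off_t end)
{
    char        buf[8192];
    int         srcfd;
    int         rc = 0;

    assert(begin <= end);

    srcfd = open(srcpath, O_RDONLY);
    if (srcfd < 0)
        return -1;

    while (begin < end)
    {
        size_t      want = sizeof(buf);
        ssize_t     readlen;

        /* Do not read past the end of the requested range. */
        if ((off_t) want > end - begin)
            want = (size_t) (end - begin);

        readlen = pread(srcfd, buf, want, begin);
        if (readlen <= 0)
        {
            rc = -1;            /* read error or unexpected EOF */
            break;
        }
        if (pwrite(dstfd, buf, (size_t) readlen, begin) != readlen)
        {
            rc = -1;            /* write error or short write */
            break;
        }
        begin += readlen;
    }

    /* Close the source on every exit path, or each call leaks one fd. */
    if (close(srcfd) != 0)
        rc = -1;
    return rc;
}
```

Without the final close(), a loop that calls this once per file range would eventually hit the per-process descriptor limit and open() would start failing with EMFILE, which matches the "too many open files" symptom reported above.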

> And I have one question here.
>
> pg_rewind assumes that the source PostgreSQL has, at least, one
> checkpoint after getting promoted. I think the target timeline id
> in the pg_control file to be read is only available after the first
> checkpoint. Right?

Yes, it does assume that the source server (= old standby, new master)
has had at least one checkpoint after promotion. It should probably be
more explicit about that: if there hasn't been a checkpoint, you
currently get the error "source and target cluster are both on the same
timeline", which isn't very informative.
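
As a rough illustration of what a more explicit message could look like, here is a small sketch. It is purely hypothetical and not taken from pg_rewind: the function name timeline_error() and the message wording are made up for this example. It assumes only that both timeline IDs have already been read from the two control files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;    /* PostgreSQL timeline IDs are 32-bit */

/*
 * Hypothetical check, not pg_rewind code: return NULL when the two
 * clusters are on different timelines, otherwise an error message that
 * also names the likely cause (no checkpoint on the source since promotion).
 */
static const char *
timeline_error(TimeLineID source_tli, TimeLineID target_tli)
{
    /* Valid timeline IDs start at 1. */
    assert(source_tli != 0 && target_tli != 0);

    if (source_tli != target_tli)
        return NULL;

    return "source and target cluster are on the same timeline; "
           "if the source was promoted recently, run CHECKPOINT on it "
           "so that its control file shows the new timeline, then retry";
}
```

A caller would print the returned string and exit when it is non-NULL, and proceed with the rewind otherwise.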

I assume that by "target timeline ID" you meant the timeline ID of the 
source server, i.e. the timeline that the target server should be 
rewound to.

- Heikki



