Re: pg_upgrade and rsync - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: pg_upgrade and rsync
Date
Msg-id 20150123030509.GJ3854@tamriel.snowman.net
In response to Re: pg_upgrade and rsync  (David Steele <david@pgmasters.net>)
Responses Re: pg_upgrade and rsync
List pgsql-hackers
* David Steele (david@pgmasters.net) wrote:
> On 1/22/15 8:54 PM, Stephen Frost wrote:
> > The problem, as mentioned elsewhere, is that you have to checksum all
> > the files because the timestamps will differ.  You can actually get
> > around that with rsync if you really want though- tell it to only look
> > at file sizes instead of size+time by passing in --size-only.  I have to
> > admit that for *my* taste, at least, that's getting pretty darn
> > optimistic.  It *should* work, but I'd definitely recommend testing it
> > about a billion times in various ways before trusting it or recommending
> > it to anyone else.  I expect you'd need --inplace also, for cases where
> > the sizes are different and rsync wants to modify the file on the
> > destination to match the one on the source.
>
> I would definitely not feel comfortable using --size-only.

Yeah, it also occurs to me that if any of the catalog tables end up
being the same size on the master and the replica, they wouldn't get
copied with --size-only, and that'd make for one very interesting
result, and not a good one.
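
To make the same-size concern concrete, here is a minimal Python sketch.
It does not run rsync; it only models a size-only comparison against a
content checksum. The file names and the 8192-byte size are hypothetical,
chosen for illustration, and SHA-256 stands in for whatever checksum a
real tool would use.

```python
import hashlib
import os
import tempfile


def same_by_size(a, b):
    """Model of a size-only comparison: only the byte counts are compared."""
    return os.path.getsize(a) == os.path.getsize(b)


def same_by_checksum(a, b):
    """Compare full file contents via a SHA-256 digest."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return digest(a) == digest(b)


def demo_same_size():
    """Two hypothetical catalog files: identical size, different contents."""
    with tempfile.TemporaryDirectory() as d:
        master = os.path.join(d, "master_catalog")    # hypothetical name
        replica = os.path.join(d, "replica_catalog")  # hypothetical name
        with open(master, "wb") as f:
            f.write(b"A" * 8192)
        with open(replica, "wb") as f:
            f.write(b"B" * 8192)
        return same_by_size(master, replica), same_by_checksum(master, replica)


if __name__ == "__main__":
    size_match, checksum_match = demo_same_size()
    print("size-only says identical:", size_match)
    print("checksum says identical:", checksum_match)
```

Under these assumptions the size check treats the two files as identical
while the checksum does not, which is the case where a size-only transfer
would skip a file that actually differs.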

> In addition, there is a possible race condition in rsync where a file
> that is modified in the same second after rsync starts to copy will not
> be picked up in a subsequent rsync unless --checksum is used.  This is
> fairly easy to prove and is shown here:
>
> https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case: we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

> That means the rsync hot, then rsync cold method of updating a standby
> is not *guaranteed* to work unless checksums are used.  This may seem
> like an edge case, but for a small, active database it looks like it
> could be a real issue.
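
The race described above can be sketched in Python as follows. This is
only a model, not rsync itself: it assumes the quick check compares file
size plus modification time at whole-second granularity, as the quoted
message describes, and the file names, contents, and timestamp are
hypothetical.

```python
import hashlib
import os
import shutil
import tempfile

NS_PER_SEC = 10**9


def quick_check_key(path):
    """Model of a size+mtime quick check, with mtime truncated to seconds."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns // NS_PER_SEC)


def content_digest(path):
    """SHA-256 of the file contents, standing in for a checksum comparison."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo_same_second_change():
    """Source changes after the copy, but within the same wall-clock second."""
    t0 = 1_000_000_000 * NS_PER_SEC  # hypothetical timestamp, in nanoseconds
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src_file")  # hypothetical name
        dst = os.path.join(d, "dst_file")  # hypothetical name

        # First pass: the source is copied and the copy keeps its mtime.
        with open(src, "wb") as f:
            f.write(b"old!" * 1024)
        os.utime(src, ns=(t0, t0))
        shutil.copyfile(src, dst)
        os.utime(dst, ns=(t0, t0))

        # The source is rewritten 0.9 s later: same size, same second.
        with open(src, "wb") as f:
            f.write(b"new!" * 1024)
        later = t0 + 900_000_000
        os.utime(src, ns=(later, later))

        skipped_by_quick_check = quick_check_key(src) == quick_check_key(dst)
        contents_match = content_digest(src) == content_digest(dst)
        return skipped_by_quick_check, contents_match


if __name__ == "__main__":
    skipped, match = demo_same_second_change()
    print("second pass would skip the file:", skipped)
    print("contents actually match:", match)
```

In this model the second pass sees the same size and the same
whole-second mtime and so would skip the file, even though the contents
differ; only the checksum comparison notices the change.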

That's certainly a good point though.
Thanks!
    Stephen
