On Tue, Jun 26, 2012 at 11:33 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Jun 26, 2012 at 5:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Jun 24, 2012 at 5:33 PM, David Kerr <dmk@mr-paradox.net> wrote:
>>> Howdy,
>>>
>>> We're using NetApp FlexClones whenever we need to move our DB between machines.
>>>
>>> One specific case where we do that is when we're creating a new streaming replication target.
>>>
>>> The basic steps we're using are:
>>> pg_start_backup();
>>> <flex clone within the netapp>
>>> pg_stop_backup();
>>>
>>> The problem I'm seeing is that periodically the backup_label is empty, which means
>>> I can't start the new standby.
>>>
>>> I believe that since the NetApp stuff is all happening within the SAN this file hasn't been
>>> fsynced to disk by the time we take the snapshot.
>>>
>>> One option would be to do a "sync" prior to the clone, however that seems kind of like a
>>> heavy operation, and it's slightly more complicated to script. (having to have a user
>>> account on the system to sudo rather than just connecting to the db to issue the
>>> pg_start_backup(...) )
>>>
>>> Another option is to add pg_fsync(fileno(fp)) after the fflush() when creating the file. (I'm not
>>> sure whether fsync implies fflush; if it does, you could just replace it.)
>>>
>>> I think this type of snapshot is fairly common. I've been doing them since 2000 with EMC, and
>>> I'm sure that most SAN vendors support it.
>>
>> This seems like a good idea to me. Actually, I'm wondering if we
>> shouldn't back-patch this.
>>
>> Thoughts?
>
> Certainly can't hurt.
>
> I guess any other files that are lost this way will be recreated by
> WAL recovery - or is there something else that might be at risk of
> similar treatment?
I can't think of anything. pg_start_backup does a checkpoint, which
in theory oughta be enough to make sure everything that matters hits
the platter.
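
On David's question upthread about whether fsync implies fflush: it
doesn't. fflush() only moves the stdio buffer into the kernel, and
fsync() then asks the kernel to push it out to the device, so you need
both, in that order. Roughly, as a standalone sketch (plain POSIX
fsync() standing in for pg_fsync(), and write_file_durably() is a
made-up helper name for illustration, not the actual patch):

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL source: write `contents` to `path`
 * and force the data to stable storage before returning.
 * Returns 0 on success, -1 on any failure.
 */
static int
write_file_durably(const char *path, const char *contents)
{
    FILE   *fp;
    size_t  len;

    assert(path != NULL && contents != NULL);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    len = strlen(contents);

    /* fwrite() only fills the stdio buffer inside this process. */
    if (fwrite(contents, 1, len, fp) != len ||
        /* fflush() hands the buffered bytes to the kernel... */
        fflush(fp) != 0 ||
        /* ...and fsync() asks the kernel to write them to the device. */
        fsync(fileno(fp)) != 0)
    {
        fclose(fp);
        return -1;
    }

    /* fclose() can still report an error, so check it as well. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Without the fsync() step the file can exist with zero length on disk
when the snapshot is taken, which would match the empty backup_label
David is seeing.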
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company