Thread: RE: RE: xlog checkpoint depends on sync() ... seems unsafe

RE: RE: xlog checkpoint depends on sync() ... seems unsafe

From
"Mikheev, Vadim"
Date:
> > to re-write smgr. I don't know how useful is second sync() call, but
> > on Solaris (and I believe on many other *NIXes) rc0 calls it
> > three times, -:) Why?
> 
> The idea is, that by the time the last sync has run, the 
> first sync will be done flushing the buffers to disk. - this is what
> we were told by the IBM engineers when I worked tier-2/3 AIX support
> at IBM.

I was told the same long ago about FreeBSD. How much can we count on
this undocumented sync() feature?

Vadim


Re: RE: xlog checkpoint depends on sync() ... seems unsafe

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> The idea is, that by the time the last sync has run, the 
>> first sync will be done flushing the buffers to disk. - this is what
>> we were told by the IBM engineers when I worked tier-2/3 AIX support
>> at IBM.

> I was told the same long ago about FreeBSD. How much can we count on
> this undocumented sync() feature?

Sounds quite unreliable to me.  Unless there's some interlock ... like,
say, the second sync not being able to advance past a buffer page that's
as yet unwritten by the first sync.  But would all Unixen share such a
strange detail of implementation?
        regards, tom lane


Re: RE: xlog checkpoint depends on sync() ... seems unsafe

From
Doug McNaught
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> >> The idea is, that by the time the last sync has run, the 
> >> first sync will be done flushing the buffers to disk. - this is what
> >> we were told by the IBM engineers when I worked tier-2/3 AIX support
> >> at IBM.
> 
> > I was told the same long ago about FreeBSD. How much can we count on
> > this undocumented sync() feature?
> 
> Sounds quite unreliable to me.  Unless there's some interlock ... like,
> say, the second sync not being able to advance past a buffer page that's
> as yet unwritten by the first sync.  But would all Unixen share such a
> strange detail of implementation?

I'm pretty sure it has no basis in fact, it's just one of these habits 
that gives sysadmins a warm fuzzy feeling.  ;)  It's apparently been
around a long time, though I don't remember where I read about it--it
was quite a few years ago.

-Doug



Re: RE: xlog checkpoint depends on sync() ... seems unsafe

From
Giles Lean
Date:
> Sounds quite unreliable to me.  Unless there's some interlock ... like,
> say, the second sync not being able to advance past a buffer page that's
> as yet unwritten by the first sync.  But would all Unixen share such a
> strange detail of implementation?

I heard Kirk McKusick tell this story in a 4.4BSD internals class.
His explanation was that having an *operator* type 'sync' three times
provided enough time for the first sync to do the work before the
operator powered the system down or reset it or whatever.

I've not heard of any filesystem implementation where the number of
sync() system calls issued makes a difference, and imagine that any
programmer who has written code to call sync three times has only
heard part of the story. :-)

Regards,

Giles
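A minimal Python sketch of the alternative this points toward, assuming the goal is to know that one particular file's data has reached the storage device: instead of repeating sync(), call fsync() on that file's own descriptor. `durable_write` and the file name in the usage block are hypothetical; `os.open`, `os.write` and `os.fsync` are the standard-library wrappers around the POSIX calls. Unlike sync(), fsync() is specified to block until the file's data has been handed to the device (or an error is detected) and it reports failures, surfaced here as `OSError`. Like sync(), it still cannot see past a drive's own write cache.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then fsync() that file (hypothetical helper).

    fsync() on a specific descriptor blocks until the file's data has been
    transferred to the storage device and raises OSError on failure, which
    sync() (returning nothing) cannot do. It does not bypass a disk's own cache.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # per-file flush; raises OSError if it fails
    finally:
        os.close(fd)
    # Note: for a newly created file, some filesystems also need an fsync()
    # of the containing directory to make the directory entry durable; this
    # sketch does not do that.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "checkpoint.dat")  # hypothetical file name
        durable_write(target, b"checkpoint record\n")
        with open(target, "rb") as f:
            print(f.read())
```

The design choice is to ask for durability of exactly the data that matters, with an error path, rather than flushing the whole buffer cache and hoping the timing works out.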



Re: RE: xlog checkpoint depends on sync() ... seems unsafe

From
Matthew Kirkwood
Date:
On Tue, 13 Mar 2001, Tom Lane wrote:

> > I was told the same long ago about FreeBSD. How much can we count on
> > this undocumented sync() feature?
> 
> Sounds quite unreliable to me.  Unless there's some interlock ...
> like, say, the second sync not being able to advance past a buffer
> page that's as yet unwritten by the first sync.  But would all Unixen
> share such a strange detail of implementation?

The Linux manpage says:

NAME
       sync - commit buffer cache to disk.
[..]

DESCRIPTION
       sync first commits inodes to buffers, and then buffers to disk.
[..]

CONFORMING TO
       SVr4, SVID, X/OPEN, BSD 4.3

BUGS
       According to the standard specification (e.g., SVID),
       sync() schedules the writes, but may return before the
       actual writing is done.  However, since version 1.3.20
       Linux does actually wait.  (This still does not guarantee
       data integrity: modern disks have large caches.)


And it's still true.  On a fast system, if you do:

$ cp /dev/zero /tmp & sleep 1; sync

the sync will often never finish.  (Of course, that's
just an implementation detail really.)

Matthew.
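To try the experiment above from a program rather than the shell, a minimal Python sketch (Unix only; `os.sync()` needs Python 3.3 or later) can time a single sync(2) call after leaving some unflushed data behind. `timed_sync` and the 16 MiB file size are arbitrary choices for illustration. Per the man page quoted above, a system that only schedules the writes may return almost immediately, while Linux since 1.3.20 waits; either way the number says nothing about the drive's own cache. If the temporary directory lives on tmpfs there is nothing to flush, so passing `dir=` to `TemporaryDirectory` to use a disk-backed filesystem is an assumption worth checking.

```python
import os
import tempfile
import time


def timed_sync() -> float:
    """Call sync(2) through os.sync() and return elapsed wall-clock seconds.

    os.sync() returns None: sync() reports no status, so timing it is the
    only visible hint of whether it waited for the writes it started.
    """
    start = time.monotonic()
    os.sync()
    return time.monotonic() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dirty.bin")
        # Leave dirty pages in the buffer cache (no fsync) so that
        # sync() has something to flush.
        with open(path, "wb") as f:
            f.write(os.urandom(16 * 1024 * 1024))
        print(f"sync() returned after {timed_sync():.3f} s")
```

Running a concurrent writer (as in the `cp /dev/zero` example) while calling `timed_sync()` is one way to observe how long a waiting sync() can be held up on a given kernel.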