Thread: Problem with mirrorring

Problem with mirrorring

From
Devrim GUNDUZ
Date:

Hi Dave,

I think we have a stale lock file again; my web mirror of PostgreSQL has
failed on its last two attempts.

Regards,
--
Devrim GUNDUZ
devrim~gunduz.org                devrim.gunduz~linux.org.tr
             http://www.tdmsoft.com
             http://www.gunduz.org

Re: Problem with mirrorring

From
"Dave Page"
Date:

> -----Original Message-----
> From: Devrim GUNDUZ [mailto:devrim@gunduz.org]
> Sent: 02 October 2004 20:21
> To: Dave Page
> Cc: PostgreSQL WWW Mailing List
> Subject: Re: [pgsql-www] Problem with mirrorring
>
> Merhaba, (heh it means 'hello' in Turkish)
>
> On Sat, 2 Oct 2004, Dave Page wrote:
>
> > There was a lockfile from this morning, but it won't stop you
> > mirroring - you actually mirror from a completely different box. The
> > lockfile is only on the master site and is used to stop lots of site
> > builds running at once if for some reason the database server slows
> > right down.
> >
> > Do you get any errors when you rsync?
>
> :( Sorry to bother you but what I meant was the problem that has
> happened before:
>
> http://archives.postgresql.org/pgsql-www/2004-09/msg00149.php
> and your reply:
> http://archives.postgresql.org/pgsql-www/2004-09/msg00152.php
>
> There was not an 'error' exactly...

Ahh, yes. Of course, that's actually rsync working correctly! No changes
on the server, so no transfers.

This does mean that our mirroring is quite inefficient. We really
should diff the results of each build and only update the html files if
there is a change. At the moment each mirror is probably pulling the
whole site each time :-(

Regards, Dave.
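
The "only update the html files if there is a change" idea could be
sketched roughly as below. This is a minimal illustration, not the site's
actual build code: the function name write_if_changed and the 0o644 file
mode are assumptions made for the example. The point is that a page whose
rebuilt content is byte-for-byte identical is left alone, so its
modification time does not move.

```python
import os
import tempfile


def write_if_changed(path: str, new_content: bytes) -> bool:
    """Write new_content to path only if it differs from what is on disk.

    Returns True if the file was (re)written, False if it was left alone.
    Leaving an unchanged file alone keeps its modification time stable.
    """
    try:
        with open(path, "rb") as fh:
            if fh.read() == new_content:
                return False  # identical output: keep the old file and mtime
    except FileNotFoundError:
        pass  # first build of this page

    # Write to a temporary file in the same directory, then rename it into
    # place so readers never see a half-written page.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as fh:
        fh.write(new_content)
    # mkstemp creates owner-only files; 0o644 is an assumed mode for web pages.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
    return True
```

A build script would call this once per generated page instead of
overwriting every file unconditionally.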

Re: Problem with mirrorring

From
Devrim GUNDUZ
Date:

Hi,

On Sat, 2 Oct 2004, Dave Page wrote:

>> There was not an 'error' exactly...
>
> Ahh, yes. Of course, that's actually rsync working correctly! No changes
> on the server, so no transfers.

Umm, every cron job outputs an email of 259K, but when there is a
lockfile, then it decreases to 800 bytes. Rsync gets all the files, even
if there is no change!

> This does mean that our mirroring is quite inefficient. We really
> should diff the results of each build and only update the html files if
> there is a change. At the moment each mirror is probably pulling the
> whole site each time :-(

Agreed, but diffing all the files that we have might be quite hard for
us, and right now I don't have a better idea...

Umm... If we can keep track of modified files in a db (a new or updated
FAQ, doc, news, etc...) maybe we could publish only those...

Regards,

--
Devrim GUNDUZ
devrim~gunduz.org                devrim.gunduz~linux.org.tr
             http://www.tdmsoft.com
             http://www.gunduz.org
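
One way the "keep track of modified files in a db" suggestion might look
is sketched below. It is only an illustration under stated assumptions:
it uses SQLite and compares content hashes of the built pages rather than
recording edits at the source, and the table name page_hash and the
function changed_pages are hypothetical.

```python
import hashlib
import sqlite3


def changed_pages(conn: sqlite3.Connection, pages: dict[str, bytes]) -> list[str]:
    """Return the paths whose content hash differs from the stored one.

    `pages` maps a path to its freshly built content. New hashes are
    recorded, so the next call only reports later changes.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hash (path TEXT PRIMARY KEY, sha256 TEXT)"
    )
    changed = []
    for path, content in sorted(pages.items()):
        digest = hashlib.sha256(content).hexdigest()
        row = conn.execute(
            "SELECT sha256 FROM page_hash WHERE path = ?", (path,)
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(path)
            conn.execute(
                "INSERT OR REPLACE INTO page_hash (path, sha256) VALUES (?, ?)",
                (path, digest),
            )
    conn.commit()
    return changed
```

Only the paths returned would then need to be published; a page that is
regenerated with identical content is not reported.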

Re: Problem with mirrorring

From
"Dave Page"
Date:

> -----Original Message-----
> From: Devrim GUNDUZ [mailto:devrim@gunduz.org]
> Sent: 02 October 2004 21:00
> To: Dave Page
> Cc: PostgreSQL WWW Mailing List
> Subject: Re: [pgsql-www] Problem with mirrorring
>
> >> There was not an 'error' exactly...
> >
> > Ahh, yes. Of course, that's actually rsync working correctly! No
> > changes on the server, so no transfers.
>
> Umm, every cron job outputs an email of 259K, but when there
> is a lockfile, then it decreases to 800 bytes. Rsync gets all
> the files, even if there is no change!

It shouldn't do. Andrew Tridgell (sp?) of the Samba team wrote it for
his PhD thesis, designing it specifically to minimise the data
transferred. IIRC, it even transfers only the differences between files
rather than the whole thing. It's pretty clever stuff.

Anyway (I'll stop babbling now :-) ), if it's transferring everything,
even if it hasn't even been touched at our end, then something,
somewhere is broken.

> > This does mean that our mirroring is quite inefficient. We really
> > should diff the results of each build and only update the html files
> > if there is a change. At the moment each mirror is probably pulling
> > the whole site each time :-(
>
> Agreed but... diffing all the files that we have might be
> quite hard for us -- But I now don't have a better idea...
>
> Umm... If we can keep a track of modified files in a db (a
> new or updated FAQ, doc, news, etc...) maybe we could publish
> only them...

It's more tricky than that, because files change even if a new user joins
Gborg, for example (because of the count in the left column of the
'portal' pages). Besides, thinking about it some more, I don't think it
really is that much of a problem. If rsync does its job as I remember
it should (bear in mind it's a while since I read the thesis, and it's
Saturday night!), then it should minimise the transfer even if the mod
time has been touched.

Regards, Dave.

Re: Problem with mirrorring

From
"Marc G. Fournier"
Date:
My understanding of rsync is that even if a file changes, only the changes
are sent across ... I believe it's effectively a 'diff' of the file ... so
even if all files' timestamps do change, very little data is sent across
...

But, aren't there args available for rsync to have it check file size
instead of timestamp?  So it would miss a reversing of characters in a
word, but if a single character is added, then the byte size changes and,
therefore, it gets updated ... ?

On Sat, 2 Oct 2004, Devrim GUNDUZ wrote:

> Hi,
>
> On Sat, 2 Oct 2004, Dave Page wrote:
>
>>> There was not an 'error' exactly...
>>
>> Ahh, yes. Of course, that's actually rsync working correctly! No changes
>> on the server, so no transfers.
>
> Umm, every cron job outputs an email of 259K, but when there is a lockfile,
> then it decreases to 800 bytes. Rsync gets all the files, even if there is no
> change!
>
>> This does mean that our mirroring is quite inefficient. We really
>> should diff the results of each build and only update the html files if
>> there is a change. At the moment each mirror is probably pulling the
>> whole site each time :-(
>
> Agreed but... diffing all the files that we have might be quite hard for us
> -- But I now don't have a better idea...
>
> Umm... If we can keep a track of modified files in a db (a new or updated
> FAQ, doc, news, etc...) maybe we could publish only them...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: Problem with mirrorring

From
"John Hansen"
Date:
Ahemm,...

> >
> > http://archives.postgresql.org/pgsql-www/2004-09/msg00149.php
> > and your reply:
> > http://archives.postgresql.org/pgsql-www/2004-09/msg00152.php
> >
> > There was not an 'error' exactly...
>
> Ahh, yes. Of course, that's actually rsync working correctly!
> No changes on the server, so no transfers.
>
> This does mean that our mirroring is quite inefficient. We
> really should diff the results of each build and only update
> the html files if there is a change. At the moment each
> mirror is probably pulling the whole site each time :-(


Don't you mean quite efficient... ?

I assume the email Devrim receives is a list of files changed.
Rsync will do a diff on the files and copy the changes across.
If, however, the only difference is the timestamp, then the file
will still be listed in said email, since its timestamp was updated.

I hope this clears up any confusion.

... John

Re: Problem with mirrorring

From
"Dave Page"
Date:


-----Original Message-----
From: John Hansen [mailto:john@geeknet.com.au]
Sent: Sun 10/3/2004 3:10 AM
To: Dave Page; Devrim GUNDUZ
Cc: PostgreSQL WWW Mailing List
Subject: RE: [pgsql-www] Problem with mirrorring

> Don't you mean quite efficient... ?

No, quite inefficient. We rebuild every file on the site regardless of
whether something has changed, hence every time rsync runs it sees a
modified file. What I had completely forgotten when I wrote that (and
later noted when I remembered) is that rsync only transfers a diff of
each file, so things aren't really that bad.

Of course, how it does that is a mystery to me - without fully comparing
both versions, how can it create a diff? I guess that's what made it PhD
material...

/D

Re: Problem with mirrorring

From
Peter Eisentraut
Date:
Dave Page wrote:
> Of course, how it does that is a mystery to me - without fully
> comparing both versions, how can it create a diff? I guess that's
> what made it PhD material...

It doesn't create a diff.  It computes checksums for pieces of the file
and transfers those.

Apparently, though, it first checks the modification time before doing
all that checksum stuff.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Problem with mirrorring

From
"John Hansen"
Date:
> Of course, how it does that is a mystery to me - without
> fully comparing both versions, how can it create a diff? I
> guess that's what made it PhD material...

IIRC, it does a chunk by chunk md5 checksum on each file and transfers
only the differing chunks.
Quite clever indeed.
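
The chunk-by-chunk idea can be illustrated with a deliberately simplified
sketch. It is not rsync's actual algorithm: it only compares blocks at the
same aligned offsets, whereas real rsync also uses a rolling weak checksum
so that matching blocks can be found at any byte offset. MD5 is used here
only because it is the hash recalled above, not as a claim about which
hash rsync uses, and the function names and the tiny block size are
assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Checksum each fixed-size block of the receiver's existing copy."""
    return [
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(new: bytes, old_digests: list[bytes],
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block index: bytes} for blocks of `new` the receiver lacks."""
    needed = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        same = (index < len(old_digests)
                and hashlib.md5(block).digest() == old_digests[index])
        if not same:
            needed[index] = block  # only this block has to cross the wire
    return needed
```

The receiver sends only the short list of digests, and the sender replies
with only the blocks whose digests do not match, so neither side ever needs
both full versions of the file.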


Re: Problem with mirrorring

From
"Marc G. Fournier"
Date:
On Sun, 3 Oct 2004, Peter Eisentraut wrote:

> Dave Page wrote:
>> Of course, how it does that is a mystery to me - without fully
>> comparing both versions, how can it create a diff? I guess that's
>> what made it PhD material...
>
> It doesn't create a diff.  It computes checksums for pieces of the file
> and transfers those.
>
> Apparently, though, it first checks the modification time before doing
> all that checksum stuff.

Right, and if you want, you can use the --size-only arg to rsync to
disable the timestamp check and make it only a size check ... so, if the
size doesn't change, the page wasn't actually modified ... and, I can't
think of many situations where if the file changed, the size wouldn't
change by *at least* one character, but I imagine it could happen ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
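
The difference between rsync's default check (size and modification time)
and --size-only can be modelled with a small sketch. This only mimics the
decision of whether a file is skipped or looked at further; it does not
call rsync, and the FileInfo type and needs_transfer function are
hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int    # bytes
    mtime: int   # modification time, seconds since the epoch


def needs_transfer(src: FileInfo, dst: FileInfo, size_only: bool = False) -> bool:
    """Decide whether a file is examined further or skipped outright."""
    if src.size != dst.size:
        return True   # a size change always counts as modified
    if size_only:
        return False  # same size is treated as "unchanged"
    return src.mtime != dst.mtime  # default: a touched timestamp counts


# A page rebuilt with identical content but a fresh timestamp:
rebuilt = FileInfo(size=1200, mtime=1096747200)
mirror = FileInfo(size=1200, mtime=1096660800)
print(needs_transfer(rebuilt, mirror))                  # True
print(needs_transfer(rebuilt, mirror, size_only=True))  # False
```

The trade-off is the one noted above: with a size-only check, an edit that
leaves the byte count unchanged (two characters swapped, say) goes
unnoticed.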