Thread: replication of data from postgresql DB on File System Level

replication of data from postgresql DB on File System Level

From
Saumitra Bhanage
Date:
I have some questions about replicating data from one PostgreSQL server to another, but using a different approach.
As a short summary of my project:
I am working on a data replication project, and I have written a kernel module for kernel 2.6 that works across two machines, A and B. When I update any file (anywhere in the directory tree) under a specified directory on machine A, my program updates the same file on machine B.
(On each write system call on machine A, the difference between the new file and the old file is patched onto machine B.)
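
For readers who want to picture the patching step, here is a minimal user-space sketch in Python. It is not the kernel module from this post, whose code is not shown; the block size and the names diff_blocks and apply_patch are assumptions made up for illustration. It compares the old and new contents block by block and sends only the blocks that changed.

```python
BLOCK_SIZE = 8192  # assumed block size, chosen to match PostgreSQL's default page size


def diff_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Return (offset, data) pairs for every block of `new` that differs from `old`."""
    changes = []
    for offset in range(0, len(new), block_size):
        new_block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != new_block:
            changes.append((offset, new_block))
    return changes


def apply_patch(old: bytes, changes, new_length: int) -> bytes:
    """Rebuild the new contents on the receiving side from `old` plus `changes`."""
    # Truncate or zero-extend the old contents to the new length first.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for offset, data in changes:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


# A 16 MB file (the size of one WAL segment) with a single modified block.
old = bytes(16 * 1024 * 1024)
modified = bytearray(old)
modified[3 * BLOCK_SIZE + 10] = 0xFF
new = bytes(modified)

changes = diff_blocks(old, new)                       # computed on machine A
patched = apply_patch(old, changes, len(new))         # applied on machine B
bytes_sent = sum(len(data) for _, data in changes)    # payload actually transferred
```

In this sketch the cost of a patch depends on how many blocks changed, not on the size of the file, so a 16MB file with one modified block transfers a single block.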
 
So now I can keep my PostgreSQL database in some directory, say /usr/share/data, on machine A
and start with an identical copy on machine B.
 
Now what I want to do is replicate the changes made on machine A to machine B.
 
So I started my program in this setup, passing it the whole directory:
1> stopped postgres on B
2> updated the database on A
3> started postgres on B
4> checked the database on B: IT WAS UPDATED.
 
The only problem now is that the update takes a long time. I WANT IT TO RUN FASTER.
 
So can I AVOID replicating SOME FILES, such as log files?
For example, the 000010000000 file in pg_xlog is 16MB and takes too long to patch.
Or should I replicate only the files in the ..../base/ directory?

The replication works something like this.
I update something on machine A:
1> WAL updated (.../pg_xlog/00000100000)

2> (after about 2 min) .../base/<database number> gets updated

3> (after about 5 min) WAL updated (.../pg_xlog/00000100000)

4> ../pg_clog/0000 updated

5> ../global/pg_control gets updated.
After all this I can see the database updated!
How can I see immediate results?
After executing a checkpoint, the WAL is flushed and the database is updated, but the problem of transferring BIG xlog files still remains. :(
 
Thanks,
 Saumitra.



Re: replication of data from postgresql DB on File System Level

From
Martijn van Oosterhout
Date:
On Tue, Mar 06, 2007 at 11:12:12PM -0800, Saumitra Bhanage wrote:
> I have some questions about replicating data from one PostgreSQL server to another, but using a different approach.
>   As a short summary of my project:
>   I am working on a data replication project, and I have written a kernel module for kernel 2.6 that works across two machines, A and B. When I update any file (anywhere in the directory tree) under a specified directory on machine A, my program updates the same file on machine B.
>   (On each write system call on machine A, the difference between the new file and the old file is patched onto machine B.)
>
>   So now I can keep my PostgreSQL database in some directory, say /usr/share/data, on machine A
>   and start with an identical copy on machine B.
>
>   Now what I want to do is replicate the changes made on machine A to machine B.

The question is why? Seems like an awfully complicated way to do it.

In any case, you can't ignore the changes in the clog/xlog; they are the
whole database. In fact, to do replication you *only* need the WAL; you
can pretty much lose the rest. If you try to replicate without them,
you'll only find the data corrupted later on...

As you noticed, the system only forces the WAL to be written out;
that's what guarantees crash safety. The actual data only gets written
out as necessary.
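
To make the point about the WAL concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL's actual implementation or on-disk format: the class ToyDatabase and the function recover are hypothetical names, and the "WAL" is just a list of key/value changes. It shows why copying the data files alone gives a stale result, while replaying the log reaches the current state.

```python
class ToyDatabase:
    """Toy write-ahead-logging model; not PostgreSQL's real on-disk format."""

    def __init__(self):
        self.wal = []          # durable log: written at commit time
        self.buffers = {}      # in-memory pages holding the latest values
        self.data_files = {}   # on-disk data files: only updated at a checkpoint

    def commit(self, key, value):
        # The change reaches the log first ...
        self.wal.append((key, value))
        # ... while the data page is only modified in memory.
        self.buffers[key] = value

    def checkpoint(self):
        # Flush every buffered page to the data files.
        self.data_files.update(self.buffers)


def recover(data_files, wal):
    """Rebuild the current state by replaying the log on top of the data files."""
    state = dict(data_files)
    for key, value in wal:
        state[key] = value
    return state


primary = ToyDatabase()
primary.commit("a", 1)
primary.checkpoint()
primary.commit("a", 2)   # committed, but not yet written to the data files
primary.commit("b", 3)

# Copying only the data files misses everything since the last checkpoint.
files_only = dict(primary.data_files)
# Copying the data files plus the log lets the replica replay to the current state.
with_wal = recover(primary.data_files, primary.wal)
# In this toy model the complete log alone is enough, starting from an empty base.
wal_only = recover({}, primary.wal)
```

The last case works here only because the toy log goes back to the very first change. In a real setup the replica would start from a copy of the data directory and replay the WAL written after that copy was taken.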

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
