Thread: Peer to peer replication of Postgresql databases
Hi all, I am trying to add some replication features to postgres (yes, I have already looked at ongoing work), in a peer to peer manner. The goal is to achive `nearly complete fault tolerence' by replicating data. The basic framework I have in mind is somewhat like this. - Postmasters are running on different computers on a networked cluster. Their data areas are identical at the beginning and recide on local storage devices. - Each postmaster is aware that they are a part of a cluster and they can communicate with each other, send multicast requests and look for each other's presence (like heartbeat in linux-ha project). - When a frontend process sends a read query, each backend process does that from its own data area. - There are two types of write queries. Postmasters use seperate communication channels for each. One is the sequencial channel which carries writes whose order is important, and the non-sequencial channel carries write queries whose order is not important. - When a frontend process sends non-sequencial write query to a backend, it is directly written to the local data area and a multicast is sent (preferably asynchronously) to the other postmasters who will also update their respective local areas. May be we can simply duplicate what goes to WAL into a TCP/IP socket (with some header info, of course). - When a sequencial-write query is requested, the corresponding postmaster informs a main-postmaster (more about in the next point), waits for his acknowledgement, and proceeds the same way as the non-sequencial write. - Each postmaster is assigned a priority. The one with the highest priority is doing some bookkeeping to handle concurrency issues etc. If he goes away, another one takes charge. Or maybe we can completely ignore the main-postmaster concept and let the clients broadcast a request to obtain locks etc. - When a new postmaster, hence a computer, joins the cluster, he will replicate the current database from one of the clients. Suggessions and critisisms are welcome. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) The best audience is intelligent, well-educated and a little drunk. -- Maurice Baring
On 11 Oct 2002 at 16:16, Anuradha Ratnaweera wrote: > > Hi all, > > I am trying to add some replication features to postgres (yes, I have > already looked at ongoing work), in a peer to peer manner. The goal > is to achive `nearly complete fault tolerence' by replicating data. Sounds a lot like usogres. You got it running. (I never had a chance.) I would like to hear how it compares against it. Can anybody comment how maintained usogres is. It covers an important area of replication but I am not sure how maintained that is. If it is not, I suggest we pick it up and finish it. HTH ByeShridhar -- You go slow, be gentle. It's no one-way street -- you know how youfeel and that's all. It's how the girl feels too. Don't press. Ifthe girl feels anything for you at all, you'll know. -- Kirk, "Charlie X", stardate 1535.8
On Fri, Oct 11, 2002 at 03:54:15PM +0530, Shridhar Daithankar wrote: > > On 11 Oct 2002 at 16:16, Anuradha Ratnaweera wrote: > > > I am trying to add some replication features to postgres (yes, I have > > already looked at ongoing work), in a peer to peer manner. The goal > > is to achive `nearly complete fault tolerence' by replicating data. > > Sounds a lot like usogres. You got it running. (I never had a chance.) I would > like to hear how it compares against it. > > Can anybody comment how maintained usogres is. It covers an important area of > replication but I am not sure how maintained that is. If it is not, I suggest > we pick it up and finish it. I will look at it, too. Thanks for the link. In some cases, starting anew is faster than learning unmaintained existing code. My original mail would have been much shorter if it simply stated that I want to add `application level RAID-0' to postgres ;) Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) I think the world is run by C students. -- Al McGuire
On 11 Oct 2002 at 16:29, Anuradha Ratnaweera wrote: > On Fri, Oct 11, 2002 at 03:54:15PM +0530, Shridhar Daithankar wrote: > I will look at it, too. Thanks for the link. In some cases, starting > anew is faster than learning unmaintained existing code. While that's true, usogres code is just few files. I wouldn't take more than half an hour to read up the things. And besides it contain postgresql protocol implementation necessary which would take some time to test and debug, And it's in C++. I like that..;-) > My original mail would have been much shorter if it simply stated that I > want to add `application level RAID-0' to postgres ;) :-) ByeShridhar -- QOTD: "Do you smell something burning or is it me?" -- Joan of Arc
On Fri, Oct 11, 2002 at 04:04:29PM +0530, Shridhar Daithankar wrote: > On 11 Oct 2002 at 16:29, Anuradha Ratnaweera wrote: > > > On Fri, Oct 11, 2002 at 03:54:15PM +0530, Shridhar Daithankar wrote: > > I will look at it, too. Thanks for the link. In some cases, starting > > anew is faster than learning unmaintained existing code. Ok. Checked out what usogres is. It is not what I want. I don't want a static `main database'. It should simply a cluster of them - just like a set of Raid-0 disks, may be with a tempory controller for some tasks. Also, as a matter of fact, usogres is not unmaintained code. > While that's true, usogres code is just few files. I wouldn't take more than > half an hour to read up the things. And besides it contain postgresql protocol > implementation necessary which would take some time to test and debug, Great. I will look into this over the weekend. > And it's in C++. I like that..;-) And I DON'T like that ;) Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) QOTD: "I ain't broke, but I'm badly bent."
On 11 Oct 2002 at 16:39, Anuradha Ratnaweera wrote: > On Fri, Oct 11, 2002 at 04:04:29PM +0530, Shridhar Daithankar wrote: > > On 11 Oct 2002 at 16:29, Anuradha Ratnaweera wrote: > > > > > On Fri, Oct 11, 2002 at 03:54:15PM +0530, Shridhar Daithankar wrote: > > > I will look at it, too. Thanks for the link. In some cases, starting > > > anew is faster than learning unmaintained existing code. > > Ok. Checked out what usogres is. It is not what I want. I don't want > a static `main database'. It should simply a cluster of them - just like > a set of Raid-0 disks, may be with a tempory controller for some tasks. Well, I don't think adding support for multiple slaves to usogres would be that problematic. Of course if you want to load balance your application queries, application has to be aware of that. I will not do sending requests to a mosix cluster anyway. > Also, as a matter of fact, usogres is not unmaintained code. Glad to know that. I wrote to author with some suggestion and never got a reply. Didn't bother joining mailing list though.. Regards,Shridhar ----------------------------------------------------------- Shridhar Daithankar LIMS CPE Team Member, PSPL. mailto:shridhar_daithankar@persistent.co.in Phone:- +91-20-5678900 Extn.270 Fax :- +91-20-5678901 -----------------------------------------------------------
On Fri, Oct 11, 2002 at 04:29:53PM +0530, Shridhar Daithankar wrote: > > Well, I don't think adding support for multiple slaves to usogres would be that > problematic. Of course if you want to load balance your application queries, > application has to be aware of that. I will not do sending requests to a mosix > cluster anyway. Have already tested postgres on a mosix cluster, and as expected results are not good. (although mosix does the correct thing in keeping all the database backend processes on one node). Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Remember: Silly is a state of Mind, Stupid is a way of Life. -- Dave Butler
I'd be curious to hear in a little more detail what constitutes "not good" for postgres on a mosix cluster. Greg On Fri, 2002-10-11 at 06:15, Anuradha Ratnaweera wrote: > On Fri, Oct 11, 2002 at 04:29:53PM +0530, Shridhar Daithankar wrote: > > > > Well, I don't think adding support for multiple slaves to usogres would be that > > problematic. Of course if you want to load balance your application queries, > > application has to be aware of that. I will not do sending requests to a mosix > > cluster anyway. > > Have already tested postgres on a mosix cluster, and as expected results > are not good. (although mosix does the correct thing in keeping all the > database backend processes on one node). > > Anuradha > > -- > > Debian GNU/Linux (kernel 2.4.18-xfs-1.1) > > Remember: Silly is a state of Mind, Stupid is a way of Life. > -- Dave Butler > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On 11 Oct 2002 at 8:30, Greg Copeland wrote: > I'd be curious to hear in a little more detail what constitutes "not > good" for postgres on a mosix cluster. > On Fri, 2002-10-11 at 06:15, Anuradha Ratnaweera wrote: > > On Fri, Oct 11, 2002 at 04:29:53PM +0530, Shridhar Daithankar wrote: > > Have already tested postgres on a mosix cluster, and as expected results > > are not good. (although mosix does the correct thing in keeping all the > > database backend processes on one node). Well, I guess in kind of replication we are talking here, the performance will be enhanced only if separate instances of psotgresql runs on separate machine. Now if mosix kernel applies some AI and puts all of them on same machine, it isn't going to be any good for the purpose replication is deployed. I guess that's what she meant.. ByeShridhar -- User n.: A programmer who will believe anything you tell him.
Well, not scalable doesn't have to mean "not good". That's why I asked. Considering this is one of the problems with mosix clusters (process migration and associated restrictions) and the nature of PostgreSQL's implementation I'm not sure what other result may of been expected. Because of that, I wasn't sure if something else was being implied. Greg On Fri, 2002-10-11 at 08:40, Shridhar Daithankar wrote: > On 11 Oct 2002 at 8:30, Greg Copeland wrote: > > > I'd be curious to hear in a little more detail what constitutes "not > > good" for postgres on a mosix cluster. > > On Fri, 2002-10-11 at 06:15, Anuradha Ratnaweera wrote: > > > On Fri, Oct 11, 2002 at 04:29:53PM +0530, Shridhar Daithankar wrote: > > > Have already tested postgres on a mosix cluster, and as expected results > > > are not good. (although mosix does the correct thing in keeping all the > > > database backend processes on one node). > > Well, I guess in kind of replication we are talking here, the performance will > be enhanced only if separate instances of psotgresql runs on separate machine. > Now if mosix kernel applies some AI and puts all of them on same machine, it > isn't going to be any good for the purpose replication is deployed. > > I guess that's what she meant.. > > Bye > Shridhar > > -- > User n.: A programmer who will believe anything you tell him. > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
[ pgsql-patches removed from Cc: list ] Anuradha Ratnaweera <anuradha@lklug.pdn.ac.lk> writes: > I am trying to add some replication features to postgres (yes, I have > already looked at ongoing work), in a peer to peer manner. Did you look at the research behind Postgres-R, and the pgreplication stuff? > - When a frontend process sends a read query, each backend process > does that from its own data area. Surely that's not correct -- a SELECT can be handled by *any one* node, not each and every one, right? > - There are two types of write queries. Postmasters use seperate > communication channels for each. One is the sequencial channel which > carries writes whose order is important, and the non-sequencial > channel carries write queries whose order is not important. How do you distinguish between these? Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
For give me for responding to the beginning of this thread, but my comments only apply to this post. > already looked at ongoing work), in a peer to peer manner. The goal > is to achive `nearly complete fault tolerence' by replicating data. A worthy goal indeed! > - Postmasters are running on different computers on a networked cluster. > Their data areas are identical at the beginning and recide on local > storage devices. > > - Each postmaster is aware that they are a part of a cluster and they > can communicate with each other, send multicast requests and look for > each other's presence (like heartbeat in linux-ha project). These first two point on extending postmaster for a network cluster and HA could be a bit tricky. Have you considered using a group communication system like spread.org that already has the network cluster and heartbeat built in? > - There are two types of write queries. Postmasters use seperate > communication channels for each. One is the sequencial channel which > carries writes whose order is important, and the non-sequencial > channel carries write queries whose order is not important. This puts the burden of determining weather a conflict can happen on the application or user. Application design could become a bit tricky. If you plan to use the non-sequential channel in an application, you would need to make sure there are never any possible conflicts. > > - When a frontend process sends non-sequencial write query to a backend, > it is directly written to the local data area and a multicast is > sent (preferably asynchronously) to the other postmasters who will > also update their respective local areas. What are you planning to send? (SQL, parsed statements, or tuples) > - When a sequencial-write query is requested, the corresponding > postmaster informs a main-postmaster (more about in the next point), > waits for his acknowledgement, and proceeds the same way as the > non-sequencial write. This would make the main postmaster handle all the concurrency control for the replicated system. Are you thinking a two phased commit protocol here? > > Or maybe we can completely ignore the main-postmaster concept and > let the clients broadcast a request to obtain locks etc. If each system can obtain locks, how will you handle deadlocks across system boundaries? > > Suggessions and critisisms are welcome. > Have you taken a look at Postgres-R or the pg-replicaiton project. The goals are the same as yours, and the approach is some what similar. There is a mailing to discuss different approaches, and if you like what we are doing you can certainly participate in the development. http://gborg.postgresql.org/project/pgreplication/projdisplay.php Regards Darren
On Fri, Oct 11, 2002 at 08:30:55AM -0500, Greg Copeland wrote: > > I'd be curious to hear in a little more detail what constitutes "not > good" for postgres on a mosix cluster. It seems that almost all the postgres processes remain in the `home' node. Please notice that I am not underestimating Mosix in any way. We have tested many programs from our parallel processing project with extreme success on our mosix cluster. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) Ginger snap.
On Fri, Oct 11, 2002 at 07:10:26PM +0530, Shridhar Daithankar wrote: > On 11 Oct 2002 at 8:30, Greg Copeland wrote: > > > I'd be curious to hear in a little more detail what constitutes "not > > good" for postgres on a mosix cluster. > > Well, I guess in kind of replication we are talking here, the > performance will be enhanced only if separate instances of psotgresql > runs on separate machine. Now if mosix kernel applies some AI and > puts all of them on same machine, it isn't going to be any good for > the purpose replication is deployed. Exactly. First, since we know what is going on, it is not necessary for the OS to decide what's going on. Secondly, database replication is not looked after at all, unless we do some crude tricks on the filesystem. Still it won't be efficient. > I guess that's what she meant.. ^^^ Correction: "that's what _HE_ meant..." ;) Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) All other things being equal, a bald man cannot be elected President of the United States. -- Vic Gold
On 14 Oct 2002 at 11:55, Anuradha Ratnaweera wrote: > On Fri, Oct 11, 2002 at 07:10:26PM +0530, Shridhar Daithankar wrote: > > On 11 Oct 2002 at 8:30, Greg Copeland wrote: > > > > > I'd be curious to hear in a little more detail what constitutes "not > > > good" for postgres on a mosix cluster. > > > > Well, I guess in kind of replication we are talking here, the > > performance will be enhanced only if separate instances of psotgresql > > runs on separate machine. Now if mosix kernel applies some AI and > > puts all of them on same machine, it isn't going to be any good for > > the purpose replication is deployed. > > Exactly. First, since we know what is going on, it is not necessary for > the OS to decide what's going on. Secondly, database replication is not > looked after at all, unless we do some crude tricks on the filesystem. > Still it won't be efficient. IMO any one layer of clustering should be enough. If you use mosix, you shouldn't need clustering in postgresql. If postgresql clustering is applied any heterogenous machines like freebsd/linux should do. (OK same architecture at least. No suns and PCs..) Let's keep aside mosix for the time being. Application level clustering is what postgresql needs. What next? which one should we work on? Postgres-R/Usogres/ER-server? > > > I guess that's what she meant.. > ^^^ > Correction: "that's what _HE_ meant..." ;) Argh... Extremely sorry, in India, special nouns ending with 'a' are usually feminine.. Like Radha.. Sorry again..:-) ByeShridhar -- Weiner's Law of Libraries: There are no answers, only cross references.
On Fri, Oct 11, 2002 at 12:07:00PM -0400, Neil Conway wrote: > [ pgsql-patches removed from Cc: list ] > > Anuradha Ratnaweera <anuradha@lklug.pdn.ac.lk> writes: > > I am trying to add some replication features to postgres (yes, I > > have already looked at ongoing work), in a peer to peer manner. > > Did you look at the research behind Postgres-R, and the pgreplication > stuff? Am looking at the research papers related to it now. > > - When a frontend process sends a read query, each backend process > > does that from its own data area. > > Surely that's not correct -- a SELECT can be handled by *any one* > node, not each and every one, right? Yes. Sorry about my careless wording. Unless anything is kind of locked, each node has a copy of the database, so each one can handle SELECTs individually. The actual situation will be far from this simple, because there will be database writes going on and generating consistent SELECTs would need careful handling of concurency issues. > > - There are two types of write queries. Postmasters use seperate > > communication channels for each. One is the sequencial channel which > > carries writes whose order is important, and the non-sequencial > > channel carries write queries whose order is not important. > > How do you distinguish between these? Nope. We assume that all the communication should go through the sequencial channel unless indicated by the client. In that case, we will have to find a way to indicate this from the client's side. This doesn't sound very elegant, may be we can figure out a better way. Anuradha -- Debian GNU/Linux (kernel 2.4.18-xfs-1.1) "Being against torture ought to be sort of a bipartisan thing." -- Karl Lehenbauer