Re: Reasoning behind process instead of thread based - Mailing list pgsql-general
From | Scott Marlowe |
---|---|
Subject | Re: Reasoning behind process instead of thread based |
Date | |
Msg-id | 1098894087.31930.62.camel@localhost.localdomain Whole thread Raw |
In response to | Reasoning behind process instead of thread based arch? (nd02tsk@student.hig.se) |
Responses |
Re: Reasoning behind process instead of thread based
Re: Reasoning behind process instead of thread based |
List | pgsql-general |
On Wed, 2004-10-27 at 09:56, nd02tsk@student.hig.se wrote: > Hello! > > I have a couple of final ( I hope, for your sake ) questions regarding > PostgreSQL. > > I understand PostgreSQL uses processes rather than threads. I found this > statement in the archives: > > "The developers agree that multiple processes provide > more benefits (mostly in stability and robustness) than costs (more > connection startup costs). The startup costs are easily overcome by > using connection pooling. > " > > Please explain why it is more stable and robust? More from the above > statement: This question shows up every 6 months or so. You might wanna search the archives (I use google to do that, but YMMV with the postgresql site's search engine.) Basically, there are a few issues with threading that pop up their ugly heads. One: Not all OSes thread libraries are created equal. There was a nasty bug in one of the BSDs that causes MySQL to crash a couple years ago that drove them nuts. So programming a threaded implementation means you have the vagaries of different levels of quality and robustness of thread libraries to deal with. Two: If a single process in a multi-process application crashes, that process alone dies. The buffer is flushed, and all the other child processes continue happily along. In a multi-threaded environment, when one thread dies, they all die. Three: Multi-threaded applications can be prone to race conditions that are VERY hard to troubleshoot, especially if they occur once every million or so times the triggering event happens. On some operating systems, like Windows and Solaris, processes are expensive, while threads are cheap, so to speak. this is not the case in Linux or BSD, where the differences are much smaller, and the multi-process design suffers no great disadvantage. > "Also, each query can only use one processor; a single query can't be > executed in parallel across many CPUs. However, several queries running > concurrently will be spread across the available CPUs." > > And it is because of the PostgreSQL process architecture that a query > can't be executed by many CPU:s right? Although I wonder if this is the > case in MySQL. It only says in their manual that each connection is a > thread. Actually, if it were converted to multi-threaded tomorrow, it would still be true, because the postgresql engine isn't designed to split off queries into constituent parts to be executed by seperate threads or processes. Conversely, if one wished to implement it, one could likely patch postgresql to break up parts of queries to different child processes of the current child process (grand child processes so to speak) that would allow a query to hit multiple CPUs. > Also, MySQL has a library for embedded aplications, the say: > > "We also provide MySQL Server as an embedded multi-threaded library that > you can link into your application to get a smaller, faster, > easier-to-manage product." > > Do PostgreSQL offer anything similar? No, because in that design, if your application crashes, so does, by extension, your database. Now, I'd argue that if I had to choose between which database to have crash in the middle of transactions, I'd pick PostgreSQL, it's generally considered a bad thing to have a database crash mid transaction. PostgreSQL is more robust about crash recovery, but still... That's another subject that shows up every x months, an embedded version of PostgreSQL. Basically, the suggestion is to use something like SQLlite, which is built to be embedded, and therefore has a much lower footprint than PostgreSQL could ever hope to achieve. No one wants their embedded library using up gobs of RAM and disk space when it's just handling one thread / process doing one thing. It's like delivering Pizzas with a Ferrari, you could do it, it just eouldn't make a lot of sense.
pgsql-general by date: