Re: Anyone working on better transaction locking? - Mailing list pgsql-hackers
From | Kevin Brown |
---|---|
Subject | Re: Anyone working on better transaction locking? |
Date | |
Msg-id | 20030412105452.GV1833@filer |
In response to | Re: Anyone working on better transaction locking? (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>) |
Responses | Re: Anyone working on better transaction locking? |
List | pgsql-hackers |
Shridhar Daithankar wrote:
> Apache does too many things to be a speed daemon and what it offers
> is pretty impressive from performance POV.
>
> But database is not webserver. It is not supposed to handle tons of
> concurrent requests. That is a fundamental difference.

I'm not sure I necessarily agree with this. A database is just a
tool, a means of reliably storing information in such a way that it
can be retrieved quickly. Whether or not it "should" handle lots of
concurrent requests is a question that the person trying to use it
must answer.

A better answer is that a database engine that can handle lots of
concurrent requests can also handle a smaller number, but not vice
versa. So it's clearly an advantage to have a database engine that
can handle lots of concurrent requests because such an engine can be
applied to a larger number of problems. That is, of course, assuming
that all other things are equal...

There are situations in which a database would have to handle a lot
of concurrent requests. Handling ATM transactions over a large area
is one such situation. A database with current weather information
might be another, if it is actively queried by clients all over the
country. Acting as a mail store for a large organization is another.
And, of course, acting as a filesystem is definitely another. :-)

> Well. Threading does not necessarily imply one thread per connection
> model. Threading can be used to make CPU work during I/O and taking
> advantage of SMP for things like sort etc. This is especially true
> for 2.4.x linux kernels where async I/O can not be used for threaded
> apps. as threads and signals do not mix together well.

This is true, but whether you choose to limit the use of threads to
a few specific situations or use them throughout the database, the
dangers and difficulties faced by the developers when using threads
will be the same.

> One connection per thread is not a good model for postgresql since
> it has already built a robust product around process paradigm. If I
> have to start a new database project today, a mix of process+thread
> is what I would choose but postgresql is not in same stage of life.

Certainly there are situations for which it would be advantageous to
have multiple concurrent actions happening on behalf of a single
connection, as you say. But that doesn't automatically mean that a
thread is the best overall solution. On systems such as Linux that
have fast process handling, processes are almost certainly the way
to go. On other systems such as Solaris or Windows, threads might be
the right answer (on Windows they might be the *only* answer). But
my argument here is simple: the responsibility of optimizing process
handling belongs to the maintainers of the OS. Application
developers shouldn't have to worry about this stuff.

Of course, back here in the real world they *do* have to worry about
this stuff, and that's why it's important to quantify the problem.
It's not sufficient to say that "processes are slow and threads are
fast". Processes on the target platform may well be slow relative to
other systems (and relative to threads). But the question is: for
the problem being solved, how much overhead does process handling
represent relative to the total amount of overhead the solution
itself incurs?

For instance, if we're talking about addressing the problem of
distributing sorts across multiple CPUs, the amount of overhead
involved in doing disk activity while sorting could easily swamp, in
the typical case, the overhead involved in creating parallel
processes to do the sorts themselves. And if that's the case, you
may as well gain the benefits of using full-fledged processes rather
than deal with the problems that come with the use of threads --
because the gains to be found by using threads will be small in
relative terms.

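One rough way to put a number on that question is to time process creation against a unit of real work and look at the ratio. The sketch below is only an illustration in Python, not anything taken from PostgreSQL: the function names (`time_fork_wait`, `time_sort`, `overhead_fraction`) and the workload size are assumptions made up for this example, it needs a Unix-like system for `os.fork`, and the in-memory sort leaves out the disk activity that would make the real workload even larger relative to process creation.

```python
import os
import random
import time


def time_fork_wait(rounds: int = 50) -> float:
    """Average seconds to fork a child that exits at once, then reap it."""
    start = time.perf_counter()
    for _ in range(rounds):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping cleanup handlers
        os.waitpid(pid, 0)  # parent: wait for the child and reap it
    return (time.perf_counter() - start) / rounds


def time_sort(n_items: int = 200_000) -> float:
    """Seconds to sort n_items random floats in memory (stand-in workload)."""
    data = [random.random() for _ in range(n_items)]
    start = time.perf_counter()
    data.sort()
    return time.perf_counter() - start


def overhead_fraction(process_cost: float, work_cost: float) -> float:
    """Share of the total time that goes to process handling."""
    return process_cost / (process_cost + work_cost)


if __name__ == "__main__":
    fork_cost = time_fork_wait()
    sort_cost = time_sort()
    print(f"fork+wait: {fork_cost * 1e6:.0f} us per process")
    print(f"sort:      {sort_cost * 1e3:.1f} ms")
    print(f"overhead:  {overhead_fraction(fork_cost, sort_cost):.2%}")
```

The absolute numbers will vary a lot between platforms, which is the point: the decision should rest on the measured fraction for the target system and workload, not on a general belief about processes or threads.
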
> > > At their core, threads are a context switching efficiency tweak.
> >
> > This is the heart of the matter. Context switching is an operating
> > system problem, and *that* is where the optimization belongs. Threads
> > exist in large part because operating system vendors didn't bother to
> > do a good job of optimizing process context switching and
> > creation/destruction.
>
> But why would a database need tons of context switches if it is
> not supposed to service loads of requests simultaneously? If there are
> 50 concurrent connections, how much context switching overhead is
> involved regardless of amount of work done in a single connection?
> Remember that database state is maintained in shared memory. It does
> not take a context switch to access it.

If there are 50 concurrent connections with one process per
connection, then there are 50 database processes. The context switch
overhead is incurred whenever the current process blocks (or
exhausts its time slice) and the OS activates a different process.
Since database handling is generally rather I/O intensive as
services go, relatively few of those 50 processes are likely to be
in a runnable state, so I would expect the overall hit from context
switching to be rather low -- I'd expect the I/O subsystem to fall
over well before context switching became a real issue.

Of course, all of that is independent of whether or not the database
can handle a lot of simultaneous requests.

> > Under Linux, from what I've read, process creation/destruction and
> > context switching happens almost as fast as thread context switching
> > on other operating systems (Windows in particular, if I'm not
> > mistaken).
>
> I hear solaris also has very heavy processes. But postgresql has
> other issues with solaris as well.

Yeah, I didn't want to mention Solaris because I haven't kept up
with it and thought that perhaps they had fixed this...

-- 
Kevin Brown                                   kevin@sysexperts.com
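The context-switch argument above distinguishes a process that blocks (a voluntary switch) from one that exhausts its time slice (an involuntary switch). On Unix-like systems both counts can be read per process, so the claim can be checked rather than guessed at. The sketch below is an illustration only: the helper names (`context_switches`, `measure`, `blocking_work`, `cpu_work`) are invented for this example, `time.sleep` is just a stand-in for waiting on disk, and some platforms may report zeros for these counters.

```python
import resource
import time
from collections.abc import Callable


def context_switches() -> tuple[int, int]:
    """Return (voluntary, involuntary) context switch counts so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def measure(fn: Callable[[], None]) -> tuple[int, int]:
    """Run fn and return how many switches of each kind it incurred."""
    before = context_switches()
    fn()
    after = context_switches()
    return after[0] - before[0], after[1] - before[1]


def blocking_work() -> None:
    """Stand-in for an I/O-bound backend: block repeatedly."""
    for _ in range(20):
        time.sleep(0.001)


def cpu_work() -> None:
    """Stand-in for a CPU-bound backend: compute without blocking."""
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print("blocking (voluntary, involuntary):", measure(blocking_work))
    print("cpu-bound (voluntary, involuntary):", measure(cpu_work))
```

Running something like this inside a real backend under load, rather than on toy workloads, would show whether context switching is anywhere near the cost of the I/O it is waiting on.
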