Re: Anyone working on better transaction locking? - Mailing list pgsql-hackers
From           | Kevin Brown
Subject        | Re: Anyone working on better transaction locking?
Date           |
Msg-id         | 20030413041710.GW1833@filer
In response to | Re: Anyone working on better transaction locking? (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>)
Responses      | Re: Anyone working on better transaction locking?
List           | pgsql-hackers
Shridhar Daithankar wrote:
> > There are situations in which a database would have to handle a lot
> > of concurrent requests.  Handling ATM transactions over a large area
> > is one such situation.  A database with current weather information
> > might be another, if it is actively queried by clients all over the
> > country.  Acting as a mail store for a large organization is
> > another.  And, of course, acting as a filesystem is definitely
> > another.  :-)
>
> Well, there is another aspect one should consider.  Tuning a database
> engine for a specific workload is a hell of a job, and shifting it to
> altogether the other end of the paradigm must be justified.

Certainly, but that justification comes from the problem being solved.
If the nature of the problem demands tons of short transactions (and as
I said, a number of problems have such a requirement), then tuning the
database so that it can deal with them is a requirement if that
database is to be used at all.  Now, keep in mind that "tuning the
database" here covers a *lot* of ground and a lot of solutions,
including connection-pooling middleware.

> OK.  Postgresql is not optimised to handle lots of concurrent
> connections, at least not enough to allow one apache request handler
> to use a connection.  Then middleware connection pooling like that
> done in php might be a simpler solution than redoing the postgresql
> stuff.  Because it works.

I completely agree.  In fact, I see little reason to change PG's method
of connection handling, because I see little reason that a
general-purpose connection-pooling frontend can't be developed.
Another method that could help is to prefork the postmaster.

> > This is true, but whether you choose to limit the use of threads to
> > a few specific situations or use them throughout the database, the
> > dangers and difficulties faced by the developers when using threads
> > will be the same.
>
> I do not agree.
> Let's say I put threading functions in postgresql that do not touch
> the shared memory interface at all.  They would be a hell of a lot
> simpler to code and maintain than converting postgresql to a
> one-thread-per-connection model.

I think you misunderstand what I'm saying.  There are two approaches
we've been talking about thus far:

1. One thread per connection.  In this instance, every thread shares
   exactly the same memory space.

2. One process per connection, with each process able to create
   additional worker threads to handle things like concurrent sorts.
   In this instance, threads that belong to the same process all share
   the same memory space (including the SysV shared memory pool that
   the processes use to communicate with each other), but the only
   memory that *all* the threads will have in common is the SysV shared
   memory pool.

Now, the *scope* of the problems introduced by using threading is
different between the two approaches, but the *nature* of the problems
is the same: for any given process, the introduction of threads will
significantly complicate the debugging of memory corruption issues.
This problem will be there no matter which approach you use; the only
difference will be the scale.

And that's why you're probably better off with the third approach:

3. One process per connection, with each process able to create
   additional worker subprocesses to handle things like concurrent
   sorts.  IPC between the subprocesses can be handled using a number
   of different mechanisms, perhaps including the already-used SysV
   shared memory pool.

The reason you're probably better off with this third approach is that
by the time you need the concurrency for sorting, etc., the amount of
time you'll spend on the actual sorting will be so much larger than the
amount of time it takes to create, manage, and destroy the concurrent
processes (even on systems that have extremely heavyweight processes,
like Solaris and Windows) that there will be no discernible difference
between using threads and using processes.  It may take a few
milliseconds to create, manage, and destroy the subprocesses, but the
amount of work to be done has to represent at least a couple of
*hundred* milliseconds for a concurrent approach to be worth it at all.
And if that's the case, you may as well save yourself the problems
associated with using threads.

Even if you'd gain as much as a 10% speed improvement by using threads
instead of processes to handle concurrent sorts and such (an
improvement that is likely to be very difficult to achieve), I think
you're still going to be better off using processes.  To justify the
dangers of using threads, you'd need to see something like a factor of
two or more gain in overall performance, and I don't see how that's
going to be possible even on systems with very heavyweight processes.

I might add that the case where you're likely to gain significant
benefits from using either threads or subprocesses to handle concurrent
sorts is one in which you probably *won't* get many concurrent
connections...because if you're dealing with a lot of concurrent
connections (no matter how long-lived they may be), you're probably
*already* using all of the CPUs on the machine anyway.  The situation
where concurrent subprocesses or subthreads will help you is one where
the connections in question are relatively long-lived and are
performing big, complex queries -- exactly the situation in which
threads won't help you at all relative to subprocesses, because the
amount of work to do on behalf of the connection will dwarf (that is,
be many orders of magnitude greater than) the amount of time it takes
to create, manage, and tear down a process.
> > Of course, back here in the real world they *do* have to worry
> > about this stuff, and that's why it's important to quantify the
> > problem.  It's not sufficient to say that "processes are slow and
> > threads are fast".  Processes on the target platform may well be
> > slow relative to other systems (and relative to threads).  But the
> > question is: for the problem being solved, how much overhead does
> > process handling represent relative to the total amount of overhead
> > the solution itself incurs?
>
> That is correct.  However, it would be a fair assumption on the part
> of the postgresql developers that a process, once set up, does not
> have much processing overhead as such, given the state of modern
> server-class OSes and hardware.  So postgresql as it is fits that
> model.  I mean, it is fine that postgresql has heavy connections.
> The simpler solution is to pool them.

I'm in complete agreement here, and it's why I have very little faith
that a threaded approach to any of the concurrency problems will yield
enough benefits to justify the very significant drawbacks that a
threaded approach brings to the table.

--
Kevin Brown                                           kevin@sysexperts.com