Re: Anyone working on better transaction locking? - Mailing list pgsql-hackers
From | Kevin Brown |
---|---|
Subject | Re: Anyone working on better transaction locking? |
Date | |
Msg-id | 20030411213259.GU1833@filer Whole thread Raw |
In response to | Re: Anyone working on better transaction locking? ("Ron Peacetree" <rjpeace@earthlink.net>) |
Responses |
Re: Anyone working on better transaction locking?
Re: Anyone working on better transaction locking? |
List | pgsql-hackers |
Ron Peacetree wrote: > Zeus had a performance ceiling roughly 3x that of Apache when Zeus > supported threading as well as pre-forking and Apache only supported > pre forking. The Apache folks now support both. DB2, Oracle, and SQL > Server all use threads. Etc, etc. You can't use Apache as an example of why you should thread a database engine, except for the cases where the database is used much like the web server is: for numerous short transactions. > That's an awful lot of very bright programmers and some serious $$ > voting that threads are worth it. Given all that, if PostgreSQL > specific thread support is =not= showing itself to be a win that's > an unexpected enough outcome that we should be asking hard questions > as to why not. It's not that there won't be any performance benefits to be had from threading (there surely will, on some platforms), but gaining those benefits comes at a very high development and maintenance cost. You lose a *lot* of robustness when all of your threads share the same memory space, and make yourself vulnerable to classes of failures that simply don't happen when you don't have shared memory space. PostgreSQL is a compromise in this regard: it *does* share memory, but it only shares memory that has to be shared, and nothing else. To get the benefits of full-fledged threads, though, requires that all memory be shared (otherwise the OS has to tweak the page tables whenever it switches contexts between your threads). > At their core, threads are a context switching efficiency tweak. This is the heart of the matter. Context switching is an operating system problem, and *that* is where the optimization belongs. Threads exist in large part because operating system vendors didn't bother to do a good job of optimizing process context switching and creation/destruction. Under Linux, from what I've read, process creation/destruction and context switching happens almost as fast as thread context switching on other operating systems (Windows in particular, if I'm not mistaken). > Since DB's switch context a lot under many circumstances, threads > should be a win under such circumstances. At the least, it should be > helpful in situations where we have multiple CPUs to split query > execution between. This is true, but I see little reason that we can't do the same thing using fork()ed processes and shared memory instead. There is context switching within databases, to be sure, but I think you'll be hard pressed to demonstrate that it is anything more than an insignificant fraction of the total overhead incurred by the database. I strongly suspect that much larger gains are to be had by optimizing other areas of the database, such as the planner, the storage manager (using mmap for file handling may prove useful here), the shared memory system (mmap may be faster than System V style shared memory), etc. The big overhead in the process model on most platforms is in creation and destruction of processes. PostgreSQL has a relatively high connection startup cost. But there are ways of dealing with this problem other than threading, namely the use of a connection caching middleware layer. Such layers exist for databases other than PostgreSQL, so the high cost of fielding and setting up a database connection is *not* unique to PostgreSQL ... which suggests that while threading may help, it doesn't help *enough*. I'd rather see some development work go into a connection caching process that understands the PostgreSQL wire protocol well enough to look like a PostgreSQL backend to connecting processes, rather than see a much larger amount of effort be spent on converting PostgreSQL to a threaded architecture (and then discover that connection caching is still needed anyway). > Certainly it's =possible= that threads have nothing to offer > PostgreSQL, but IMHO it's not =probable=. Just another thing for me > to add to my TODO heap for looking at... It's not that threads don't have anything to offer. It's that the costs associated with them are high enough that it's not at all clear that they're an overall win. -- Kevin Brown kevin@sysexperts.com
pgsql-hackers by date: