Re: [HACKERS] Threads - Mailing list pgsql-hackers
From | Lamar Owen |
---|---|
Subject | Re: [HACKERS] Threads |
Date | |
Msg-id | 37A78196.8F040AEE@wgcr.org |
In response to | Re: [HACKERS] Threads (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote:
> PQconnectdb() is the function that's not thread-safe; if you had
> multiple threads invoking PQconnectdb() in parallel you would see a
> problem. PQconndefaults() is the function that creates an API problem,
> because it exposes the static variable that PQconnectdb() ought not have
> had in the first place.
>
> There might be some other problems too, but that's the main one I'm
> aware of. If we didn't mind breaking existing apps that use
> PQconndefaults(), it would be straightforward to fix...

Oh, this is interesting. I've been pointer and thread chasing for the last few hours trying to figure out why AOLserver (a multithreaded open source web server that supports pooled database (including PostgreSQL) connections) doesn't hit this snag -- and I haven't yet found the answer... However, this does answer a question that I had had but had never asked...

In any case, I have a couple of cents to throw in to the multithreaded discussion at large:

1.) While threads are nice for those programs that can benefit from them, there are those tasks that are not ideally suited to threads. Whether PostgreSQL could benefit or not, I don't know; it would be an interesting exercise to rewrite the executor to be multithreaded -- of course, the hard part is identifying what each thread would do, etc.

2.) A large multithreaded program, AOLserver, has just gone from a multihost multiclient multithread model to a single host multiclient multithread model: where AOLserver before would serve as many virtual hosts as you wished out of a single multi-threaded process, it was determined through heavy stress-testing (after all, this server sits behind www.aol.com, www.digitalcity.com, and others) that it was more efficient to let the TCP/IP stack in the kernel handle address multiplexing -- thus, the latest version requires you to start a (multi-threaded) server process for each virtual host.
The source code for this server is a model of multithreaded server design -- see aolserver.lcs.mit.edu for more.

3.) Redesigning an existing single-threaded program to efficiently utilize multithreading is non-trivial. Highly non-trivial. In fact, efficiently multithreading an existing program may involve a complete redesign of basic structures and algorithms -- it did in the case of AOLserver (then called Naviserver), which was originally a threaded take on the CERN httpd.

4.) Threading PostgreSQL is going to be a massive effort -- and the biggest part of that effort involves understanding the existing code well enough to completely redesign the interaction of the internals. It might be found that an efficient thread model involves multiple layers of threads: one or more threads to parse the SQL source, one or more threads to optimize the query, and one or more threads to execute optimized SQL -- even while the parser is still parsing later statements. I realize that doesn't fit very well in the existing PostgreSQL model. However, the pipelined thread model could be a good fit -- for a pooled connection or for long queries. The most benefit could be had by eliminating the postmaster/postgres linkage altogether, and having a single postgres process handle multiple connections on its port in a multiplexed-pipelined manner -- which is the model AOLserver uses.

AOLserver works like this: when a connection request is received, a thread is immediately dispatched to service the connection -- if a thread in the precreated thread pool is available, it gets it; otherwise a new thread is created, up to MAXTHREADS. The connection thread then pulls the data necessary to service the HTTP request (which can include dispatching a Tcl interpreter thread or a database driver thread out of the available database pools (up to MAXPOOLS) to service dynamic content).
The data is sequentially streamed to the connection, the connection is closed, and the thread sleeps until the next dispatch. Pretty simple in theory; a bear in practice.

So, hackers, are there areas of the backend itself that would benefit from threading? I'm sure the whole 'postmaster forking a backend' process would benefit from threading from a pure performance point of view, but stability could possibly suffer (although this model is good enough for www.aol.com....). Can parsing/optimizing/executing be done in a parallel/semi-parallel fashion? Of course, one of the benefits is going to be effective SMP utilization on architectures that support SMP threading. Multithreading the whole shooting match also eliminates the need for interprocess communication via shared memory -- each connection thread has the whole process context to work with.

The point is that it would take a full architectural redesign to properly thread something as large as an RDBMS -- is it worth it, and, if so, does anybody want to do it (who has enough pthreads experience to do it, that is)? No, I'm not volunteering -- I know enough about threads to be dangerous, and know less about the postgres backend. Not to mention a great deal of hard work is going to be involved -- every single line of code will have to be threadsafed -- not a fun prospect, IMO.

Anyone interested in this stuff should take a look at some well-threaded programs (such as AOLserver), and should be familiar with some of the essential literature (such as O'Reilly's pthreads book).

Incidentally, with AOLserver's database connection pooling and persistence, you get most of the benefits of a multithreaded backend without the headaches of a multithreaded backend....

Lamar Owen
WGCR Internet Radio
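
For anyone who wants to poke at the pooling idea, here is a minimal sketch, written in Python rather than taken from AOLserver's own code. It assumes a bounded set of worker threads that borrow from a small, fixed set of persistent connections. FakeConnection, serve, MAX_THREADS and POOL_SIZE are made-up names for illustration only; nothing here comes from the AOLserver or PostgreSQL sources, and no real database driver is involved.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 4   # hypothetical cap on worker threads (stands in for MAXTHREADS)
POOL_SIZE = 2     # hypothetical number of persistent connections in the pool


class FakeConnection:
    """Stand-in for a persistent database connection (not a real driver)."""

    def __init__(self, ident):
        self.ident = ident
        self.uses = 0  # how many requests this connection has served

    def query(self, sql):
        # Only the thread that borrowed this connection touches it,
        # so the counter needs no extra lock.
        self.uses += 1
        return f"conn{self.ident}: {sql}"


# Connections are opened once and reused; queue.Queue handles the locking.
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(FakeConnection(i))


def serve(request_id):
    """Service one request: borrow a connection, use it, hand it back."""
    conn = pool.get()      # blocks while every connection is busy
    try:
        return conn.query(f"SELECT {request_id}")
    finally:
        pool.put(conn)     # always return it for the next request


# Worker threads are started on demand, up to MAX_THREADS, and then reused.
with ThreadPoolExecutor(max_workers=MAX_THREADS) as workers:
    results = list(workers.map(serve, range(10)))

# Every connection is back in the pool once the workers have finished.
connections = [pool.get() for _ in range(POOL_SIZE)]
total_uses = sum(c.uses for c in connections)
```

Ten requests end up sharing two connections, and no connection is ever opened per request. That is the sense in which pooling buys a good part of what a threaded backend would, while the backend itself stays one process per connection.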