Re: [HACKERS] Threads - Mailing list pgsql-hackers

From Lamar Owen
Subject Re: [HACKERS] Threads
Msg-id 37A78196.8F040AEE@wgcr.org
In response to Re: [HACKERS] Threads  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> PQconnectdb() is the function that's not thread-safe; if you had
> multiple threads invoking PQconnectdb() in parallel you would see a
> problem.  PQconndefaults() is the function that creates an API problem,
> because it exposes the static variable that PQconnectdb() ought not have
> had in the first place.
> 
> There might be some other problems too, but that's the main one I'm
> aware of.  If we didn't mind breaking existing apps that use
> PQconndefaults(), it would be straightforward to fix...

Oh, this is interesting.  I've been pointer and thread chasing for the
last few hours trying to figure out why AOLserver (a multithreaded open
source web server that supports pooled database connections, including
PostgreSQL) doesn't hit this snag -- and I haven't yet found the
answer...

However, this does answer a question that I had had but had never
asked...

In any case, I have a couple of cents to throw in to the multithreaded
discussion at large:

1.)    While threads are nice for those programs that can benefit from
them, there are those tasks that are not ideally suited to threads.
Whether PostgreSQL could benefit or not, I don't know; it would be an
interesting exercise to rewrite the executor to be multithreaded -- of
course, the hard part is identifying what each thread would do, etc.

2.)    A large multithreaded program, AOLserver, has just gone from a
multihost multiclient multithread model to a single host multiclient
multithread model.  Where AOLserver before would serve as many virtual
hosts as you wished out of a single multi-threaded process, it was
determined through heavy stress-testing (after all, this server sits
behind www.aol.com, www.digitalcity.com, and others) that it was more
efficient to let the TCP/IP stack in the kernel handle address
multiplexing -- thus, the latest version requires you to start a
(multi-threaded) server process for each virtual host. The source code
for this server is a model of multithreaded server design -- see
aolserver.lcs.mit.edu for more.

3.)    Redesigning an existing single-threaded program to efficiently
utilize multithreading is non-trivial.  Highly non-trivial.  In fact,
efficiently multithreading an existing program may involve a complete
redesign of basic structures and algorithms -- it did in the case of
AOLserver (then called Naviserver), which was originally a threaded take
on the CERN httpd.

4.)    Threading PostgreSQL is going to be a massive effort -- and the
biggest part of that effort involves understanding the existing code
well enough to completely redesign the interaction of the internals.
It might be found that an efficient thread model involves multiple
layers of threads: one or more threads to parse the SQL source, one or
more threads to optimize the query, and one or more threads to execute
optimized SQL -- even while the parser is still parsing later
statements.  I realize that doesn't fit very well in the existing
PostgreSQL model.  However, the pipelined thread model could be a good
fit -- for a pooled connection or for long queries.  The most benefit
could be had by eliminating the postmaster/postgres linkage altogether,
and having a single postgres process handle multiple connections on its
port in a multiplexed-pipelined manner -- which is the model AOLserver
uses.
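
To give a rough idea of what I mean by a pipelined model, here is a toy
sketch in Python: one thread per stage, connected by queues, so a later
statement can be parsed while an earlier one is still being optimized or
executed.  The parse/optimize/execute functions are placeholders I made
up for illustration and have nothing to do with the real backend code.

```python
import queue
import threading

STOP = object()  # sentinel that shuts each stage down in turn

def stage(work, inbox, outbox):
    """Apply work() to every item from inbox and pass the result on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)   # tell the next stage to finish too
            return
        outbox.put(work(item))

# Toy stand-ins for the three phases (hypothetical, for illustration only).
def parse(sql):
    return ("parsed", sql.strip().lower())

def optimize(tree):
    return ("plan", tree[1])

def execute(plan):
    return "executed: " + plan[1]

raw, parsed, planned, done = (queue.Queue() for _ in range(4))
threads = [
    threading.Thread(target=stage, args=(parse, raw, parsed)),
    threading.Thread(target=stage, args=(optimize, parsed, planned)),
    threading.Thread(target=stage, args=(execute, planned, done)),
]
for t in threads:
    t.start()

# Feed statements in; each stage works on its own item concurrently.
for sql in ["SELECT 1 ", "SELECT 2 "]:
    raw.put(sql)
raw.put(STOP)

for t in threads:
    t.join()

# Drain the final queue; order is preserved because each stage is FIFO.
results = []
while True:
    item = done.get()
    if item is STOP:
        break
    results.append(item)
```

Using one thread per stage keeps statements in order; putting several
threads on a stage would need extra bookkeeping to preserve ordering.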

AOLserver works like this: when a connection request is received, a
thread is immediately dispatched to service the connection -- if a
thread in the precreated thread pool is available, it gets it, otherwise
a new thread is created, up to MAXTHREADS.
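
As a rough illustration of that dispatch rule (and only that; this is
not AOLserver's code), here is a small Python sketch.  MAXTHREADS is the
cap mentioned above, while PRECREATED, the Dispatcher class and the
handler signature are assumptions of mine.  Requests that arrive when
the pool is already at the cap simply wait in the queue.

```python
import queue
import threading

MAXTHREADS = 4   # cap on connection threads (name borrowed from the text)
PRECREATED = 2   # hypothetical number of threads started up front

class Dispatcher:
    def __init__(self):
        self.requests = queue.Queue()
        self.lock = threading.Lock()
        self.outstanding = 0          # requests accepted but not yet finished
        self.threads = []
        for _ in range(PRECREATED):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._worker, daemon=True)
        self.threads.append(t)
        t.start()

    def _worker(self):
        while True:
            handler, conn = self.requests.get()   # sleep until dispatched
            try:
                handler(conn)
            finally:
                with self.lock:
                    self.outstanding -= 1
                self.requests.task_done()

    def dispatch(self, handler, conn):
        with self.lock:
            self.outstanding += 1
            # No idle thread left: grow the pool, but never past the cap.
            if (self.outstanding > len(self.threads)
                    and len(self.threads) < MAXTHREADS):
                self._spawn()
        self.requests.put((handler, conn))

served = []
pool = Dispatcher()
for conn_id in range(10):
    pool.dispatch(served.append, conn_id)
pool.requests.join()   # wait until every request has been handled
```

How many threads actually get created in this run depends on timing,
but it never exceeds MAXTHREADS.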

The connection thread then pulls the data necessary to service the HTTP
request; to serve dynamic content, this can include dispatching a Tcl
interpreter thread, or a database driver thread out of the available
database pools (up to MAXPOOLS).  The data is sequentially streamed to
the connection, the connection is closed, and the thread sleeps until
another dispatch.
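
The database side of this can be sketched as a fixed set of long-lived
handles that connection threads borrow and return.  This is a Python
illustration under my own assumptions: HandlePool, open_handle and
serve are hypothetical names, and the dictionary stands in for a real
database connection.

```python
import queue
import threading
from contextlib import contextmanager

class HandlePool:
    """Fixed set of persistent database handles shared by many threads."""

    def __init__(self, open_handle, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(open_handle())   # connect once, reuse afterwards

    @contextmanager
    def borrow(self):
        handle = self._free.get()           # blocks until a handle is free
        try:
            yield handle
        finally:
            self._free.put(handle)          # always hand it back

# Hypothetical stand-in for opening a real database connection.
opened = []
def open_handle():
    handle = {"id": len(opened)}
    opened.append(handle)
    return handle

db_pool = HandlePool(open_handle, size=2)
pages = []

def serve(request_id):
    """One connection thread: borrow a handle, build the page, return it."""
    with db_pool.borrow() as db:
        pages.append((request_id, db["id"]))

workers = [threading.Thread(target=serve, args=(i,)) for i in range(6)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Six requests are served here, but only two handles are ever opened, so
the backend on the other end sees two persistent connections.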

Pretty simple in theory; a bear in practice.

So, hackers, are there areas of the backend itself that would benefit
from threading?  I'm sure the whole 'postmaster forking a backend'
process would benefit from threading from a pure performance point of
view, but stability could possibly suffer (although, this model is good
enough for www.aol.com....).  Can parsing/optimizing/executing be done
in a parallel/semi-parallel fashion?  Of course, one of the benefits is
going to be effective SMP utilization on architectures that support SMP
threading.  Multithreading the whole shooting match also eliminates the
need for interprocess communication via shared memory -- each connection
thread has the whole process context to work with.

The point is that it should be a full architectural redesign to properly
thread something as large as an RDBMS -- is it worth it, and, if so,
does anybody want to do it (who has enough pthreads experience to do it,
that is)?  No, I'm not volunteering -- I know enough about threads to be
dangerous, and know less about the postgres backend.  Not to mention a
great deal of hard work is going to be involved -- every single line of
code will have to be threadsafed -- not a fun prospect, IMO.

Anyone interested in this stuff should take a look at some
well-threaded programs (such as AOLserver), and should be familiar with
some of the essential literature (such as O'Reilly's pthreads book).

Incidentally, with AOLserver's database connection pooling and
persistence, you get most of the benefits of a multithreaded backend
without the headaches of a multithreaded backend....

Lamar Owen
WGCR Internet Radio

