Re: Anyone working on better transaction locking? - Mailing list pgsql-hackers

From Kevin Brown
Subject Re: Anyone working on better transaction locking?
Date
Msg-id 20030411213259.GU1833@filer
Whole thread Raw
In response to Re: Anyone working on better transaction locking?  ("Ron Peacetree" <rjpeace@earthlink.net>)
Responses Re: Anyone working on better transaction locking?  (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>)
Re: Anyone working on better transaction locking?  (Neil Conway <neilc@samurai.com>)
List pgsql-hackers
Ron Peacetree wrote:
> Zeus had a performance ceiling roughly 3x that of Apache when Zeus
> supported threading as well as pre-forking and Apache only supported
> pre forking.  The Apache folks now support both.  DB2, Oracle, and SQL
> Server all use threads.  Etc, etc.

You can't use Apache as an example of why you should thread a database
engine, except for the cases where the database is used much like the
web server is: for numerous short transactions.

> That's an awful lot of very bright programmers and some serious $$
> voting that threads are worth it.  Given all that, if PostgreSQL
> specific thread support is =not= showing itself to be a win that's
> an unexpected enough outcome that we should be asking hard questions
> as to why not.

It's not that there won't be any performance benefits to be had from
threading (there surely will, on some platforms), but gaining those
benefits comes at a very high development and maintenance cost.  You
lose a *lot* of robustness when all of your threads share the same
memory space, and make yourself vulnerable to classes of failures that
simply don't happen when you don't have shared memory space.

PostgreSQL is a compromise in this regard: it *does* share memory, but
it only shares memory that has to be shared, and nothing else.  To get
the benefits of full-fledged threads, though, requires that all memory
be shared (otherwise the OS has to tweak the page tables whenever it
switches contexts between your threads).

> At their core, threads are a context switching efficiency tweak.

This is the heart of the matter.  Context switching is an operating
system problem, and *that* is where the optimization belongs.  Threads
exist in large part because operating system vendors didn't bother to
do a good job of optimizing process context switching and
creation/destruction.

Under Linux, from what I've read, process creation/destruction and
context switching happens almost as fast as thread context switching
on other operating systems (Windows in particular, if I'm not
mistaken).

> Since DB's switch context a lot under many circumstances, threads
> should be a win under such circumstances.  At the least, it should be
> helpful in situations where we have multiple CPUs to split query
> execution between.

This is true, but I see little reason that we can't do the same thing
using fork()ed processes and shared memory instead.

There is context switching within databases, to be sure, but I think
you'll be hard pressed to demonstrate that it is anything more than an
insignificant fraction of the total overhead incurred by the database.
I strongly suspect that much larger gains are to be had by optimizing
other areas of the database, such as the planner, the storage manager
(using mmap for file handling may prove useful here), the shared
memory system (mmap may be faster than System V style shared memory),
etc.

The big overhead in the process model on most platforms is in creation
and destruction of processes.  PostgreSQL has a relatively high
connection startup cost.  But there are ways of dealing with this
problem other than threading, namely the use of a connection caching
middleware layer.  Such layers exist for databases other than
PostgreSQL, so the high cost of fielding and setting up a database
connection is *not* unique to PostgreSQL ... which suggests that while
threading may help, it doesn't help *enough*.

I'd rather see some development work go into a connection caching
process that understands the PostgreSQL wire protocol well enough to
look like a PostgreSQL backend to connecting processes, rather than
see a much larger amount of effort be spent on converting PostgreSQL
to a threaded architecture (and then discover that connection caching
is still needed anyway).

> Certainly it's =possible= that threads have nothing to offer
> PostgreSQL, but IMHO it's not =probable=.  Just another thing for me
> to add to my TODO heap for looking at...

It's not that threads don't have anything to offer.  It's that the
costs associated with them are high enough that it's not at all clear
that they're an overall win.


-- 
Kevin Brown                          kevin@sysexperts.com



pgsql-hackers by date:

Previous
From: Sean Chittenden
Date:
Subject: Re: [GENERAL] medical image on postgreSQL?
Next
From: "Nigel J. Andrews"
Date:
Subject: Re: How do you execute a postgresql function from perl?