Re: Anyone working on better transaction locking? - Mailing list pgsql-hackers

From Kevin Brown
Subject Re: Anyone working on better transaction locking?
Msg-id 20030412105452.GV1833@filer
In response to Re: Anyone working on better transaction locking?  (Shridhar Daithankar <shridhar_daithankar@persistent.co.in>)
Responses Re: Anyone working on better transaction locking?
List pgsql-hackers
Shridhar Daithankar wrote:
> Apache does too many things to be a speed daemon and what it offers
> is pretty impressive from performance POV.
>
> But database is not webserver. It is not supposed to handle tons of
> concurrent requests. That is a fundamental difference.

I'm not sure I necessarily agree with this.  A database is just a
tool, a means of reliably storing information in such a way that it
can be retrieved quickly.  Whether or not it "should" handle lots of
concurrent requests is a question that the person trying to use it
must answer.

A better answer is that a database engine that can handle lots of
concurrent requests can also handle a smaller number, but not vice
versa.  So it's clearly an advantage to have a database engine that
can handle lots of concurrent requests because such an engine can be
applied to a larger number of problems.  That is, of course, assuming
that all other things are equal...

There are situations in which a database would have to handle a lot of
concurrent requests.  Handling ATM transactions over a large area is
one such situation.  A database with current weather information might
be another, if it is actively queried by clients all over the country.
Acting as a mail store for a large organization is another.  And, of
course, acting as a filesystem is definitely another.  :-)

> Well. Threading does not necessarily imply one thread per connection
> model. Threading can be used to make CPU work during I/O and taking
> advantage of SMP for things like sort etc. This is especially true
> for 2.4.x linux kernels where async I/O can not be used for threaded
> apps, as threads and signals do not mix together well.

This is true, but whether you choose to limit the use of threads to a
few specific situations or use them throughout the database, the
dangers and difficulties faced by the developers when using threads
will be the same.

> One connection per thread is not a good model for postgresql since
> it has already built a robust product around process paradigm. If I
> have to start a new database project today, a mix of process+thread
> is what I would choose but postgresql is not in same stage of life.

Certainly there are situations for which it would be advantageous to
have multiple concurrent actions happening on behalf of a single
connection, as you say.  But that doesn't automatically mean that a
thread is the best overall solution.  On systems such as Linux that
have fast process handling, processes are almost certainly the way to
go.  On other systems such as Solaris or Windows, threads might be the
right answer (on Windows they might be the *only* answer).  But my
argument here is simple: the responsibility of optimizing process
handling belongs to the maintainers of the OS.  Application developers
shouldn't have to worry about this stuff.

Of course, back here in the real world they *do* have to worry about
this stuff, and that's why it's important to quantify the problem.
It's not sufficient to say that "processes are slow and threads are
fast".  Processes on the target platform may well be slow relative to
other systems (and relative to threads).  But the question is: for the
problem being solved, how much overhead does process handling
represent relative to the total amount of overhead the solution itself
incurs?
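
A minimal sketch of how that ratio could be measured, assuming a POSIX
system and Python. The function names and the 50 ms work figure are
illustrative assumptions, not numbers taken from any real workload; the
point is only that the cost of creating and reaping a process can be
timed and then compared against the cost of the work itself.

```python
import os
import time


def measure_fork_cost(samples=50):
    """Average wall-clock seconds to fork a child and reap it (POSIX only)."""
    start = time.perf_counter()
    for _ in range(samples):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running cleanup handlers.
            os._exit(0)
        # Parent: wait for the child so it does not linger as a zombie.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / samples


def overhead_fraction(overhead_s, work_s):
    """Share of total time spent on process handling rather than on the work."""
    return overhead_s / (overhead_s + work_s)


if __name__ == "__main__":
    fork_cost = measure_fork_cost()
    # Hypothetical per-task work time; substitute a measured figure.
    work_s = 0.050
    print(f"fork+wait: {fork_cost * 1e6:.0f} us per process")
    print(f"overhead share: {overhead_fraction(fork_cost, work_s):.2%}")
```

The measured figure will vary by platform and load, so it is only
meaningful when compared against work timed on the same machine.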

For instance, if we're talking about addressing the problem of
distributing sorts across multiple CPUs, the amount of overhead
involved in doing disk activity while sorting could easily swamp, in
the typical case, the overhead involved in creating parallel processes
to do the sorts themselves.  And if that's the case, you may as well
gain the benefits of using full-fledged processes rather than deal
with the problems that come with the use of threads -- because the
gains to be found by using threads will be small in relative terms.
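
The argument above can be put into a back-of-the-envelope model. This
is a sketch under loudly stated assumptions: the CPU work divides
evenly among workers, the disk I/O does not parallelise, workers are
started one after another, and every number in the example is a
hypothetical placeholder rather than a measurement. The names
parallel_sort_time and relative_gain are invented for illustration.

```python
def parallel_sort_time(cpu_s, io_s, workers, spawn_s):
    """Estimated wall time for a sort split across several workers.

    Simplifying assumptions: CPU work splits evenly, I/O is not
    parallelised, and workers are started sequentially.
    """
    return workers * spawn_s + cpu_s / workers + io_s


def relative_gain(slow_s, fast_s):
    """Fraction of the slower total that the faster approach saves."""
    return (slow_s - fast_s) / slow_s


if __name__ == "__main__":
    # All figures below are hypothetical placeholders, not measurements.
    cpu_s, io_s, workers = 8.0, 20.0, 4
    with_processes = parallel_sort_time(cpu_s, io_s, workers, spawn_s=0.001)
    with_threads = parallel_sort_time(cpu_s, io_s, workers, spawn_s=0.0001)
    print(f"processes: {with_processes:.4f} s")
    print(f"threads:   {with_threads:.4f} s")
    print(f"gain from threads: {relative_gain(with_processes, with_threads):.4%}")
```

Whenever the I/O and CPU terms are large compared with the startup
term, the gain from a cheaper startup is a small fraction of the total,
whatever the exact figures turn out to be.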

> > > At their core, threads are a context switching efficiency tweak.
> >
> > This is the heart of the matter.  Context switching is an operating
> > system problem, and *that* is where the optimization belongs.  Threads
> > exist in large part because operating system vendors didn't bother to
> > do a good job of optimizing process context switching and
> > creation/destruction.
> 
> But why would a database need tons of context switches if it is
> not supposed to service loads of requests simultaneously? If there are
> 50 concurrent connections, how much context switching overhead is
> involved regardless of amount of work done in a single connection?
> Remember that database state is maintained in shared memory. It does
> not take a context switch to access it.

If there are 50 concurrent connections with one process per
connection, then there are 50 database processes.  The context switch
overhead is incurred whenever the current process blocks (or exhausts
its time slice) and the OS activates a different process.  Since
database handling is generally rather I/O intensive as services go,
relatively few of those 50 processes are likely to be in a runnable
state, so I would expect the overall hit from context switching to be
rather low -- I'd expect the I/O subsystem to fall over well before
context switching became a real issue.
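
The two causes of a context switch named above (blocking, and running
out the time slice) are counted separately by the kernel on Unix-like
systems, so the expectation can be checked rather than guessed. A
minimal sketch, assuming Python on a Unix-like system: it uses the
standard resource module, where ru_nvcsw counts voluntary switches
(the process blocked) and ru_nivcsw counts involuntary ones (the
process was preempted). The helper names are hypothetical, and the
sleep merely stands in for a backend waiting on I/O.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def switches_during(action):
    """Run action() and return how many switches of each kind it added."""
    vol_before, invol_before = context_switches()
    action()
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before


if __name__ == "__main__":
    # Blocking call, standing in for a backend waiting on I/O.
    blocked = switches_during(lambda: time.sleep(0.05))
    # Pure computation, where only preemption forces a switch.
    busy = switches_during(lambda: sum(i * i for i in range(2_000_000)))
    print("sleep: voluntary=%d involuntary=%d" % blocked)
    print("busy:  voluntary=%d involuntary=%d" % busy)
```

The counts depend on the scheduler and on what else the machine is
doing, so they are best read as a rough indication, not a benchmark.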

Of course, all of that is independent of whether or not the database
can handle a lot of simultaneous requests.

> > Under Linux, from what I've read, process creation/destruction and
> > context switching happens almost as fast as thread context switching
> > on other operating systems (Windows in particular, if I'm not
> > mistaken).
> 
> I hear solaris also has very heavy processes. But postgresql has
> other issues with solaris as well.

Yeah, I didn't want to mention Solaris because I haven't kept up with
it and thought that perhaps they had fixed this...


-- 
Kevin Brown                          kevin@sysexperts.com