Thread: Re: Reasoning behind process instead of thread based

Re: Reasoning behind process instead of thread based

From
Marco Colombo
Date:
On Thu, 28 Oct 2004 Richard_D_Levine@raytheon.com wrote:

>
> Marco,
>
> Wouldn't locking a process to a CPU cause the CPU to be idle during IO
> waits and semaphore locks?  Even if you didn't lock each DB process to a
> CPU, IO waits and locks for one session would stop processing on other
> sessions managed by the same process.  Moving the scheduler to user space
> seems to be reimplementing something the kernel knows best about.  Ever
> worked with Ada tasking architectures?  Not pretty.

Quick answers:
- there won't be any I/O wait;
- there won't be any semaphore-related wait;
- in my previous message, I've never mentioned the kernel scheduler;
- no, the kernel knows nothing about PostgreSQL sessions.

It seems quite obvious to me that in the "one flow of execution per CPU"
model, all I/O is non-blocking. Everything is event-driven.

With session "scheduler" I was referring to the (generic) operation
of serving multiple sessions. On a 1-way system we do want to serve more
than one client. Right now, we rely on the kernel to choose which one
to run at a given moment. We _do_ know better than it in many cases; see
the priority inversion problem mentioned a few days ago on the list.

The above is true for most N-way systems, since we still want to serve
M sessions, where usually M >> N.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Reasoning behind process instead of thread based

From
Martijn van Oosterhout
Date:
On Thu, Oct 28, 2004 at 04:32:34PM +0200, Marco Colombo wrote:
> On Thu, 28 Oct 2004 Richard_D_Levine@raytheon.com wrote:
>
> >
> >Marco,
> >
> >Wouldn't locking a process to a CPU cause the CPU to be idle during IO
> >waits and semaphore locks?  Even if you didn't lock each DB process to a
> >CPU, IO waits and locks for one session would stop processing on other
> >sessions managed by the same process.  Moving the scheduler to user space
> >seems to be reimplementing something the kernel knows best about.  Ever
> >worked with Ada tasking architectures?  Not pretty.
>
> Quick answers:
> - there won't be any I/O wait;
> - there won't be any semaphore-related wait;
> - in my previous message, I've never mentioned the kernel scheduler;
> - no, the kernel knows nothing about PostgreSQL sessions.
>
> It seems quite obvious to me that in the "one flow of execution per CPU"
> model, all I/O is non-blocking. Everything is event-driven.

Ok, let me think here. I think what you are suggesting is like the way
IRC servers work. They have one process with a few thousand sockets and
simply process the messages as they come in.

Only you want to do it in a system that actually needs to do a lot of
processing. Basically, you'd need to make all I/O non-blocking and all
the code would probably need to be converted to callback routines or
something similar. If you have no global variables this is quite
doable, though of questionable benefit. The issues I can think of:

1. non-blocking is nice, but lots of OSes (eg POSIX) don't support it
on disk I/O unless you use a completely different interface.

2. If one of your 'processes' decides to do work for half an hour (say,
a really big merge sort), you're stuck. At least with multiple
processes the kernel will preempt you eventually.

3. On Linux, most process scheduling occurs on I/O waits anyway.
Forcible preemption can only really occur up to 100 times per second,
and unless someone else is available to run (i.e. not in I/O wait)
nothing will change.

I honestly don't think you could really do a much better job of
scheduling than the kernel. The kernel has a much better idea of what
processes are waiting on, and more importantly, what other work is
happening on the same machine that also needs CPU time.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: Reasoning behind process instead of thread based

From
Thomas Hallgren
Date:
Martijn,
> I honestly don't think you could really do a much better job of
> scheduling than the kernel. The kernel has a much better idea of what
> processes are waiting on, and more importantly, what other work is
> happening on the same machine that also needs CPU time.
 >
I agree 100% with Martijn. Below is a reply that I sent to Marco some
days ago, although for some reason it was never received by the mailing
list.

----

Marco,

 > You ask what an event is? An event can be:
 > - input from a connection (usually a new query);
 > - notification that I/O needed by a pending query has completed;
 > - if we don't want a single query to starve the server, an alarm of
 >   some kind (I think this is a corner case, but still possible);
 > - something else I haven't thought about.


Sounds very much like a description of the preemption points that a
user-space thread scheduler would use.

 > At any given moment, there are many pending queries. Most of them
 > will be waiting for I/O to complete. That's how the server handles
 > concurrent users.


In order to determine where an event originates, say an I/O complete
event, you need to associate some structure with the I/O operation. That
structure defines the logical flow of all events for one particular
session or query, and as such it's not far from a lightweight thread.
The only difference is that your "thread" resumes execution in a logical
sense (from the event loop) rather than at a physical program counter
position. The resource consumption/performance would stay more or less
the same.
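The correspondence is easy to demonstrate. Below is a sketch (Python, all names my own invention) in which a generator plays the per-session structure: it "blocks" by yielding an I/O request, and the event loop resumes it, in the logical sense, when the completion arrives:

```python
# A generator as a lightweight "thread": its saved frame IS the
# per-session state structure that the event loop passes around.

def session(query):
    # Pretend each query needs two I/O round trips before answering.
    page1 = yield ("read", query + "-page1")   # suspend until I/O completes
    page2 = yield ("read", query + "-page2")
    return [page1, page2]

def event_loop(queries):
    """Drive many sessions; 'I/O completion' is simulated inline."""
    sessions = {q: session(q) for q in queries}
    pending = {q: None for q in queries}       # last I/O result per session
    results = {}
    while sessions:
        for q, gen in list(sessions.items()):
            try:
                op, target = gen.send(pending[q])   # resume with completion
                assert op == "read"
                pending[q] = "data:" + target       # simulate completed I/O
            except StopIteration as done:           # session finished
                results[q] = done.value
                del sessions[q]
    return results
```

Whether the suspended state lives in a generator frame, a thread stack, or an explicit struct is an implementation detail; the cost is roughly the same.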

 > (*) They're oriented to general purpose processes. Think of how CPU
 > usage affects relative priorities. In a DB context, there may be
 > other criteria of greater significance. Roughly speaking, the larger
 > the part of the data a single session holds locked, the sooner it should
 > be completed. The kernel has no knowledge of this. To the kernel,
 > "big" processes are those that are using a lot of CPU. And the policy is
 > to slow them down. To a DB, a "big" queries are those that force the most
 > serialization ("lock a lot"), and they should be completed as soon as
 > possible.


Criteria-based prioritisation is very interesting but I think your model
has some flaws:
- Since the kernel has no idea your process serves a lot of sessions,
_it_ will be considered a "big" process.
- If a process/thread does lots of I/O waits (likely for a "big"
query), it's unlikely that the kernel will consider it a CPU hog.
- Most big queries are read-only and hence do not lock a lot of things.
- PostgreSQL uses MVCC which brings the concurrent lock problem down to
a minimum, even for queries that are not read-only.
- Giving big queries a lot of resources is not the desired behavior in
many cases.
- Your scheduler is confined to one CPU and cannot react to the system
as a whole.

I think it is more important that the scheduler can balance _all_
sessions among _all_ available resources on the machine.
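For reference, the kind of user-space policy under debate can be sketched as a ready queue ordered by a DB-specific criterion. The "run the biggest lock-holder first, so it releases its locks sooner" rule is Marco's; the Python code and all names are only an illustration:

```python
# A user-space ready queue that orders runnable sessions by a
# DB-specific criterion the kernel cannot see: how much they hold
# locked. The kernel scheduler sees only one opaque process.
import heapq

class SessionScheduler:
    def __init__(self):
        self._ready = []
        self._seq = 0                       # tie-breaker for equal priority

    def make_runnable(self, session_id, locks_held):
        # heapq is a min-heap, so negate: more locks held -> runs first.
        heapq.heappush(self._ready, (-locks_held, self._seq, session_id))
        self._seq += 1

    def next_session(self):
        if not self._ready:
            return None
        return heapq.heappop(self._ready)[2]
```

Whether this particular criterion is the right one is exactly what the objections above are about; the sketch only shows that such a policy is trivial to express once scheduling is in user space.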

Regards,
Thomas Hallgren




Re: Reasoning behind process instead of thread based

From
Neil Conway
Date:
I don't see the big difference between what Marco is suggesting and user
threads -- or to be more precise, I think user threads and event-based
programming are just two sides of the same coin. A user thread just
represents the state of a computation -- say, a register context and
some stack. It is exactly that *state* that is passed to a callback
function in the event-based model. The only difference is that with user
threads the system manages context for you, whereas the event-based
model lets the programmer manage it. Which model is better is difficult
to say.

Martijn van Oosterhout wrote:
> 1. non-blocking is nice, but lots of OSes (eg POSIX) don't support it
> on disk I/O unless you use a completely different interface.

We could implement I/O via something like POSIX AIO or a pool of worker
threads that do the actual I/O in a synchronous fashion. But yeah,
either way it's a major change.

> 2. If one of your 'processes' decides to do work for half an hour (say,
> a really big merge sort), you're stuck.

It would be relatively easy to insert yield points into the code to
prevent this from occurring. However, preemptive scheduling would come
in handy when running "foreign" code (e.g. user-defined functions in C).
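The yield-point idea can be sketched as follows (Python, illustrative only; a real implementation would more likely check a flag set by a timer rather than count steps):

```python
# A long-running merge sort with explicit yield points: every
# CHECK_INTERVAL comparisons it calls back into the "scheduler"
# so other sessions can get a turn. Cooperative, not preemptive.
CHECK_INTERVAL = 1000

def merge(left, right, maybe_yield, state):
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        state["steps"] += 1
        if state["steps"] % CHECK_INTERVAL == 0:
            maybe_yield()                  # cooperative preemption point
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(data, maybe_yield, state=None):
    if state is None:
        state = {"steps": 0}               # shared step counter
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    return merge(merge_sort(data[:mid], maybe_yield, state),
                 merge_sort(data[mid:], maybe_yield, state),
                 maybe_yield, state)
```

The weakness is exactly the one noted above: foreign code that never reaches a yield point holds the CPU indefinitely.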

> I honestly don't think you could really do a much better job of
> scheduling than the kernel.

I think we could do better than the kernel by taking advantage of
domain-specific knowledge, I'm just not sure we could beat the kernel by
enough to make this worth doing.

BTW, I think this thread is really interesting -- certainly more
informative than a rehash of the usual "processes vs. threads" debate.

-Neil

Re: Reasoning behind process instead of thread based

From
Marco Colombo
Date:
[Cc: list minimized]

On Tue, 2 Nov 2004, Neil Conway wrote:

> I don't see the big difference between what Marco is suggesting and user
> threads -- or to be more precise, I think user threads and event-based
> programming are just two sides of the same coin. A user thread just
> represents the state of a computation -- say, a register context and some
> stack. It is exactly that *state* that is passed to a callback function in
> the event-based model. The only difference is that with user threads the
> system manages context for you, whereas the event-based model lets the
> programmer manage it. Which model is better is difficult to say.

Well, the difference is that in a pure event-driven model, you
(the programmer) have full control over what the state is. Any thread
library offers a "general purpose" thread, which may be more than
what you want/need.
Of course, very often userland threads are a good implementation of
an event-driven model. Think of GUIs.

The problem is not threads as such. The problem is one thread/process
per session, as opposed to a few specialized threads or one thread per
outstanding query. We could start another "thread" :-) on threads in
general, but it would be largely off-topic here.

> Martijn van Oosterhout wrote:
>> 1. non-blocking is nice, but lots of OSes (eg POSIX) don't support it
>> on disk I/O unless you use a completely different interface.
>
> We could implement I/O via something like POSIX AIO or a pool of worker
> threads that do the actual I/O in a synchronous fashion. But yeah, either way
> it's a major change.
>
>> 2. If one of your 'processes' decides to do work for half an hour (say,
>> a really big merge sort), you're stuck.
>
> It would be relatively easy to insert yield points into the code to prevent
> this from occurring. However, preemptive scheduling would come in handy when
> running "foreign" code (e.g. user-defined functions in C).
>
>> I honestly don't think you could really do a much better job of
>> scheduling than the kernel.
>
> I think we could do better than the kernel by taking advantage of
> domain-specific knowledge, I'm just not sure we could beat the kernel by
> enough to make this worth doing.
>
> BTW, I think this thread is really interesting -- certainly more informative
> than a rehash of the usual "processes vs. threads" debate.

Thanks, that was the whole point.

I thought the event-driven model was well understood; I personally
consider it an established alternative to the threads/processes one.
I'd do a bad and pointless job of explaining it further, so please let
me just throw a few URLs in...

http://www.usenix.org/events/usenix01/full_papers/chandra/chandra_html/index.html

A random quote to attract readers: :-)

   In general, thread-per-connection servers have the drawback of large
   forking and context-switching overhead. In addition, the memory usage
   due to threads' individual stack space can become huge for handling
   large number of concurrent connections. The problem is even more
   pronounced if the operating system does not support kernel-level
   threads, and the application has to use processes or user-level
   threads. It has been shown that thread-based servers do not scale well
   at high loads [7]. Hence, many servers are structured as event-based
   applications, whose performance is determined by the efficiency of event
   notification mechanisms they employ. Pure event-based servers do not
   scale to multiprocessor machines, and hence, on SMP machines, hybrid
   schemes need to be employed, where we have a multi-threaded server
   with each thread using event-handling as a mechanism for servicing
   concurrent connections. Even with a hybrid server, the performance of
   event-based mechanisms is an important issue. Since efficient event
   dispatching is at the core of both event-based and hybrid servers,
   we will focus on the former here.


http://www.kegel.com/c10k.html

This page is very complete: it covers almost all possible techniques
for implementing event-driven servers, and it's very interesting
reading anyway.
Please note that the rationale behind it is the "C10k problem"
(serving 10,000 clients at once), which I _don't_ think we're facing
here. There are some nice properties of event-driven servers other
than being able to handle that many connections, IMHO.

All this started from the priority inversion problem, a few messages ago
on this list. The problem was to "slow down" a query.
In general, I've been thinking about a not-so-cooperative environment,
which demands some active measures to limit the resources used by a
session (other than the DBA yelling at the (mis)user). Think of
high-density web services, with hundreds of sites on the same host.
Event-driven servers make it easy to take full control over the
resources allocated to each session.
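As a sketch of that control (Python, with all names and budget numbers my own invention): because the event loop dispatches every unit of work itself, it can meter work per session and simply stop dispatching a session that has exceeded its budget.

```python
# Per-session resource accounting in the event loop's dispatcher:
# a session over its budget is not scheduled again this interval.
# This is hard to do when the kernel schedules one process per session.

class ResourceGovernor:
    def __init__(self, budget):
        self.budget = budget          # allowed work units per interval
        self.used = {}                # session -> units consumed so far

    def charge(self, session_id, units):
        # Called by the dispatcher after each unit of work.
        self.used[session_id] = self.used.get(session_id, 0) + units

    def runnable(self, session_id):
        # An over-budget session is simply not dispatched.
        return self.used.get(session_id, 0) < self.budget

    def new_interval(self):
        # Periodic reset, e.g. once per second.
        self.used.clear()
```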

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it