Thread: Threads

Threads

From
Shridhar Daithankar
Date:
Hi all,

I am sure many of you would like to delete this message before reading it, but
hold on. :-)

There is much talk about threading on this list and the idea is always
deferred for want of robust thread models across all supported platforms and
feasibility of gains v/s efforts required.

I think threads are useful in different situations, namely parallelising
blocking conditions and using multiple CPUs.

Attached is a framework that I ported to C from a C++ server I have written.
It has thread pool and thread implementations based on pthreads.

This code expects only a minimal pthreads implementation and assumes nothing
about the threads themselves (e.g. whether they are kernel threads or not).

I request hackers on this list to take a look at it. It should be easily
pluggable in any source code and is released without any strings for any use.

This framework allows the worker function and its argument to be plugged in on
the fly. The threads created are sleeping by default and can be woken up as and
when required.
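
As a rough illustration of the idea described above (this is not the attached framework, whose code is not reproduced here), a single sleeping worker could be sketched with a mutex and two condition variables. Every name below (`worker_t`, `worker_start`, `worker_submit`, `worker_wait`, `worker_stop`, `increment`) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from the attached framework. */
typedef void (*work_fn)(void *);

typedef struct {
    pthread_t       tid;
    pthread_mutex_t lock;
    pthread_cond_t  wake;   /* signalled when work arrives or on shutdown */
    pthread_cond_t  idle;   /* signalled when the worker finishes a job   */
    work_fn         fn;     /* NULL means "nothing to do, keep sleeping"  */
    void           *arg;
    bool            quit;
} worker_t;

static void *worker_main(void *p)
{
    worker_t *w = p;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (w->fn == NULL && !w->quit)      /* sleep until woken */
            pthread_cond_wait(&w->wake, &w->lock);
        if (w->fn == NULL)                     /* woken only to shut down */
            break;
        work_fn fn = w->fn;
        void *arg = w->arg;
        pthread_mutex_unlock(&w->lock);
        fn(arg);                               /* run the job without the lock */
        pthread_mutex_lock(&w->lock);
        w->fn = NULL;
        pthread_cond_broadcast(&w->idle);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Create the worker; it starts out asleep. Returns 0 on success. */
int worker_start(worker_t *w)
{
    w->fn = NULL;
    w->arg = NULL;
    w->quit = false;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->wake, NULL);
    pthread_cond_init(&w->idle, NULL);
    return pthread_create(&w->tid, NULL, worker_main, w);
}

/* Block until the worker has no job pending or running. */
void worker_wait(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    while (w->fn != NULL)
        pthread_cond_wait(&w->idle, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

/* Plug in a function and argument, then wake the sleeping thread. */
void worker_submit(worker_t *w, work_fn fn, void *arg)
{
    assert(fn != NULL);
    pthread_mutex_lock(&w->lock);
    while (w->fn != NULL)                      /* previous job still running */
        pthread_cond_wait(&w->idle, &w->lock);
    w->fn = fn;
    w->arg = arg;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

/* Ask the worker to exit after any current job, then reclaim it. */
void worker_stop(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->quit = true;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->tid, NULL);
    pthread_cond_destroy(&w->idle);
    pthread_cond_destroy(&w->wake);
    pthread_mutex_destroy(&w->lock);
}

/* Example job used only for demonstration. */
void increment(void *p) { *(int *)p += 1; }
```

A caller would do `worker_start(&w)`, then `worker_submit(&w, increment, &counter)` as often as needed, then `worker_wait(&w)` and `worker_stop(&w)`. A real pool would hold several such workers and a queue; this sketch keeps to one worker to show only the sleep/wake hand-off.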

I propose to use it incrementally in postgresql. Let's start with I/O. When a
block of data is being read, rather than blocking for the read, we can set up a
creator-consumer link between two threads. That way we can utilize that I/O
time in an overlapped fashion.
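
A minimal sketch of such a creator-consumer link, assuming a plain stdio `FILE *` as the data source and a two-slot buffer: one thread blocks in `fread()` while the caller processes the previously read block. The names (`handoff_t`, `reader_main`, `sum_bytes_overlapped`) and the byte-summing "work" are invented for illustration and have nothing to do with PostgreSQL's buffer manager.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define BLOCK_SIZE 8192
#define NSLOTS     2            /* one block being read, one being processed */

/* Hypothetical two-slot hand-off between a reader thread and its consumer. */
typedef struct {
    FILE           *fp;
    unsigned char   data[NSLOTS][BLOCK_SIZE];
    size_t          len[NSLOTS];
    int             head, tail, count;   /* ring buffer state */
    int             eof;
    pthread_mutex_t lock;
    pthread_cond_t  not_full, not_empty;
} handoff_t;

/* Creator: reads blocks and publishes them until end of file. */
static void *reader_main(void *p)
{
    handoff_t *q = p;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == NSLOTS)               /* wait for a free slot */
            pthread_cond_wait(&q->not_full, &q->lock);
        int slot = q->head;
        pthread_mutex_unlock(&q->lock);

        /* The blocking read happens outside the lock. */
        size_t n = fread(q->data[slot], 1, BLOCK_SIZE, q->fp);

        pthread_mutex_lock(&q->lock);
        if (n == 0) {
            q->eof = 1;
        } else {
            q->len[slot] = n;
            q->head = (q->head + 1) % NSLOTS;
            q->count++;
        }
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
        if (n == 0)
            return NULL;
    }
}

/* Consumer: sums every byte of fp while the reader fetches the next block. */
unsigned long sum_bytes_overlapped(FILE *fp)
{
    handoff_t q = { .fp = fp };
    pthread_t tid;
    unsigned long sum = 0;

    assert(fp != NULL);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_full, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    if (pthread_create(&tid, NULL, reader_main, &q) != 0)
        return 0;                                /* sketch: give up without a thread */

    for (;;) {
        pthread_mutex_lock(&q.lock);
        while (q.count == 0 && !q.eof)
            pthread_cond_wait(&q.not_empty, &q.lock);
        if (q.count == 0) {                      /* drained and at end of file */
            pthread_mutex_unlock(&q.lock);
            break;
        }
        int slot = q.tail;
        pthread_mutex_unlock(&q.lock);

        for (size_t i = 0; i < q.len[slot]; i++) /* "work" on the block */
            sum += q.data[slot][i];

        pthread_mutex_lock(&q.lock);
        q.tail = (q.tail + 1) % NSLOTS;
        q.count--;
        pthread_cond_signal(&q.not_full);
        pthread_mutex_unlock(&q.lock);
    }
    pthread_join(tid, NULL);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    pthread_mutex_destroy(&q.lock);
    return sum;
}
```

Whether the overlap buys anything depends on how much processing there is per block relative to the read time; the sketch only shows the hand-off mechanics.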

Further, threads can be useful when the server has more CPUs. It can spread
CPU-intensive work, such as index creation or sorting, across different
threads. This way we can utilise idle CPUs, which we cannot do as of now.

There are many advantages that I can see.

1) Threads can be optionally turned on/off depending upon the configuration. So
we can keep the existing functionality entirely and convert pieces one-by-one
to a threaded implementation.

2) For each functionality we can have two code branches: one that does not use
threads, i.e. the current code base, and one that can use threads. Agreed, the
binary will be a bit bloated, but that would give enormous flexibility. If we
find a thread implementation buggy, we simply switch it off either at
compilation or in configuration.

3) Not much effort should be required to plug code into this model. The idea
of using threads is to assign exclusive work to each thread, so that should
not require much locking.
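
Points 1 and 2 could look roughly like the following sketch, which keeps a serial branch and a threaded branch side by side. The `NO_THREADS` build macro, the `enable_threads` variable (standing in for a configuration setting), and `run_pair` are all assumptions made up for this example, not existing PostgreSQL options.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Build with -DNO_THREADS to compile the threaded branch out entirely.  */
/* Hypothetical runtime switch, standing in for a configuration setting. */
bool enable_threads = true;

typedef void *(*task_fn)(void *);

/*
 * Run two independent tasks. Returns 1 if the threaded branch was used,
 * 0 if the tasks ran one after the other on the calling thread.
 */
int run_pair(task_fn f1, void *a1, task_fn f2, void *a2)
{
    assert(f1 != NULL && f2 != NULL);
#ifndef NO_THREADS
    if (enable_threads) {
        pthread_t t;
        if (pthread_create(&t, NULL, f1, a1) == 0) {
            f2(a2);                 /* second task runs on this thread */
            pthread_join(t, NULL);
            return 1;
        }
        /* Thread creation failed: fall through to the serial branch. */
    }
#endif
    f1(a1);
    f2(a2);
    return 0;
}

/* Example task used only for demonstration. */
void *square_in_place(void *p)
{
    int *v = p;
    *v = *v * *v;
    return NULL;
}
```

The serial branch is the fallback in every case, so a buggy threaded path can be switched off without touching the callers.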

In case of using multiple CPUs, separate functions need to be written that can
handle things in a thread-safe fashion. Also, a merger function would be
required to merge the results of the worker threads. That would be entirely
additional code.
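
For sorting, the worker-plus-merger split might be sketched as below: two halves of an integer array are sorted concurrently with `qsort()`, each thread touching only its own half, and a merge step combines the two sorted runs. `parallel_sort`, `sort_slice`, and `merge_runs` are hypothetical names, and sorting plain `int`s is a simplification of what a database sort does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int *base; size_t n; } slice_t;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Worker: sorts its own slice; no other thread touches these elements. */
static void *sort_slice(void *p)
{
    slice_t *s = p;
    qsort(s->base, s->n, sizeof(int), cmp_int);
    return NULL;
}

/* The "merger function": combine two sorted runs into out[]. */
static void merge_runs(const int *a, size_t na,
                       const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (b[j] < a[i]) ? b[j++] : a[i++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

/* Sort v[0..n) with two workers; returns 0 on success, -1 on failure. */
int parallel_sort(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n < 2)
        return 0;

    slice_t lo = { v, n / 2 };
    slice_t hi = { v + n / 2, n - n / 2 };
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    pthread_t t;
    int threaded = (pthread_create(&t, NULL, sort_slice, &lo) == 0);
    if (!threaded)
        sort_slice(&lo);            /* degrade to the serial path */
    sort_slice(&hi);                /* second half on the calling thread */
    if (threaded)
        pthread_join(t, NULL);

    merge_runs(lo.base, lo.n, hi.base, hi.n, tmp);
    memcpy(v, tmp, n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

Because each worker owns a disjoint slice, no locking is needed until the merge, which runs after the join.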

I would say two threads per CPU per back-end should be a reasonable default as
that would cover I/O blocking well. Of course unless threading is turned off
in build or in configuration.

Please note that I have tested the code in C++ and my C is rusty. Quite likely
there are bugs in the code. I will stress test the code on Monday, but I would
like to seek an opinion on this as soon as possible. (Hey, but it compiles
clean..)

If required I can post example usage of this code, but I don't think that
should be necessary.:-)

Bye
 Shridhar

Re: Threads

From
mlw
Date:
Please no threading threads!!!

Has anyone calculated the interval and period of "PostgreSQL needs 
threads" posts?

The *ONLY* advantage threading has over multiple processes is the time 
and resources used in creating new processes.

That being said, I admit that creating a threaded program is easier than 
one with multiple processes, but PostgreSQL is already there and working.

Drawbacks to a threaded model:

(1) One thread screws up, the whole process dies. In a multiple process 
application this is not too much of an issue.

(2) Heap fragmentation. In a long uptime application, such as a 
database, heap fragmentation is an important consideration. With 
multiple processes, each process manages its own heap and whatever 
fragmentation that exists goes away when the connection is closed.  A 
threaded server is far more vulnerable because the heap has to manage 
many threads and the heap has to stay active and unfragmented in 
perpetuity. This is why Windows applications usually end up using 2G of 
memory after 3 months of use. (Well, this AND memory leaks)

(3) Stack space. In a threaded application there are more limits to stack 
usage. I'm not sure, but I bet PostgreSQL would have a problem with a 
fixed size stack, I know the old ODBC driver did.

(4) Lock Contention. The various single points of access in a process 
have to be serialized for multiple threads. heap allocation, 
deallocation, etc. all have to be managed. In a multiple process model, 
these resources would be separated by process contexts.

(5) Lastly, why bother? Seriously? Process creation time is an issue, 
true, but it's an issue with threads as well, just not as bad. Anyone who 
is looking for performance should be using a connection pooling 
mechanism as is done in things like PHP.

I have done both threaded and process servers. The threaded servers are 
easier to write. The process based servers are more robust. From an 
operational point of view, a "select foo from bar where x > y" will take 
the same amount of time.





Re: Threads

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: mlw [mailto:pgsql@mohawksoft.com]
> Sent: Friday, January 03, 2003 12:47 PM
> To: Shridhar Daithankar
> Cc: PGHackers
> Subject: Re: [HACKERS] Threads
>
>
> Please no threading threads!!!
>
> Has anyone calculated the interval and period of "PostgreSQL needs
> threads" posts?
>
> The *ONLY* advantage threading has over multiple processes is
> the time
> and resources used in creating new processes.

Threading is absurdly easier to do portably than fork().

Will you fork() successfully on MVS, VMS, OS/2, Win32?

On some operating systems, thread creation is absurdly faster than
process creation (many orders of magnitude).

> That being said, I admit that creating a threaded program is
> easier than
> one with multiple processes, but PostgreSQL is already there
> and working.
>
> Drawbacks to a threaded model:
>
> (1) One thread screws up, the whole process dies. In a
> multiple process
> application this is not too much of an issue.

If you use C++ you can try/catch and nothing bad happens to anything but
the naughty thread.

> (2) Heap fragmentation. In a long uptime application, such as a
> database, heap fragmentation is an important consideration. With
> multiple processes, each process manages its own heap and what ever
> fragmentation that exists goes away when the connection is closed.  A
> threaded server is far more vulnerable because the heap has to manage
> many threads and the heap has to stay active and unfragmented in
> perpetuity. This is why Windows applications usually end up
> using 2G of
> memory after 3 months of use. (Well, this AND memory leaks)

Poorly written applications leak memory.  Fragmentation is a legitimate
concern.

> (3) Stack space. In a threaded application they are more
> limits to stack
> usage. I'm not sure, but I bet PostgreSQL would have a problem with a
> fixed size stack, I know the old ODBC driver did.

A single server with 20 threads will consume less total free store
memory and automatic memory than 20 servers.  You have to decide how
much stack to give a thread, that's true.

> (4) Lock Contention. The various single points of access in a process
> have to be serialized for multiple threads. heap allocation,
> deallocation, etc all have to be managed. In a multple process model,
> these resources would be separated by process contexts.

Semaphores are more complicated than critical sections.  If anything, a
shared memory approach is more problematic and fragile, especially when
porting to multiple operating systems.

> (5) Lastly, why bother? Seriously? Process creation time is an issue
> true, but its an issue with threads as well, just not as bad.
> Anyone who
> is looking for performance should be using a connection pooling
> mechanism as is done in things like PHP.
>
> I have done both threaded and process servers. The threaded
> servers are
> easier to write. The process based severs are more robust. From an
> operational point of view, a "select foo from bar where x >
> y" will take
> he same amount of time.

Probably true.  I think a better solution is a server that can start
threads or processes or both.  But that's neither here nor there and I'm
certainly not volunteering to write it.

Here is a solution to the dilemma.  Make the one who suggests the
feature be the first volunteer on the team that writes it.

Is it a FAQ?  If not, it ought to be.


Re: Threads

From
Greg Copeland
Date:
On Fri, 2003-01-03 at 14:47, mlw wrote:
> Please no threading threads!!!
> 

Ya, I'm very pro threads but I've long since been sold on no threads for
PostgreSQL.  AIO on the other hand... ;)

Your summary so accurately addresses the issue it should be a whole FAQ
entry on threads and PostgreSQL.  :)


> Drawbacks to a threaded model:
> 
> (1) One thread screws up, the whole process dies. In a multiple process 
> application this is not too much of an issue.
> 
> (2) Heap fragmentation. In a long uptime application, such as a 
> database, heap fragmentation is an important consideration. With 
> multiple processes, each process manages its own heap and what ever 
> fragmentation that exists goes away when the connection is closed.  A 
> threaded server is far more vulnerable because the heap has to manage 
> many threads and the heap has to stay active and unfragmented in 
> perpetuity. This is why Windows applications usually end up using 2G of 
> memory after 3 months of use. (Well, this AND memory leaks)


These are things that can't be stressed enough.  IMO, these are some of
the many reasons why applications running on MS platforms tend to have
much lower application and system uptimes (that and resource leaks
which are inherent to the platform).

BTW, if you do much in the way of threaded coding, there is Hoard (libhoard),
which is a heap library for heavily threaded, memory hungry
applications.  It excels in performance, reduces heap lock contention
(maintains multiple heaps in a very thread smart manner), and goes a
long way toward reducing heap fragmentation which is common for heavily
memory based, threaded applications.


> (3) Stack space. In a threaded application they are more limits to stack 
> usage. I'm not sure, but I bet PostgreSQL would have a problem with a 
> fixed size stack, I know the old ODBC driver did.
> 

Most modern thread implementations use a page guard on the stack to
determine if it needs to grow or not.  Generally speaking, for most
modern platforms which support threading, stack considerations rarely
become an issue.


> (5) Lastly, why bother? Seriously? Process creation time is an issue 
> true, but its an issue with threads as well, just not as bad. Anyone who 
> is looking for performance should be using a connection pooling 
> mechanism as is done in things like PHP.
> 
> I have done both threaded and process servers. The threaded servers are 
> easier to write. The process based severs are more robust. From an 
> operational point of view, a "select foo from bar where x > y" will take 
> he same amount of time.
> 

I agree with this, however, using threads does open the door for things
like splitting queries and sorts across multiple CPUs.  Something the
current process model, which was previously agreed on, would not be able
to address because of cost.

Example: "select foo from bar where x > y order by foo ;", could be run
on multiple CPUs if the sort were large enough to justify.

After it's all said and done, I do agree that threading just doesn't
seem like a good fit for PostgreSQL.

-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
Greg Copeland
Date:
On Fri, 2003-01-03 at 14:52, Dann Corbit wrote:
> > -----Original Message-----
> > (1) One thread screws up, the whole process dies. In a 
> > multiple process 
> > application this is not too much of an issue.
> 
> If you use C++ you can try/catch and nothing bad happens to anything but
> the naughty thread.

That doesn't protect against the type of issues he's talking about. 
Invalid pointer reference is a very common snafu which really hoses
threaded applications.  Not to mention resource leaks AND LOCKED
resources which are inherently an issue on Win32.

Besides, it's doubtful that PostgreSQL is going to be rewritten in C++
so bringing up try/catch is pretty much an invalid argument.

>  
> > (2) Heap fragmentation. In a long uptime application, such as a 
> > database, heap fragmentation is an important consideration. With 
> > multiple processes, each process manages its own heap and what ever 
> > fragmentation that exists goes away when the connection is closed.  A 
> > threaded server is far more vulnerable because the heap has to manage 
> > many threads and the heap has to stay active and unfragmented in 
> > perpetuity. This is why Windows applications usually end up 
> > using 2G of 
> > memory after 3 months of use. (Well, this AND memory leaks)
> 
> Poorly written applications leak memory.  Fragmentation is a legitimate
> concern.

And well written applications which attempt to safely handle segfaults,
etc., often leak memory and lock resources like crazy.  On Win32,
depending on the nature of the resources, once this happens, even
process termination will not free/unlock the resources.

> > (4) Lock Contention. The various single points of access in a process 
> > have to be serialized for multiple threads. heap allocation, 
> > deallocation, etc all have to be managed. In a multple process model, 
> > these resources would be separated by process contexts.
> 
> Semaphores are more complicated than critical sections.  If anything, a
> shared memory approach is more problematic and fragile, especially when
> porting to multiple operating systems.

And critical sections lead to low performance on SMP systems for Win32
platforms.  No task can switch on ANY CPU for the duration of the
critical section.  It's highly recommended by MS, as the majority of Win32
applications expect uniprocessor systems, where critical sections are VERY
fast.  As soon as multiple processors come into the mix, critical sections
become a HORRIBLE idea if any sort of scalability is desired.


> Is it a FAQ?  If not, it ought to be.

I agree.  I think mlw's list of reasons should be added to a FAQ.  It's
terse yet says it all!


-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
"bbaker@priefert.com"
Date:
>
>
>I am sure, many of you would like to delete this message before reading, hold 
>on. :-)
>

I'm afraid most posters did not read the message.  Those who replied 
"Why bother?" did not address your challenge:

>I think threads are useful in difference situations namely parallelising 
>blocking conditions and using multiple CPUs.
>  
>
This is indeed one of the few good reasons for threads.  Indeed, 
large/robust systems use a mix.

The consensus of the group is that those who do the work are not ready 
for threads.  Which is fine.  Looking into my crystal ball, I see that 
it will happen, though it appears so far away.

bbaker




Re: Threads

From
mlw
Date:

Greg Copeland wrote:

>On Fri, 2003-01-03 at 14:47, mlw wrote:
>  
>
>>Please no threading threads!!!
>>
>>    
>>
>
>Ya, I'm very pro threads but I've long since been sold on no threads for
>PostgreSQL.  AIO on the other hand... ;)
>
>Your summary so accurately addresses the issue it should be a whole FAQ
>entry on threads and PostgreSQL.  :)
>
Thanks! I do like threads myself. Love them! Loving them, however, does 
not mean that one should ignore their weaknesses. I have a PHP session 
handler (msession) which is threaded, but I am very careful with memory 
allocation, locks, and so on. I also do a lot of padding in memory 
allocations. I know it is wasteful in the short term, but it keeps the 
little gnats from hosing up the heap.

>>Drawbacks to a threaded model:
>>
>>(1) One thread screws up, the whole process dies. In a multiple process 
>>application this is not too much of an issue.
>>
>>(2) Heap fragmentation. In a long uptime application, such as a 
>>database, heap fragmentation is an important consideration. With 
>>multiple processes, each process manages its own heap and what ever 
>>fragmentation that exists goes away when the connection is closed.  A 
>>threaded server is far more vulnerable because the heap has to manage 
>>many threads and the heap has to stay active and unfragmented in 
>>perpetuity. This is why Windows applications usually end up using 2G of 
>>memory after 3 months of use. (Well, this AND memory leaks)
>>    
>>
>
>
>These are things that can't be stressed enough.  IMO, these are some of
>the many reasons why applications running on MS platforms tend to have
>much lower application and system up times (that and resources leaks
>which are inherent to the platform).
>
>BTW, if you do much in the way of threaded coding, there is libHorde
>which is a heap library for heavily threaded, memory hungry
>applications.  It excels in performance, reduces heap lock contention
>(maintains multiple heaps in a very thread smart manner), and goes a
>long way toward reducing heap fragmentation which is common for heavily
>memory based, threaded applications.
>
Thanks, I'll take a look.

>  
>
>>(3) Stack space. In a threaded application they are more limits to stack 
>>usage. I'm not sure, but I bet PostgreSQL would have a problem with a 
>>fixed size stack, I know the old ODBC driver did.
>>
>>    
>>
>
>Most modern thread implementations use a page guard on the stack to
>determine if it needs to grow or not.  Generally speaking, for most
>modern platforms which support threading, stack considerations rarely
>become an issue.
>
For one of my projects, msession, I wrote a SQL (PG and ODBC) plugin. The 
main system thread didn't crash, but the server threads went down quickly. I 
had to bump the thread stack up to 250K for it to work. That doesn't sound like 
much, but if you have 200 connections to your server, that's a lot of 
memory that has to fit into the process space.
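
To put numbers on that: 250K per thread times 200 threads is 51,200,000 bytes (taking K as 1024), or roughly 49 MB of address space for stacks alone. With pthreads the per-thread stack size is requested through a thread attribute, as in this sketch. `spawn_with_stack`, `total_stack_bytes`, and `touch_stack` are hypothetical helpers, and the rounding up to a page multiple is a portability precaution assumed here because some systems reject other sizes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Address space reserved for stacks by nthreads threads. */
size_t total_stack_bytes(size_t per_thread, size_t nthreads)
{
    return per_thread * nthreads;
}

/* Start fn(arg) on a thread with an explicit stack size; 0 on success. */
int spawn_with_stack(pthread_t *tid, size_t stack_bytes,
                     void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    long page = sysconf(_SC_PAGESIZE);
    int rc;

    assert(fn != NULL);
    if (page > 0)   /* round up to a whole number of pages */
        stack_bytes = (stack_bytes + (size_t)page - 1)
                      / (size_t)page * (size_t)page;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    /* Fails with EINVAL if the size is below the platform minimum. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Example thread body: uses 64K of stack, well within a 250K limit. */
void *touch_stack(void *arg)
{
    volatile char buf[64 * 1024];
    buf[0] = 1;
    buf[sizeof buf - 1] = 1;
    *(int *)arg = buf[0] + buf[sizeof buf - 1];
    return NULL;
}
```

A thread whose call depth or local buffers exceed the requested size will typically fault, which matches the symptom described above of worker threads going down while the main thread survives.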

>>(5) Lastly, why bother? Seriously? Process creation time is an issue 
>>true, but its an issue with threads as well, just not as bad. Anyone who 
>>is looking for performance should be using a connection pooling 
>>mechanism as is done in things like PHP.
>>
>>I have done both threaded and process servers. The threaded servers are 
>>easier to write. The process based severs are more robust. From an 
>>operational point of view, a "select foo from bar where x > y" will take 
>>he same amount of time.
>>
>>    
>>
>
>I agree with this, however, using threads does open the door for things
>like splitting queries and sorts across multiple CPUs.  Something the
>current process model, which was previously agreed on, would not be able
>to address because of cost.
>
>Example: "select foo from bar where x > y order by foo ;", could be run
>on multiple CPUs if the sort were large enough to justify.
>
>After it's all said and done, I do agree that threading just doesn't
>seem like a good fit for PostgreSQL.
>
Yes, absolutely, if PostgreSQL ever grew threads, I think that should be 
the focus, forget the threaded connection crap, threaded queries!!

How about this:
select T1.foo, X1.bar from (select * from T) as T1, (select * from X) as 
X1 where T1.id = X1.id


The two sub queries could execute in parallel. That would rock!





Re: Threads

From
"Serguei Mokhov"
Date:
----- Original Message ----- 
From: "Greg Copeland" <greg@CopelandConsulting.Net>
Sent: January 03, 2003 4:45 PM

> > > (1) One thread screws up, the whole process dies. In a 
> > > multiple process 
> > > application this is not too much of an issue.
> > 
> > If you use C++ you can try/catch and nothing bad happens to anything but
> > the naughty thread.
> 
> That doesn't protect against the type of issues he's talking about. 
> Invalid pointer reference is a very common snafu which really hoses
> threaded applications.  Not to mention resource leaks AND LOCKED
> resources which are inherently an issue on Win32.

(1) is an issue only for user-level threads. And besides...

----- Original Message ----- 
From: "Dann Corbit" <DCorbit@connx.com>
Sent: January 03, 2003 3:52 PM

> Here is a solution to the dilemma.  Make the one who suggests the
> feature be the first volunteer on the team that writes it.

.. and that's exactly what Shridhar did - he sent in the code
_already_ in his post: a framework for plugging the model
into PG that, as he says, allows turning threads on and off where
appropriate while keeping the current model as well. But no one
bothered to go over it to see if it makes sense.


----- Original Message ----- 
From: "Greg Copeland" <greg@CopelandConsulting.Net>
Sent: January 03, 2003 4:45 PM

> > Is it a FAQ?  If not, it ought to be.
> 
> I agree.  I think mlw's list of reasons should be added to a faq.  It
> terse yet says it all!

<http://developer.postgresql.org/readtext.php?src/FAQ/FAQ_DEV.html+Developers-FAQ#1.9>

But it's not as complete as this threaded thread.

-s


Re: Threads

From
Tom Lane
Date:
"Serguei Mokhov" <mokhov@cs.concordia.ca> writes:
>>> (1) One thread screws up, the whole process dies. In a 
>>> multiple process application this is not too much of an issue.

> (1) is an issue only for user-level threads.

Uh, what other kind of thread have you got in mind here?

I suppose the lack-of-cross-thread-protection issue would go away if
our objective was only to use threads for internal parallelism in each
backend instance (ie, you still have one process per connection, but
internally it would use multiple threads to process subqueries in
parallel).

Of course that gives up the hope of faster connection startup that has
always been touted as a major reason to want Postgres to be threaded...
        regards, tom lane


Re: Threads

From
Greg Copeland
Date:
On Fri, 2003-01-03 at 19:34, Tom Lane wrote:
> "Serguei Mokhov" <mokhov@cs.concordia.ca> writes:
> >>> (1) One thread screws up, the whole process dies. In a 
> >>> multiple process application this is not too much of an issue.
> 
> > (1) is an issue only for user-level threads.
> 


Umm.  No.  User or system level threads, the statement is true.  If a
thread keels over, the process goes with it.  Furthermore, on Win32
platforms, it opens a whole can of worms no matter how you care to
address it.

> Uh, what other kind of thread have you got in mind here?
> 
> I suppose the lack-of-cross-thread-protection issue would go away if
> our objective was only to use threads for internal parallelism in each
> backend instance (ie, you still have one process per connection, but
> internally it would use multiple threads to process subqueries in
> parallel).
> 

Several have previously spoken about a hybrid approach (ala Apache). 
IIRC, it was never ruled out but it was simply stated that no one had
the energy to put into such a concept.

> Of course that gives up the hope of faster connection startup that has
> always been touted as a major reason to want Postgres to be threaded...
> 
>             regards, tom lane

Faster startup should never be the primary reason as there are many
ways to address that issue already.  Connection pooling and caching are
by far, the most common way to address this issue.  Not only that, but
by definition, it's almost an oxymoron.  If you really need high
performance, you shouldn't be using transient connections, no matter how
fast they are.  This, in turn, brings you back to persistent connections
or connection pools/caches.


-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
mlw
Date:
Greg Copeland wrote:

>>Of course that gives up the hope of faster connection startup that has
>>always been touted as a major reason to want Postgres to be threaded...
>>
>>        regards, tom lane
>
>Faster startup, should never be the primary reason as there are many
>ways to address that issue already.  Connection pooling and caching are
>by far, the most common way to address this issue.  Not only that, but
>by definition, it's almost an oxymoron.  If you really need high
>performance, you shouldn't be using transient connections, no matter how
>fast they are.  This, in turn, brings you back to persistent connections
>or connection pools/caches.
>
Connection time should *never* be in the critical path. There, I've said
it!! People who complain about connection time are barking up the wrong
tree. Regardless of the methodology, EVERY OS has issues with thread
creation, process creation, the memory allocation, and system manipulation
required to manage it. Under load this is ALWAYS slower.

I think that if there is ever a choice, "do I make startup time faster?"
or "Do I make PostgreSQL not need a dump/restore for upgrade" the upgrade
problem has a much higher impact to real PostgreSQL sites.

Re: Threads

From
Greg Copeland
Date:
On Fri, 2003-01-03 at 21:39, mlw wrote:
> Connection time should *never* be in the critical path. There, I've
> said it!! People who complain about connection time are barking up the
> wrong tree. Regardless of the methodology, EVERY OS has issues with
> thread creation, process creation, the memory allocation, and system
> manipulation  required to manage it. Under load this is ALWAYS slower.
> 
> I think that if there is ever a choice, "do I make startup time
> faster?" or "Do I make PostgreSQL not need a dump/restore for upgrade"
> the upgrade problem has a much higher impact to real PostgreSQL sites.


Exactly.  Trying to speed up something that shouldn't be in the critical
path is exactly what I'm talking about.

I completely agree with you!


-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
Christopher Kings-Lynne
Date:
Also remember that even in well-developed OSes like FreeBSD, all of a
process's threads will execute on only one CPU.  This might change in
FreeBSD 5.0, but for now a threaded app (such as MySQL) cannot use multiple
CPUs on a FreeBSD system.

Chris

On Fri, 3 Jan 2003, mlw wrote:

> Please no threading threads!!!
>
> Has anyone calculated the interval and period of "PostgreSQL needs
> threads" posts?
>
> The *ONLY* advantage threading has over multiple processes is the time
> and resources used in creating new processes.
>
> That being said, I admit that creating a threaded program is easier than
> one with multiple processes, but PostgreSQL is already there and working.
>
> Drawbacks to a threaded model:
>
> (1) One thread screws up, the whole process dies. In a multiple process
> application this is not too much of an issue.
>
> (2) Heap fragmentation. In a long uptime application, such as a
> database, heap fragmentation is an important consideration. With
> multiple processes, each process manages its own heap and what ever
> fragmentation that exists goes away when the connection is closed.  A
> threaded server is far more vulnerable because the heap has to manage
> many threads and the heap has to stay active and unfragmented in
> perpetuity. This is why Windows applications usually end up using 2G of
> memory after 3 months of use. (Well, this AND memory leaks)
>
> (3) Stack space. In a threaded application they are more limits to stack
> usage. I'm not sure, but I bet PostgreSQL would have a problem with a
> fixed size stack, I know the old ODBC driver did.
>
> (4) Lock Contention. The various single points of access in a process
> have to be serialized for multiple threads. heap allocation,
> deallocation, etc all have to be managed. In a multple process model,
> these resources would be separated by process contexts.
>
> (5) Lastly, why bother? Seriously? Process creation time is an issue
> true, but its an issue with threads as well, just not as bad. Anyone who
> is looking for performance should be using a connection pooling
> mechanism as is done in things like PHP.
>
> I have done both threaded and process servers. The threaded servers are
> easier to write. The process based severs are more robust. From an
> operational point of view, a "select foo from bar where x > y" will take
> he same amount of time.



Re: Threads

From
Kaare Rasmussen
Date:
> Umm.  No.  User or system level threads, the statement is true.  If a
> thread kills over, the process goes with it.  Furthermore, on Win32

Hm. This is a database system. If one of the backend processes dies 
unexpectedly, I'm not sure I would trust the consistency and state of the 
others.

Or maybe I'm just being chicken.

-- 
Kaare Rasmussen            --Linux, games,--       Tel:        3816 2582
Kaki Data                tshirts, merchandise      Fax:        3816 2501
Howitzvej 75               Open 12.00-18.00        Email: kar@kakidata.dk
2000 Frederiksberg       Saturday 12.00-16.00      Web:      www.suse.dk


Re: Threads

From
Greg Copeland
Date:
On Sat, 2003-01-04 at 06:59, Kaare Rasmussen wrote:
> > Umm.  No.  User or system level threads, the statement is true.  If a
> > thread kills over, the process goes with it.  Furthermore, on Win32
> 
> Hm. This is a database system. If one of the backend processes dies 
> unexpectedly, I'm not sure I would trust the consistency and state of the 
> others.
> 
> Or maybe I'm just being chicken.

I'd call that being wise.  That's the problem with using threads. 
Should a thread do something naughty, the state of the entire process is
in question.  This is true regardless if it is a user mode, kernel mode,
or hybrid thread implementation.  That's the power of using the process
model that is currently in use.  Should it do something naughty, we
bitch and complain politely, throw our hands in the air and exit.  We no
longer have to worry about the state and validity of that backend.  This
creates a huge systemic reliability surplus.

This is also why the concept of a hybrid thread/process implementation
keeps coming to the surface on the list.  If you maintain the process
model and only use threads for things that ONLY relate to the single
process (single session/connection), should a thread cause a problem,
you can still throw your hands in the air and exit just as is done now
without causing problems for, or questioning the validity of, other
backends.

The cool thing about such a concept is that it still opens the door for
things like parallel sorts and queries as it relates to a single
backend.


-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
Shridhar Daithankar
Date:
On Saturday 04 January 2003 03:20 am, you wrote:
> >I am sure, many of you would like to delete this message before reading,
> > hold on. :-)
>
> I'm afraid most posters did not read the message.  Those who replied
>
> "Why bother?" did not address your challenge:

Our challenges may be..;-) 

Anyway, you are absolutely right. It looks like everybody took it as an attempt 
to convert postgresql to a thread-per-connection model.

> >I think threads are useful in difference situations namely parallelising
> >blocking conditions and using multiple CPUs.
>
> This is indeed one of the few good reasons for threads.  Indeed,
> large/robust systems use a mix.
>
> The consensus of the group is that those who do the work are not ready
> for threads.  Which is fine.  Looking into my crystal ball, I see that
> it will happen, though it appears so far away.

I hope it happens, and I will contribute to it if I can.

 Shridhar


Re: Threads

From
Sailesh Krishnamurthy
Date:
>>>>> "Shridhar" == Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
   Shridhar> On Saturday 04 January 2003 03:20 am, you wrote:
   >> > I am sure, many of you would like to delete this message
   >> > before reading, hold on. :-)
   >>
   >> I'm afraid most posters did not read the message.  Those who
   >> replied "Why bother?" did not address your challenge:
 
   Shridhar> Our challenges may be..;-)

Not having threading does reduce some of the freedom we've been having
in our work. But then we have ripped the process model a fair bit and
we have the freedom of an entirely new process to deal with data
streams entering the system and we're experimenting with threading for
asynchronous I/O there.

However, in general I agree with the spirit of the previous messages
in this thread that threading isn't the main issue for PG.

One thing that I missed so far in the threading thread. Context
switches are (IMHO) far cheaper between threads, because you save TLB
flushes. Whether this makes a real difference in a data intensive
application, I don't know. I wonder how easy it is to measure the x86
counters to see TLB flushes/misses.

In a database system, even if one process dies, I'd be very chary of
trusting it. So I am not too swayed by the fact that a
process-per-connection gets you better isolation. 

BTW, many commercial database systems also use per-process models on
Unix. However they are very aggressive with connection sharing and
reuse - even to the point of reusing the same process for multiple
active connections .. maybe at transaction boundaries. Good when a
connection is maintained for a long duration with short-lived
transactions separated by fair amounts of time.

Moreover, in DB2 for instance, the same code base is used for both
per-thread and per-process models - in other words, the entire code is
MT-safe, and the scheduling mechanism is treated as a policy (Win32 is
MT, and some Unices MP). AFAICT though, some postgres code, such as
perhaps the memory contexts, is not MT-safe (of course the
bufferpool/shmem accesses are safe).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


Re: Threads

From
"Ulrich Neumann"
Date:
Hello all,

it's very interesting to see the discussion of "threads" again.

I've ported PostgreSQL to a "thread-per-connection" model based on
pthreads, and it is functional. Most of the work was finding all the
static globals in the source files, swapping them between threads, and
freeing memory when a thread terminates. (PostgreSQL isn't written very
cleanly with respect to memory handling.)

My version of the thread-based PostgreSQL is not very efficient at the
moment, because I haven't done any optimisation of the code to better
support threads and I'm using just a simple semaphore to control
switching of data. Still, this could be a starting point for others who
want to see this code. If this direction is taken seriously, I'm very
willing to help.

If someone is interested in the code, I can send a zip file to everyone
who wants it.

Ulrich



Re: Threads

From
"Shridhar Daithankar"
Date:
On 6 Jan 2003 at 12:22, Ulrich Neumann wrote:

> Hello all,
> If someone is interested in the code I can send a zip file to everyone
> who wants.

I suggest you preserve your work. The reasons I suggested threads are mainly
twofold:

1) Get I/O time used fruitfully
2) Use multiple CPUs better.

It will not require as much code cleaning as your efforts might have. However,
your work will be very useful if somebody decides to use threads in any fashion
in core postgresql.

I was hoping for a bit more optimistic response, given that what I suggested was
totally optional at any point of time but very important from a performance
point of view. Besides, the change would have been gradual as required..

Anyway..

Bye
 Shridhar

--
Robot, n.:    University administrator.



Re: Threads

From
Greg Copeland
Date:
On Mon, 2003-01-06 at 05:36, Shridhar Daithankar wrote:
> On 6 Jan 2003 at 12:22, Ulrich Neumann wrote:
> 
> > Hello all,
> > If someone is interested in the code I can send a zip file to everyone
> > who wants.
> 
> I suggest you preserver your work. The reason I suggested thread are mainly two 
> folds.
> 
> 1) Get I/O time used fuitfully


AIO may address this without the need for integrated threading. 
Arguably, from the long thread that last appeared on the topic of AIO,
some hold that AIO doesn't even offer anything beyond the current
implementation.  As such, it's highly doubtful that integrated threading
is going to offer anything beyond what a sound AIO implementation can
achieve.


> 2) Use multiple CPU better.
> 


Multiple processes tend to universally support multiple CPUs better than
does threading.  On some platforms, the level of threading support is
currently only user mode implementations which means no additional CPU
use.  Furthermore, some platforms where user-mode threads are the de facto
standard don't even allow for scheduling bias, resulting in less work being
accomplished within the same time interval (the work slice must be divided
between n threads within the process, all of which run on a single CPU).


> It will not require as much code cleaning as your efforts might had. However 
> your work will be very useful if somebody decides to use thread in any fashion 
> in core postgresql.
> 
> I was hoping for bit more optimistic response given that what I suggested was 
> totally optional at any point of time but very important from performance 
> point. Besides the change would have been gradual as required..
> 


Speaking for myself, I probably would have been more excited if the
offered framework had addressed several issues.  The short list is:

o Code needs to be more robust.  It shouldn't be calling exit directly
as, I believe, it should be allowing for PostgreSQL to clean up some. 
Correct me as needed.  I would have also expected the code to have adopted
PostgreSQL's semantics and mechanisms as needed (error reporting, etc).
I do understand it was an initial attempt to simply get something in
front of some eyes and have something to talk about.  Just the same, I
was expecting something that we could actually pull the trigger with.

o Code isn't very portable.  Looked fairly okay for pthread platforms,
however, there is new emphasis on the Win32 platform.  I think it would
be a mistake to introduce something as significant as threading without
addressing Win32 from the get-go.

o I would desire a more highly abstracted/portable interface which
allows for different threading and synchronization primitives to be
used.  Current implementation is tightly coupled to pthreads. 
Furthermore, on platforms such as Solaris, I would hope it would easily
allow for plugging in its native threading primitives which are touted
to be much more efficient than pthreads on said platform.

o Code is not commented.  I would hope that adding new code for
something as important as threading would be commented.

o Code is fairly trivial and does not address other primitives
(semaphores, mutexes, conditions, TSS, etc) portably which would be
required for anything but the most trivial of threaded work.  This is
especially true in such an application where data IS the application. 
As such, you must reasonably assume that threads need some form of
portable serialization primitives, not to mention mechanisms for
non-trivial communication.

o Does not address issues such as thread signaling or status reporting.

o Pool interface is rather simplistic.  Does not currently support
concepts such as wake pool, stop pool, pool status, assigning a pool to
work, etc.  In fact, it's not altogether obvious what the intended
capabilities of the current pool implementation are.

o Doesn't seem to address any form of thread communication facilities
(mailboxes, queues, etc).


There are probably other things that I can find if I spend more than
just a couple of minutes looking at the code.  Honestly, I love threads
but I can see that the current code offering is not much more than a
token in its current form.  No offense meant.

After it's all said and done, I'd have to see a lot more meat before I'd
be convinced that threading is ready for PostgreSQL; from both a social
and technological perspective.


Regards,

-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
"Shridhar Daithankar"
Date:
On 6 Jan 2003 at 6:48, Greg Copeland wrote:
> > 1) Get I/O time used fuitfully
> AIO may address this without the need for integrated threading. 
> Arguably, from the long thread that last appeared on the topic of AIO,
> some hold that AIO doesn't even offer anything beyond the current
> implementation.  As such, it's highly doubtful that integrated threading
> is going to offer anything beyond what a sound AIO implementation can
> achieve.

Either way, a complete aio or threading implementation is not available on
major platforms that postgresql runs on. Linux definitely does not have one,
last I checked.

If postgresql is not using aio or threading, we should start using one of them, 
is what I feel. What do you say?

> > 2) Use multiple CPU better.
> Multiple processes tend to universally support multiple CPUs better than
> does threading.  On some platforms, the level of threading support is
> currently only user mode implementations which means no additional CPU
> use.  Furthermore, some platforms where user-mode threads are defacto,
> they don't even allow for scheduling bias resulting is less work being
> accomplished within the same time interval (work slice must be divided
> between n-threads within the process, all of which run on a single CPU).

In the framework I have posted, threading is optional at build time and should
be a configuration option if it gets integrated. So for the platforms that can
not spread threads across multiple CPUs, it can simply be turned off..

> Speaking for my self, I probably would of been more excited if the
> offered framework had addressed several issues.  The short list is:
> 
> o Code needs to be more robust.  It shouldn't be calling exit directly
> as, I believe, it should be allowing for PostgreSQL to clean up some. 
> Correct me as needed.  I would of also expected the code of adopted
> PostgreSQL's semantics and mechanisms as needed (error reporting, etc). 
> I do understand it was an initial attempt to simply get something in
> front of some eyes and have something to talk about.  Just the same, I
> was expecting something that we could actually pull the trigger with.

That could be done.

> 
> o Code isn't very portable.  Looked fairly okay for pthread platforms,
> however, there is new emphasis on the Win32 platform.  I think it would
> be a mistake to introduce something as significant as threading without
> addressing Win32 from the get-go.

If you search for "pthread" in thread.c, there are not many instances. Same
goes for thread.h. From what I understand of Windows threading, it would be a
job of less than 10 minutes to #ifdef the pthread-related parts in either file.

It is just that I have not played with Windows threading, nor am I inclined
to...;-)

> 
> o I would desire a more highly abstracted/portable interface which
> allows for different threading and synchronization primitives to be
> used.  Current implementation is tightly coupled to pthreads. 
> Furthermore, on platforms such as Solaris, I would hope it would easily
> allow for plugging in its native threading primitives which are touted
> to be much more efficient than pthreads on said platform.

Same as above. If there can be two cases separated with #ifdef, there can be 
more.. But what is important is to have a thread that can be woken up as and 
when required with any function desired. That is the basic idea.
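
A minimal sketch of that basic idea, assuming plain POSIX threads: a worker
that sleeps on a condition variable until it is handed a function and an
argument. Every name here (worker, worker_run, and so on) is hypothetical and
is not taken from the posted thread.c; it only illustrates the shape of such a
mechanism.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none come from the posted thread.c. */
typedef void (*work_fn)(void *);

typedef struct {
    pthread_t       tid;
    pthread_mutex_t lock;
    pthread_cond_t  wake;   /* signalled when a job is posted or on shutdown */
    pthread_cond_t  done;   /* signalled when the worker becomes idle again */
    work_fn         fn;     /* pending job, or NULL while there is none */
    void           *arg;
    bool            busy;   /* true while a job is executing */
    bool            quit;
} worker;

static void *worker_main(void *p)
{
    worker *w = p;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Sleep until there is a job or we are told to shut down. */
        while (w->fn == NULL && !w->quit)
            pthread_cond_wait(&w->wake, &w->lock);
        if (w->fn == NULL)
            break;                      /* quit requested, nothing pending */

        work_fn fn = w->fn;
        void *arg = w->arg;
        w->fn = NULL;
        w->busy = true;
        pthread_mutex_unlock(&w->lock);
        fn(arg);                        /* run the job without holding the lock */
        pthread_mutex_lock(&w->lock);
        w->busy = false;
        pthread_cond_broadcast(&w->done);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Create the worker; it starts out asleep. Returns 0 on success. */
int worker_start(worker *w)
{
    w->fn = NULL;
    w->arg = NULL;
    w->busy = w->quit = false;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->wake, NULL);
    pthread_cond_init(&w->done, NULL);
    return pthread_create(&w->tid, NULL, worker_main, w);
}

/* Wake the worker with an arbitrary function and argument. */
void worker_run(worker *w, work_fn fn, void *arg)
{
    pthread_mutex_lock(&w->lock);
    while (w->fn != NULL || w->busy)    /* one job at a time */
        pthread_cond_wait(&w->done, &w->lock);
    w->fn = fn;
    w->arg = arg;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

/* Let any pending job finish, then make the worker exit and join it. */
void worker_stop(worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->quit = true;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->tid, NULL);
    pthread_mutex_destroy(&w->lock);
    pthread_cond_destroy(&w->wake);
    pthread_cond_destroy(&w->done);
}

static void add_one(void *p) { ++*(int *)p; }

/* Runs two jobs on one worker and returns the counter, or -1 on error. */
int worker_demo(void)
{
    worker w;
    int counter = 0;

    if (worker_start(&w) != 0)
        return -1;
    worker_run(&w, add_one, &counter);
    worker_run(&w, add_one, &counter);
    worker_stop(&w);
    return counter;
}
```

The job function is plugged in at run time, so the thread-safety of that
function remains the caller's problem, which is the division of
responsibility argued for in this message.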

> o Code is not commented.  I would hope that adding new code for
> something as important as threading would be commented.

Agreed.

> o Code is fairly trivial and does not address other primitives
> (semaphores, mutexs, conditions, TSS, etc) portably which would be
> required for anything but the most trivial of threaded work.  This is
> especially true in such an application where data IS the application. 
> As such, you must reasonably assume that threads need some form of
> portable serialization primitives, not to mention mechanisms for
> non-trivial communication.

I don't get this. Probably I should post a working example. It is not the
thread's responsibility to make a function thread safe when that function is
changed on the fly. The function itself has to make sure that it is thread
safe. That is an altogether different effort..

> o Does not address issues such as thread signaling or status reporting.

From what I learnt from pthreads on linux, I would not mix threads and signals.
One can easily add code in the runner function that blocks all signals for the
thread when the thread starts running. This would leave the original signal
handling mechanism in place.

As far as status reporting is concerned, the thread should be initiated when
the back-end starts and terminated at backend termination. What about status
reporting?

> o Pool interface is rather simplistic.  Does not currently support
> concepts such as wake pool, stop pool, pool status, assigning a pool to
> work, etc.  In fact, it's not altogether obvious what the capabilities
> intent is of the current pool implementation.

Could you please elaborate? I am using the same interface in C++ for a server
application and never faced a problem like that..;-)

> o Doesn't seem to address any form of thread communication facilities
> (mailboxes, queues, etc).

Not part of this abstraction of threading mechanism. Intentionally left out to 
keep things clean.

> There are probably other things that I can find if I spend more than
> just a couple of minutes looking at the code.  Honestly, I love threads
> but I can see that the current code offering is not much more than a
> token in its current form.  No offense meant.

None taken. Point is it is useful and that is enough for me. If you could 
elaborate examples for any problems you see, I can probably modify it. (Code 
documentation is what I will do now)

> After it's all said and done, I'd have to see a lot more meat before I'd
> be convinced that threading is ready for PostgreSQL; from both a social
> and technological perspective.

Tell me about it..

Bye
 Shridhar

--
What's this script do?
    unzip ; touch ; finger ; mount ; gasp ; yes ; umount ; sleep
Hint for the answer: not everything is computer-oriented. Sometimes you're
in a sleeping bag, camping out.
(Contributed by Frans van der Zande.)



Re: Threads

From
Greg Copeland
Date:
On Tue, 2003-01-07 at 02:00, Shridhar Daithankar wrote:
> On 6 Jan 2003 at 6:48, Greg Copeland wrote:
> > > 1) Get I/O time used fuitfully
> > AIO may address this without the need for integrated threading. 
> > Arguably, from the long thread that last appeared on the topic of AIO,
> > some hold that AIO doesn't even offer anything beyond the current
> > implementation.  As such, it's highly doubtful that integrated threading
> > is going to offer anything beyond what a sound AIO implementation can
> > achieve.
> 
> Either way, a complete aio or threading implementation is not available on 
> major platforms that postgresql runs. Linux definitely does not have one, last 
> I checked.
> 

There are two or three significant AIO implementation efforts currently
underway for Linux.  One such implementation is available from the Red
Hat Server Edition (IIRC) and has been available for some time now.  I
believe Oracle is using it.  SGI also has an effort and I forget where
the other one comes from.  Nonetheless, I believe it's going to be a
hard fought battle to get AIO implemented simply because I don't think
anyone, yet, can truly argue a case on the gain vs effort.

> If postgresql is not using aio or threading, we should start using one of them, 
> is what I feel. What do you say?
> 

I did originally say that I'd like to see an AIO implementation.  Then
again, I don't currently have a position to stand on other than simply saying
it *might* perform better.  ;)  Not exactly a position that's going to
win the masses over.

> > was expecting something that we could actually pull the trigger with.
> 
> That could be done.
> 

I'm sure it can, but that's probably the easiest item to address.

> > 
> > o Code isn't very portable.  Looked fairly okay for pthread platforms,
> > however, there is new emphasis on the Win32 platform.  I think it would
> > be a mistake to introduce something as significant as threading without
> > addressing Win32 from the get-go.
> 
> If you search for "pthread" in thread.c, there are not many instances. Same 
> goes for thread.h. From what I understand windows threading, it would be less 
> than 10 minutes job to #ifdef the pthread related part on either file.
> 
> It is just that I have not played with windows threading and nor I am inclined 
> to...;-)
> 

Well, the method above is going to create a semi-ugly mess.  I've
written thread abstraction layers which cover OS/2, NT, and pthreads.
Each has subtle distinctions.  What really needs to be done is the
creation of another abstraction layer which your current code would sit
on top of.  That way, everything contained within is clear and easy to
read.  The big bonus is that as additional threading implementations
need to be added, only the "low-level" abstraction stuff needs to be
modified.  Done properly, each thread implementation would be its own
module requiring little #if clutter.

As you can see, that's a fair amount of work and far from where the code
currently is.
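
To make the layering concrete, here is a rough sketch under stated
assumptions: a small portable interface (the thr_* names are invented for
this example) with a pthread backend behind it. In a real tree the interface
would be a header and each backend its own source file, chosen at build
time; both are shown in one block only so the example stands alone.

```c
#include <pthread.h>
#include <stdlib.h>

/* ---- portable interface: what callers see (hypothetical names) ---- */
typedef struct thr_thread thr_thread;
typedef struct thr_mutex  thr_mutex;
typedef void (*thr_entry)(void *);

thr_thread *thr_spawn(thr_entry fn, void *arg);
void        thr_join(thr_thread *t);
thr_mutex  *thr_mutex_new(void);
void        thr_mutex_lock(thr_mutex *m);
void        thr_mutex_unlock(thr_mutex *m);
void        thr_mutex_free(thr_mutex *m);

/* ---- pthread backend: would live in its own module ---- */
struct thr_thread { pthread_t tid; thr_entry fn; void *arg; };
struct thr_mutex  { pthread_mutex_t m; };

/* pthreads expects void *(*)(void *); adapt the portable entry signature. */
static void *trampoline(void *p)
{
    thr_thread *t = p;

    t->fn(t->arg);
    return NULL;
}

thr_thread *thr_spawn(thr_entry fn, void *arg)
{
    thr_thread *t = malloc(sizeof *t);

    if (t == NULL)
        return NULL;
    t->fn = fn;
    t->arg = arg;
    if (pthread_create(&t->tid, NULL, trampoline, t) != 0) {
        free(t);
        return NULL;
    }
    return t;
}

void thr_join(thr_thread *t) { pthread_join(t->tid, NULL); free(t); }

thr_mutex *thr_mutex_new(void)
{
    thr_mutex *m = malloc(sizeof *m);

    if (m != NULL && pthread_mutex_init(&m->m, NULL) != 0) {
        free(m);
        m = NULL;
    }
    return m;
}

void thr_mutex_lock(thr_mutex *m)   { pthread_mutex_lock(&m->m); }
void thr_mutex_unlock(thr_mutex *m) { pthread_mutex_unlock(&m->m); }
void thr_mutex_free(thr_mutex *m)   { pthread_mutex_destroy(&m->m); free(m); }

/* ---- caller code: uses only the portable names ---- */
typedef struct { thr_mutex *m; long *counter; } bump_args;

static void bump(void *p)
{
    bump_args *a = p;

    for (int i = 0; i < 1000; i++) {
        thr_mutex_lock(a->m);
        ++*a->counter;
        thr_mutex_unlock(a->m);
    }
}

/* Two threads each add 1000 under the mutex; returns the total or -1. */
long thr_demo(void)
{
    long counter = 0;
    thr_mutex *m = thr_mutex_new();
    bump_args args = { m, &counter };
    thr_thread *a, *b;

    if (m == NULL)
        return -1;
    a = thr_spawn(bump, &args);
    b = thr_spawn(bump, &args);
    if (a != NULL) thr_join(a);
    if (b != NULL) thr_join(b);
    thr_mutex_free(m);
    return (a != NULL && b != NULL) ? counter : -1;
}
```

The trampoline is one small instance of the "subtle distinctions" mentioned
above: each native API has its own entry-point signature, and the backend is
the only place that has to know about it.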

> > 
> > o I would desire a more highly abstracted/portable interface which
> > allows for different threading and synchronization primitives to be
> > used.  Current implementation is tightly coupled to pthreads. 
> > Furthermore, on platforms such as Solaris, I would hope it would easily
> > allow for plugging in its native threading primitives which are touted
> > to be much more efficient than pthreads on said platform.
> 
> Same as above. If there can be two cases separated with #ifdef, there can be 
> more.. But what is important is to have a thread that can be woken up as and 
> when required with any function desired. That is the basic idea.
> 

Again, there's a lot of work in creating a well formed abstraction layer
for all of the mechanics that are required.  Furthermore, different
thread implementations have slightly different semantics which further
complicates things.  Worse, some types of primitives are simply not
available with some thread implementations.  That means those platforms
require it to be written from the primitives that are available on the
platform.  Yet more work.


> > o Code is fairly trivial and does not address other primitives
> > (semaphores, mutexs, conditions, TSS, etc) portably which would be
> > required for anything but the most trivial of threaded work.  This is
> > especially true in such an application where data IS the application. 
> > As such, you must reasonably assume that threads need some form of
> > portable serialization primitives, not to mention mechanisms for
> > non-trivial communication.
> 
> I don't get this. Probably I should post a working example. It is not threads 
> responsibility to make a function thread safe which is changed on the fly. The 
> function has to make sure that it is thread safe. That is altogether different 
> effort..


You're right, it's not the thread's responsibility, however, it is the
threading toolkit's.  In this case, you're offering to be the toolkit
which functions across two platforms, just for starters.  Reasonably,
you should expect a third to quickly follow.

>  
> > o Does not address issues such as thread signaling or status reporting.
> 
> From what I learnt from pthreads on linux, I would not mix threads and signals. 
> One can easily add code in runner function that disables any signals for thread 
> while the thread starts running. This would leave original signal handling 
> mechanism in place.
> 
> As far as status reporting is concerned, the thread sould be initiated while 
> back-end starts and terminated with backend termination. What is about status 
> reporting?
>  
> > o Pool interface is rather simplistic.  Does not currently support
> > concepts such as wake pool, stop pool, pool status, assigning a pool to
> > work, etc.  In fact, it's not altogether obvious what the capabilities
> > intent is of the current pool implementation.
> 
> Could you please elaborate? I am using same interface in c++ for a server 
> application and never faced a problem like that..;-)
> 
>  
> > o Doesn't seem to address any form of thread communication facilities
> > (mailboxes, queues, etc).
> 
> Not part of this abstraction of threading mechanism. Intentionally left out to 
> keep things clean.
> 
> > There are probably other things that I can find if I spend more than
> > just a couple of minutes looking at the code.  Honestly, I love threads
> > but I can see that the current code offering is not much more than a
> > token in its current form.  No offense meant.
> 
> None taken. Point is it is useful and that is enough for me. If you could 
> elaborate examples for any problems you see, I can probably modify it. (Code 
> documentation is what I will do now)
> 
> > After it's all said and done, I'd have to see a lot more meat before I'd
> > be convinced that threading is ready for PostgreSQL; from both a social
> > and technological perspective.
> 
> Tell me about it..
> 

Long story short, if PostgreSQL is to use threads, it shouldn't be
handicapped by having a very limited subset of functionality.  With the
code that has been currently submitted, I don't believe you could even 
effectively implement a parallel sort.

To get an idea of the types of things that would be needed, check out
the ACE Toolkit.  There are a couple of other fairly popular toolkits as
well.  Nonetheless, it's a significant effort and the current code is a
long ways off from being usable.


-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
Greg Stark
Date:
Greg Copeland <greg@CopelandConsulting.Net> writes:

> That's the power of using the process model that is currently in use. Should
> it do something naughty, we bitch and complain politely, throw our hands in
> the air and exit. We no longer have to worry about the state and validity of
> that backend.

You missed the point of his post. If one process in your database does
something nasty you damn well should worry about the state of and validity of
the entire database, not just that one backend.

Are you really sure you caught the problem before it screwed up the data in
shared memory? On disk?


This whole topic is in need of some serious FUD-dispelling and careful
analysis. Here's a more calm explanation of the situation on this particular
point. Perhaps I'll follow up with something on IO concurrency later.

The point in consideration here is really memory isolation. Threads by default
have zero isolation between threads. They can all access each other's memory
even including their stack. Most of that memory is in fact only needed by a
single thread. 

Processes by default have complete memory isolation. However postgres actually
weakens that by doing a lot of work in a shared memory pool. That memory gets
exactly the same protection as it would get in a threaded model, which is to
say none.
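
The difference can be demonstrated with a small sketch, assuming a
Unix-like system that supports fork and anonymous shared mappings
(MAP_ANONYMOUS is widely available but not strictly POSIX). The child
scribbles on both an ordinary variable and a shared mapping; the parent only
sees the damage to the shared one. The function name is invented for this
example and has nothing to do with how postgres sets up its shared memory.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on strict glibc builds */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int private_value;       /* ordinary process-local memory */

/*
 * Returns private_value * 10 + shared value as seen by the parent after the
 * child has overwritten both with 2, or -1 on error.
 */
int isolation_demo(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    pid_t pid;
    int result;

    if (shared == MAP_FAILED)
        return -1;
    *shared = 1;
    private_value = 1;

    pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        private_value = 2;      /* lands in the child's private copy only */
        *shared = 2;            /* visible to every process mapping it */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    result = private_value * 10 + *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```

Under these assumptions the parent still sees 1 in its private variable and
2 in the shared mapping, so the function returns 12.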

So the reality is that if you have a bug most likely you've only corrupted the
local data which can be easily cleaned up either way. In the thread model
there's also the unlikely but scary risk that you've damaged other threads'
memory. And in either case there's the possibility that you've damaged the
shared pool which is unrecoverable.

In theory minimising the one case of corrupting other threads' local data
shouldn't make a big difference to the risk in the case of an assertion
failure. I'm not sure in practice if that's true though. Processes probably
reduce the temptation to do work in the shared area too.

--
greg



Re: Threads

From
Greg Copeland
Date:
On Tue, 2003-01-07 at 12:21, Greg Stark wrote:
> Greg Copeland <greg@CopelandConsulting.Net> writes:
> 
> > That's the power of using the process model that is currently in use. Should
> > it do something naughty, we bitch and complain politely, throw our hands in
> > the air and exit. We no longer have to worry about the state and validity of
> > that backend.
> 
> You missed the point of his post. If one process in your database does
> something nasty you damn well should worry about the state of and validity of
> the entire database, not just that one backend.
> 

I can assure you I did not miss the point.  No idea why you're
continuing to spell it out.  In this case, it appears the quotation is
being taken out of context or it was originally stated in an improper
context.

> Are you really sure you caught the problem before it screwed up the data in
> shared memory? On disk?
> 
> 
> This whole topic is in need of some serious FUD-dispelling and careful
> analysis. Here's a more calm explanation of the situation on this particular
> point. Perhaps I'll follow up with something on IO concurrency later.
> 


Hmmm.  Not sure what needs to be dispelled since I've not seen any FUD.


> The point in consideration here is really memory isolation. Threads by default
> have zero isolation between threads. They can all access each other's memory
> even including their stack. Most of that memory is in fact only needed by a
> single thread. 
> 

Again, this has been covered already.


> Processes by default have complete memory isolation. However postgres actually
> weakens that by doing a lot of work in a shared memory pool. That memory gets
> exactly the same protection as it would get in a threaded model, which is to
> say none.
> 

Again, this has all been covered, more or less.  Your comments seem to
imply that you did not fully read what has been said on the topic thus
far or that you misunderstood something that was said.  Of course, it's
also possible that I may have said something out of its proper context
which may be confusing you.

I think it's safe to say I don't have any further comment unless
something new is being brought to the table.  Should there be something
new to cover, I'm happy to talk about it.  At this point, however, it
appears that it's been beat to death already.


-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> You missed the point of his post. If one process in your database does
> something nasty you damn well should worry about the state of and validity of
> the entire database, not just that one backend.

Right.  And in fact we do blow away all the processes when any one of
them crashes or panics.  Nonetheless, memory isolation between processes
is a Good Thing, because it reduces the chances that a process gone
wrong will cause damage via other processes before they can be shut
down.

Here is a simple example of a scenario where that isolation buys us
something: suppose that we have a bug that tromps on memory starting at
some point X until it falls off the sbrk boundary and dumps core.
(There are plenty of ways to make that happen, such as miscalculating
the length of a memcpy or memset operation as -1.)  Such a bug causes
no serious damage in isolation, because the process suffering the
failure will be in a tight data-copying or data-zeroing loop until it
gets the SIGSEGV exception.  It won't do anything bad based on all the
data structures it has clobbered during its march to the end of memory.

However, put that same bug in a multithreading context, and it becomes
entirely possible that some other thread will be dispatched and will
try to make use of already-clobbered data structures before the ultimate
SIGSEGV exception happens.  Now you have the potential for unlimited
trouble.

In general, isolation buys you some safety anytime there is a delay
between the occurrence of a failure and its detection.
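
For readers wondering how a length of -1 turns into a march to the end of
memory: memcpy and memset take a size_t, so a signed -1 is converted to the
largest representable size. The helpers below are illustrative only (the
names are invented) and deliberately never perform the oversized copy.

```c
#include <stddef.h>
#include <stdint.h>

/* What a signed length looks like once it reaches a size_t parameter. */
size_t as_memset_length(long computed_len)
{
    return (size_t) computed_len;
}

/* Hypothetical guard: 1 if the length is usable for a buffer of cap bytes. */
int length_is_sane(long computed_len, size_t cap)
{
    return computed_len >= 0 && (size_t) computed_len <= cap;
}
```

A check like length_is_sane shortens the delay between failure and detection
to zero for this one class of bug, but it does nothing for wild stores in
general, which is where the isolation argument still applies.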

> Processes by default have complete memory isolation. However postgres
> actually weakens that by doing a lot of work in a shared memory
> pool. That memory gets exactly the same protection as it would get in
> a threaded model, which is to say none.

Yes.  We try to minimize the risk by keeping the shared memory pool
relatively small and not doing more than we have to in it.  (For
example, this was one of the arguments against creating a shared plan
cache.)  It's also very helpful that in most platforms, shared memory
is not address-wise contiguous to normal memory; thus for example a
process caught in a memset death march will hit a SIGSEGV before it
gets to the shared memory block.

It's interesting to note that this can be made into an argument for
not making shared_buffers very large: the larger the fraction of your
address space that the shared buffers occupy, the larger the chance
that a wild store will overwrite something you'd wish it didn't.
I can't recall anyone having made that point during our many discussions
of appropriate shared_buffer sizing.

> So the reality is that if you have a bug most likely you've only corrupted the
> local data which can be easily cleaned up either way. In the thread model
> there's also the unlikely but scary risk that you've damaged other threads'
> memory. And in either case there's the possibility that you've damaged the
> shared pool which is unrecoverable.

In a thread model, *most* of the accessible memory space would be shared
with other threads, at least potentially.  So I think you're wrong to
categorize the second case as unlikely.
        regards, tom lane


Re: Threads

From
Curt Sampson
Date:
On Sat, 4 Jan 2003, Christopher Kings-Lynne wrote:

> Also remember that in even well developed OS's like FreeBSD, all a
> process's threads will execute only on one CPU.

I would say that that's not terribly well developed. Solaris will split
a single process's threads over multiple CPUs, and I expect most other
major vendors' Unixes will as well. In the world of free software, the
next release of NetBSD will do the same. (The scheduler activations
system, which supports mapping m userland threads to n kernel threads, was
recently merged from its branch into NetBSD-current.)

From my experience, threaded sorts would be a big win. I managed to
shave index generation time for a large table from about 12 hours to
about 8 hours by generating two indices in parallel after I'd added a
primary key to the table. It would have been much more of a win to be
able to generate the primary key followed by other indexes with parallel
sorts rather than having to generate the primary key on one CPU (while
the other remains idle), wait while that completes, generate two more
indices, and then generate the last one.
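
As a rough illustration of what a threaded sort inside one process could
look like, here is a sketch that sorts two halves of an array on two threads
and then merges them. It assumes pthreads and an int array; it is not how
PostgreSQL sorts, and parallel_sort is a name made up for this example.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int *base; size_t n; } span;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;

    return (x > y) - (x < y);
}

static void *sort_span(void *p)
{
    span *s = p;

    qsort(s->base, s->n, sizeof *s->base, cmp_int);
    return NULL;
}

/* Sort a[0..n) using two threads. Returns 0 on success, -1 on failure. */
int parallel_sort(int *a, size_t n)
{
    size_t mid = n / 2, i = 0, j = mid, k = 0;
    span left = { a, mid }, right = { a + mid, n - mid };
    pthread_t helper;
    int *tmp;

    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    if (pthread_create(&helper, NULL, sort_span, &right) != 0) {
        free(tmp);
        return -1;
    }
    sort_span(&left);               /* this thread sorts the left half */
    pthread_join(helper, NULL);     /* wait for the right half */

    /* Merge the two sorted halves through a scratch buffer. */
    while (i < mid && j < n)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < n)
        tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
    free(tmp);
    return 0;
}
```

On a two-CPU machine with a kernel-thread implementation the two qsort calls
can run at the same time; with user-mode threads they would simply take
turns on one CPU, which is the platform caveat raised earlier in the thread.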

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
 


Re: Threads

From
Steve Wampler
Date:
On Sat, 4 Jan 2003, Christopher Kings-Lynne wrote:
> 
> Also remember that in even well developed OS's like FreeBSD, all a
> process's threads will execute only on one CPU.

I doubt that - it certainly isn't the case on Linux and Solaris.
A thread may *start* execution on the same CPU as its parent, but
native threads are not likely to be constrained to a specific CPU
with an SMP OS.

-- 
Steve Wampler <swampler@noao.edu>
National Solar Observatory


Re: Threads

From
Shridhar Daithankar
Date:
On Thursday 23 January 2003 08:42 pm, you wrote:
> On Sat, 4 Jan 2003, Christopher Kings-Lynne wrote:
> > Also remember that in even well developed OS's like FreeBSD, all a
> > process's threads will execute only on one CPU.
>
> I doubt that - it certainly isn't the case on Linux and Solaris.
> A thread may *start* execution on the same CPU as it's parent, but
> native threads are not likely to be constrained to a specific CPU
> with an SMP OS.

I am told that the linuxthreads port available on FreeBSD uses rfork and is
capable of using multiple CPUs within a single process.

Native FreeBSD threads cannot do that. I need to check that with FreeBSD 5.0.

Shridhar




Re: Threads

From
Greg Copeland
Date:
On Thu, 2003-01-23 at 09:12, Steve Wampler wrote:
> On Sat, 4 Jan 2003, Christopher Kings-Lynne wrote:
> > 
> > Also remember that in even well developed OS's like FreeBSD, all a
> > process's threads will execute only on one CPU.
> 
> I doubt that - it certainly isn't the case on Linux and Solaris.
> A thread may *start* execution on the same CPU as it's parent, but
> native threads are not likely to be constrained to a specific CPU
> with an SMP OS.

You are correct.  When spawning additional threads, should an idle CPU
be available, it's very doubtful that the new thread will show any bias
toward the original thread's CPU.  Most modern OSes do spread the threads
within a process across n CPUs.  Those that don't are probably
attempting to modernize as we speak.

-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Threads

From
mlw
Date:
Greg Copeland wrote:
> On Thu, 2003-01-23 at 09:12, Steve Wampler wrote:
> > On Sat, 4 Jan 2003, Christopher Kings-Lynne wrote:
> > > Also remember that in even well developed OS's like FreeBSD, all a
> > > process's threads will execute only on one CPU.
> >
> > I doubt that - it certainly isn't the case on Linux and Solaris.
> > A thread may *start* execution on the same CPU as it's parent, but
> > native threads are not likely to be constrained to a specific CPU
> > with an SMP OS.
>
> You are correct.  When spawning additional threads, should an idle CPU
> be available, it's very doubtful that the new thread will show any bias
> toward the original thread's CPU.  Most modern OS's do run each thread
> within a process spread across n-CPUs.  Those that don't are probably
> attempting to modernize as we speak.

AFAIK, FreeBSD is one of the OSes that are trying to modernize. Last I
looked it did not have kernel threads.