Re: Threads - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Threads
Date
Msg-id 19749.1041966113@sss.pgh.pa.us
In response to Re: Threads  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> You missed the point of his post. If one process in your database does
> something nasty you damn well should worry about the state of and validity of
> the entire database, not just that one backend.

Right.  And in fact we do blow away all the processes when any one of
them crashes or panics.  Nonetheless, memory isolation between processes
is a Good Thing, because it reduces the chances that a process gone
wrong will cause damage via other processes before they can be shut
down.

Here is a simple example of a scenario where that isolation buys us
something: suppose that we have a bug that tromps on memory starting at
some point X until it falls off the sbrk boundary and dumps core.
(There are plenty of ways to make that happen, such as miscalculating
the length of a memcpy or memset operation as -1.)  Such a bug causes
no serious damage in isolation, because the process suffering the
failure will be in a tight data-copying or data-zeroing loop until it
gets the SIGSEGV exception.  It won't do anything bad based on all the
data structures it has clobbered during its march to the end of memory.
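The scenario above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not anything taken from the PostgreSQL sources: the names `canary`, `march_in_child`, and `parent_canary` are made up, the "end of memory" is modeled by a PROT_NONE guard page rather than the real sbrk boundary, and the reported signal is assumed to be SIGSEGV as on Linux (some platforms report SIGBUS instead).

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a process-local data structure (hypothetical). */
static volatile int canary = 42;

/*
 * Fork a child that zeroes memory in a tight loop until it faults.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 if the setup failed.
 */
int march_in_child(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *region;
    pid_t pid, waited;
    int status = 0;

    if (page <= 0)
        return -1;

    /* Two pages: one writable, one inaccessible guard page after it. */
    region = mmap(NULL, 2 * (size_t) page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    if (mprotect(region + page, (size_t) page, PROT_NONE) != 0)
    {
        munmap(region, 2 * (size_t) page);
        return -1;
    }

    pid = fork();
    if (pid < 0)
    {
        munmap(region, 2 * (size_t) page);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: clobber its private copy of the canary, then march. */
        volatile unsigned char *p = region;

        canary = 0;
        for (;;)
            *p++ = 0;           /* faults on reaching the guard page */
    }

    waited = waitpid(pid, &status, 0);
    munmap(region, 2 * (size_t) page);
    if (waited < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* The parent's copy of the canary, which the child cannot reach. */
int parent_canary(void)
{
    return canary;
}
```

Calling march_in_child() from the parent should report that the child died on a signal, while parent_canary() still returns 42: everything the child clobbered lived in its own copy of the address space and vanished with it.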

However, put that same bug in a multithreading context, and it becomes
entirely possible that some other thread will be dispatched and will
try to make use of already-clobbered data structures before the ultimate
SIGSEGV exception happens.  Now you have the potential for unlimited
trouble.

In general, isolation buys you some safety anytime there is a delay
between the occurrence of a failure and its detection.

> Processes by default have complete memory isolation. However postgres
> actually weakens that by doing a lot of work in a shared memory
> pool. That memory gets exactly the same protection as it would get in
> a threaded model, which is to say none.

Yes.  We try to minimize the risk by keeping the shared memory pool
relatively small and not doing more than we have to in it.  (For
example, this was one of the arguments against creating a shared plan
cache.)  It's also very helpful that on most platforms, shared memory
is not address-wise contiguous to normal memory; thus for example a
process caught in a memset death march will hit a SIGSEGV before it
gets to the shared memory block.
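To make the quoted point concrete, here is a minimal sketch of a wild store that does land in shared memory. It assumes an anonymous MAP_SHARED mapping as a stand-in for the real shared memory pool, and the function name `shared_word_after_crash` is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Put the value 42 in a shared mapping, let a child overwrite it and
 * then crash, and return what the parent sees afterwards.
 * Returns -1 if the setup failed.
 */
int shared_word_after_crash(void)
{
    int *shared;
    int seen;
    pid_t pid;

    shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 42;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0)
    {
        *shared = 0;            /* the wild store hits shared memory */
        abort();                /* ... and then the child crashes */
    }

    if (waitpid(pid, NULL, 0) < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    seen = *shared;             /* the damage outlives the child */
    munmap(shared, sizeof(int));
    return seen;
}
```

The parent reads back 0, not 42: unlike process-local memory, the shared mapping keeps whatever the crashed child wrote to it. That is why it helps to keep the shared area small and away from the path of a runaway memset.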

It's interesting to note that this can be made into an argument for
not making shared_buffers very large: the larger the fraction of your
address space that the shared buffers occupy, the larger the chance
that a wild store will overwrite something you'd wish it didn't.
I can't recall anyone having made that point during our many discussions
of appropriate shared_buffers sizing.
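As rough arithmetic only: if one assumes, unrealistically, that a wild store is equally likely to hit any byte of writable mapped memory, the chance of damaging shared memory is simply its share of that memory. The function name and the uniform-distribution model below are assumptions made for illustration; real bugs do not scatter their stores uniformly.

```c
/*
 * Fraction of writable memory that is shared, which under the uniform
 * assumption is the chance that a single wild store lands in it.
 * Both sizes must use the same unit.  Returns -1.0 on invalid input.
 */
double wild_store_hit_fraction(double shared_size, double private_size)
{
    double total = shared_size + private_size;

    if (shared_size < 0.0 || private_size < 0.0 || total <= 0.0)
        return -1.0;
    return shared_size / total;
}
```

For example, with 64 MB of shared buffers and 192 MB of process-local memory the fraction is 0.25; growing the shared buffers to 192 MB against the same local memory raises it to 0.5.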

> So the reality is that if you have a bug most likely you've only corrupted the
> local data which can be easily cleaned up either way. In the thread model
> there's also the unlikely but scary risk that you've damaged other threads'
> memory. And in either case there's the possibility that you've damaged the
> shared pool which is unrecoverable.

In a thread model, *most* of the accessible memory space would be shared
with other threads, at least potentially.  So I think you're wrong to
categorize the second case as unlikely.
        regards, tom lane

