Thread: Multithread Query Planner
Hi folks.

Is there any restriction on creating and starting threads inside Postgres?

I'm trying to develop a multithreaded planner, and sometimes a memory access exception is raised.

I'm debugging the code to see if there is a bug in the planner, but so far I haven't found one. I tried using the same memory context as the root process, and also creating a new context for each new thread, but neither worked.

Any tips?

Regards,
Fred

Sent from my iPad
On Fri, Jan 13, 2012 at 3:14 PM, Frederico <zepfred@gmail.com> wrote:
> Hi folks.
>
> Is there any restriction on creating and starting threads inside Postgres?
>
> I'm trying to develop a multithreaded planner, and sometimes a memory
> access exception is raised.
>
> I'm debugging the code to see if there is a bug in the planner, but so
> far I haven't found one. I tried using the same memory context as the
> root process, and also creating a new context for each new thread, but
> neither worked.
>
> Any tips?

Yes: don't try to use threads.

<http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F>

... threads are not currently used instead of multiple processes for
backends because:

- Historically, threads were poorly supported and buggy.
- An error in one backend can corrupt other backends if they're threads
  within a single process.
- Speed improvements using threads are small compared to the remaining
  backend startup time.
- The backend code would be more complex.
- Terminating backend processes allows the OS to cleanly and quickly free
  all resources, protecting against memory and file descriptor leaks and
  making backend shutdown cheaper and faster.
- Debugging threaded programs is much harder than debugging worker
  processes, and core dumps are much less useful.
- Sharing of read-only executable mappings and the use of shared_buffers
  means processes, like threads, are very memory efficient.
- Regular creation and destruction of processes helps protect against
  memory fragmentation, which can be hard to manage in long-running
  processes.

There's a pretty large burden of reasons *not* to use threads, and while some of them have diminished in importance, most have not.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Christopher Browne <cbbrowne@gmail.com> writes:
> Yes, don't try to use threads.
>
> <http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F>
>
> ... threads are not currently used instead of multiple processes for
> backends because:

I would only add that the backend code is really written from a process-based perspective, with a giant number of private variables that are in fact global variables.

Trying to "clean" that out in order to get to threads... wow.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Does this mean it's possible to use threads?

Regards,
Fred

Sent from my iPad

On 13/01/2012, at 20:47, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:

> Christopher Browne <cbbrowne@gmail.com> writes:
>> Yes, don't try to use threads.
>>
>> <http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F>
>>
>> ... threads are not currently used instead of multiple processes for
>> backends because:
>
> I would only add that the backend code is really written from a
> process-based perspective, with a giant number of private variables
> that are in fact global variables.
>
> Trying to "clean" that out in order to get to threads... wow.
>
> Regards,
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Frederico <zepfred@gmail.com> writes:
> Does this mean it's possible to use threads?

The short answer is "no".

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 13 January 2012 20:14, Frederico <zepfred@gmail.com> wrote:
> I'm trying to develop a multithreaded planner, and sometimes a memory
> access exception is raised.

I was a bit confused about what you are trying to do -- somehow use concurrency during the planning phase, or during execution (maybe having produced concurrency-aware plans)?

Here is my naive thought: since threads are not really an option, as explained by others, you could use helper processes to implement executor concurrency, by replacing nodes with proxies that talk to helper processes (perhaps obtained from a per-cluster pool). The proxy nodes would send their child subplans and the information needed to get the appropriate snapshot, and receive tuples via some kind of IPC (perhaps shmem-backed queues or pipes or whatever).

A common use case in other RDBMSs is running queries over multiple partitions in parallel. In the above scheme that could be done if the children of Append nodes were candidates for emigration to helper processes. OTOH there are some plans produced by UNION and certain kinds of OR that could probably benefit too.

There may be some relevant stuff in PostgreSQL-XC?
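The proxy/helper scheme above can be sketched in miniature. This is a toy Python model, not backend code; the function names and the dict-shaped "subplan" are invented purely for illustration of the shape of the idea (ship a plan description to a helper process, stream tuples back over IPC):

```python
# Toy model of a proxy executor node: the parent plays the proxy, ships
# a "subplan" (here just a filter predicate description) to a helper
# process, and pulls result tuples back over an IPC channel (a pipe).
from multiprocessing import Process, Pipe

def helper(conn):
    """Helper process: receive a subplan description, stream result
    tuples back one at a time, then send a None end-of-stream marker."""
    subplan = conn.recv()                  # e.g. {"table": ..., "min_x": ...}
    for row in subplan["table"]:
        if row[0] >= subplan["min_x"]:
            conn.send(row)                 # ship a qualifying tuple back
    conn.send(None)                        # end of stream
    conn.close()

def proxy_node(subplan):
    """Proxy node: emigrate the subplan to a helper process and yield
    tuples as they arrive, like an executor node returning rows."""
    parent_end, child_end = Pipe()
    p = Process(target=helper, args=(child_end,))
    p.start()
    parent_end.send(subplan)
    while (row := parent_end.recv()) is not None:
        yield row
    p.join()

if __name__ == "__main__":
    table = [(1, "a"), (5, "b"), (3, "c"), (9, "d")]
    print(list(proxy_node({"table": table, "min_x": 4})))
```

The real design questions (snapshot transfer, a per-cluster worker pool, shmem queues instead of pipes) are exactly the parts this sketch waves away.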
On 2012-01-13 21:14, Frederico wrote:
> Hi folks.
>
> Is there any restriction on creating and starting threads inside Postgres?
>
> I'm trying to develop a multithreaded planner, and sometimes a memory
> access exception is raised.
>
> Any tips?

Not sure if it is of any use to you, but the VLDB paper 'Parallelizing Query Optimization' (http://www.vldb.org/pvldb) describes an experimental implementation in PostgreSQL.

regards,
Yeb
On Fri, Jan 13, 2012 at 2:29 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 3:14 PM, Frederico <zepfred@gmail.com> wrote:
>> Hi folks.
>>
>> Is there any restriction on creating and starting threads inside Postgres?
>>
>> I'm trying to develop a multithreaded planner, and sometimes a memory
>> access exception is raised.
>>
>> Any tips?
>
> Yes, don't try to use threads.
>
> <http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F>
>
> ... threads are not currently used instead of multiple processes for
> backends because:

Yes, but the OP is proposing to use multiple threads inside the forked execution process. That's a completely different beast. Many other databases support parallel execution of a single query, and it might very well be better/easier to do that with threads.

merlin
On Mon, Jan 23, 2012 at 2:45 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Yes, but the OP is proposing to use multiple threads inside the forked
> execution process. That's a completely different beast. Many other
> databases support parallel execution of a single query, and it might
> very well be better/easier to do that with threads.

I doubt it. Almost nothing in the backend is thread-safe. You can't acquire a heavyweight lock, a lightweight lock, or a spinlock. You can't do anything that might elog() or ereport(). None of those things are reentrant. Consequently, you can't do anything that involves reading or pinning a buffer, making a syscache lookup, or writing WAL. You can't even do something like parallelize the qsort() of a chunk of data that's already been read into a private buffer, because you'd have to call the comparison functions for the data type, and they might elog() or ereport(). Of course, in certain special cases (like int4) you could make it safe, but it's hard for me to imagine anyone wanting to go to that amount of effort for such a small payoff.

If we're going to do parallel query in PG, and I think we are going to need to do that eventually, we're going to need a system where large chunks of work can be handed off, as in the oft-repeated example of parallelizing an Append node by executing multiple branches concurrently. That's where the big wins are. And that means either overhauling the entire backend to make it thread-safe, or using multiple backends. The latter will be hard, but it'll still be a lot easier than the former.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I doubt it. Almost nothing in the backend is thread-safe. You can't
> acquire a heavyweight lock, a lightweight lock, or a spinlock. You
> can't do anything that might elog() or ereport(). None of those
> things are reentrant.

Not to mention palloc, another extremely fundamental and non-reentrant subsystem.

Possibly we could work on making all that stuff re-entrant, but it would be a huge amount of work for a distant and uncertain payoff.

regards, tom lane
On Tue, Jan 24, 2012 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I doubt it. Almost nothing in the backend is thread-safe. You can't
>> acquire a heavyweight lock, a lightweight lock, or a spinlock. You
>> can't do anything that might elog() or ereport(). None of those
>> things are reentrant.
>
> Not to mention palloc, another extremely fundamental and non-reentrant
> subsystem.
>
> Possibly we could work on making all that stuff re-entrant, but it would
> be a huge amount of work for a distant and uncertain payoff.

Right. I think it makes more sense to try to get parallelism working first with the infrastructure we have. Converting to use threading, if we ever do it at all, should be something we view as a later performance optimization. But I suspect we won't want to do it anyway; I think there will be easier ways to get where we want to be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
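For context on what "making an allocator re-entrant" usually means in practice: highly concurrent allocators give each thread its own arena, so the hot allocation path touches only thread-local state and takes no shared lock. A toy Python sketch of that idea follows; every name in it is invented for illustration, and it is emphatically not how palloc (which is single-threaded) works:

```python
# Sketch of the per-thread-arena idea behind concurrent allocators
# (the approach jemalloc/tcmalloc-style allocators take in C): the
# common path allocates from this thread's own arena, so no shared
# freelist is touched; only refilling an arena would need any
# synchronization.
import threading

class PerThreadArena:
    def __init__(self):
        self._local = threading.local()       # a separate arena per thread

    def alloc(self, size):
        chunks = getattr(self._local, "chunks", None)
        if chunks is None:                    # first allocation on this thread
            chunks = self._local.chunks = []
        buf = bytearray(size)                 # stand-in for carving from a slab
        chunks.append(buf)                    # tracked in this thread's arena only
        return buf

    def nallocated(self):
        return len(getattr(self._local, "chunks", []))

    def reset_thread(self):
        """Drop everything this thread allocated, memory-context style."""
        self._local.chunks = []

allocator = PerThreadArena()

def worker(n, counts):
    for _ in range(n):
        allocator.alloc(64)                   # no contention with other threads
    counts.append(allocator.nallocated())     # sees only its own allocations
    allocator.reset_thread()

if __name__ == "__main__":
    counts = []
    threads = [threading.Thread(target=worker, args=(100, counts)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counts)   # every thread counted exactly its own 100 allocations
```

The hard part Tom is pointing at is not the arena trick itself but retrofitting it under every existing palloc call site and every error path.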
>> Not to mention palloc, another extremely fundamental and non-reentrant
>> subsystem.
>>
>> Possibly we could work on making all that stuff re-entrant, but it would
>> be a huge amount of work for a distant and uncertain payoff.
>
> Right. I think it makes more sense to try to get parallelism working
> first with the infrastructure we have. Converting to use threading,
> if we ever do it at all, should be something we view as a later
> performance optimization. But I suspect we won't want to do it
> anyway; I think there will be easier ways to get where we want to be.

Multithreading became fashionable with the arrival of the dual-core CPU a few years ago. However, multithreading as it is used today has a huge problem: usually, threads share all of their memory. This opens the door to an infinite number of hard-to-find bugs and, more importantly, defeats the purpose.

"Re-entrant palloc()" is nonsense. Suppose you can make a reentrant palloc() that scales OK at 2 threads thanks to a cleverly placed atomic instruction. How is it going to scale on 64 cores? On HP's new 1000-core ARM server with non-uniform memory access? Probably it would suck very, very badly... not to mention the horror of multithreaded exception-safe deallocation when one thread among many blows up on an error.

For the ultimate in parallelism, ask an FPGA guy. Is he using shared memory to wire together his 12000 DSP blocks? Nope, he's using isolated processes which share nothing and communicate through FIFOs and hardware message passing. Like shell pipes, basically. Or Erlang.

Good parallelism = reduce shared state and communicate through data/message channels. Shared-everything multithreading is going to be in a lot of trouble on future many-core machines. Incidentally, Postgres, with its processes sharing only what is needed, has a good head start...
With more and more cores coming, you guys are going to have to fight to reduce the quantity of shared state between processes, not augment it by using shared-memory threads!

Say you want to parallelize sorting. Sorting is a black box with one input data pipe and one output data pipe. Data pipes are good for parallelism, just like FIFOs. FPGA guys love black boxes with FIFOs between them.

Say you manage to send tuples through a FIFO like zeromq. Now you can even run the sort on another machine and let it use all the RAM there if you like. Now split the black box into two black boxes (qsort and merge), instantiate as many qsort boxes as necessary, and connect them together with pipes. Run some boxes on some of this machine's cores, some other boxes on another machine, etc. That would be very flexible (and scalable).

Of course the black box has a small backdoor: some comparison functions can access shared state, which is basically *the* issue (not reentrant stuff, which you do not need).
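The qsort-boxes-plus-merge-box pipeline described above can be modeled in a few lines. This is a toy Python sketch using OS processes and pipes (not zeromq, and integers rather than tuples on the wire); all names are illustrative:

```python
# Toy "black boxes connected by FIFOs" sort: N worker processes each
# sort an independent slice of the input (the "qsort boxes"), ship the
# sorted runs back over pipes, and the parent merges the runs (the
# "merge box"). Nothing is shared between the boxes except the pipes.
import heapq
from multiprocessing import Process, Pipe

def qsort_box(conn, chunk):
    conn.send(sorted(chunk))   # sort an isolated slice, send it downstream
    conn.close()

def parallel_sort(data, nworkers=4):
    step = (len(data) + nworkers - 1) // nworkers   # slice size, rounded up
    pipes, procs = [], []
    for i in range(nworkers):
        parent_end, child_end = Pipe()
        p = Process(target=qsort_box,
                    args=(child_end, data[i * step:(i + 1) * step]))
        p.start()
        pipes.append(parent_end)
        procs.append(p)
    runs = [conn.recv() for conn in pipes]   # one sorted run per box
    for p in procs:
        p.join()
    return list(heapq.merge(*runs))          # the "merge box"

if __name__ == "__main__":
    print(parallel_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0]))
```

The "small backdoor" is visible even here: `sorted()` uses a pure comparison, but a comparator that consulted shared state (or could raise an error) would break the isolation the boxes rely on.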
Ok, thanks.

Regards,
Fred

2012/1/24 Robert Haas <robertmhaas@gmail.com>

On Tue, Jan 24, 2012 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I doubt it. Almost nothing in the backend is thread-safe. You can't
>> acquire a heavyweight lock, a lightweight lock, or a spinlock. You
>> can't do anything that might elog() or ereport(). None of those
>> things are reentrant.
>
> Not to mention palloc, another extremely fundamental and non-reentrant
> subsystem.
>
> Possibly we could work on making all that stuff re-entrant, but it would
> be a huge amount of work for a distant and uncertain payoff.

Right. I think it makes more sense to try to get parallelism working
first with the infrastructure we have. Converting to use threading,
if we ever do it at all, should be something we view as a later
performance optimization. But I suspect we won't want to do it
anyway; I think there will be easier ways to get where we want to be.
On Fri, Jan 27, 2012 at 2:56 PM, Pierre C <lists@peufeu.com> wrote:
>> Right. I think it makes more sense to try to get parallelism working
>> first with the infrastructure we have. Converting to use threading,
>> if we ever do it at all, should be something we view as a later
>> performance optimization. But I suspect we won't want to do it
>> anyway; I think there will be easier ways to get where we want to be.
>
> Multithreading became fashionable with the arrival of the dual-core CPU
> a few years ago. However, multithreading as it is used today has a huge
> problem: usually, threads share all of their memory. This opens the door
> to an infinite number of hard-to-find bugs and, more importantly,
> defeats the purpose.
>
> "Re-entrant palloc()" is nonsense. Suppose you can make a reentrant
> palloc() that scales OK at 2 threads thanks to a cleverly placed atomic
> instruction. How is it going to scale on 64 cores? On HP's new 1000-core
> ARM server with non-uniform memory access? Probably it would suck very,
> very badly... not to mention the horror of multithreaded exception-safe
> deallocation when one thread among many blows up on an error.

There are academic papers out there on how to build a thread-safe, highly concurrent memory allocator. You seem to be assuming that everyone doing allocations would need to compete for access to a single freelist, or something like that, which is simply not true. A lot of effort and study has been put into figuring out how to get past bottlenecks in this area, because there is a lot of multi-threaded code out there that needs to surmount these problems. I don't believe that the problem is that it can't be done, but rather that we haven't done it.

> For the ultimate in parallelism, ask an FPGA guy. Is he using shared
> memory to wire together his 12000 DSP blocks? Nope, he's using isolated
> processes which share nothing and communicate through FIFOs and hardware
> message passing. Like shell pipes, basically. Or Erlang.

I'm not sure we can conclude much from this example. The programming style of people using FPGAs is probably governed by the nature of the interface and the type of computation they are doing rather than anything else.

> Good parallelism = reduce shared state and communicate through
> data/message channels.
>
> Shared-everything multithreading is going to be in a lot of trouble on
> future many-core machines. Incidentally, Postgres, with its processes
> sharing only what is needed, has a good head start...
>
> With more and more cores coming, you guys are going to have to fight to
> reduce the quantity of shared state between processes, not augment it by
> using shared-memory threads!

I do agree that it's important to reduce shared state. We've seen some optimizations this release cycle that work precisely because they cut down on the rate at which cache lines must be passed between cores, and it's pretty clear that we need to go farther in that direction.

On the other hand, I think it's a mistake to confuse the programming model with the amount of shared state. In a multi-threaded programming model there is likely to be a lot more memory that is technically "shared" in the sense that any thread could technically access it. But if the application is coded in such a way that actual sharing is minimal, then it's not necessarily any worse than a process model as far as concurrency is concerned. Threading provides a couple of key advantages which, with our process model, we can't get: it avoids the cost of a copy-on-write operation every time a child is forked, and it allows arbitrary amounts of memory rather than being limited to a single shared memory segment that must be sized in advance. The major disadvantage is really with robustness, not performance, I think: in a threaded environment, with a shared address space, the consequences of a random memory stomp will be less predictable.

> Say you want to parallelize sorting. Sorting is a black box with one
> input data pipe and one output data pipe. Data pipes are good for
> parallelism, just like FIFOs. FPGA guys love black boxes with FIFOs
> between them.
>
> Say you manage to send tuples through a FIFO like zeromq. Now you can
> even run the sort on another machine and let it use all the RAM there if
> you like. Now split the black box into two black boxes (qsort and merge),
> instantiate as many qsort boxes as necessary, and connect them together
> with pipes. Run some boxes on some of this machine's cores, some other
> boxes on another machine, etc. That would be very flexible (and
> scalable).
>
> Of course the black box has a small backdoor: some comparison functions
> can access shared state, which is basically *the* issue (not reentrant
> stuff, which you do not need).

Well, you do need reentrant stuff, if the comparator does anything non-trivial. It's easy to imagine that comparing strings or dates or whatever is a trivial operation that's done without allocating any memory or throwing any errors, but it's not really true. I think the challenge of using GPU acceleration or JIT or threading or other things that are used in really high-performance computing is going to be that a lot of our apparently-trivial operations are actually, well, not so trivial, because we have error checks, overflow checks, nontrivial encoding/decoding from the on-disk format, etc. There's a tendency to wave that stuff away as peripheral, but I think that's a mistake. Someone who knows how to do it can probably write a multi-threaded, just-in-time-compiled, and/or GPU-accelerated program in an afternoon that solves pretty complex problems much more quickly than PostgreSQL, but doing it without throwing away all the error checks, and on numeric as well as int4, and in a way that's portable to every architecture we support -- ah, well, there's the hard part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
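As a concrete instance of the point about apparently-trivial operations, even int4 addition stops being trivial once the backend's overflow check is kept. A minimal Python stand-in (modeled loosely on the backend's int4 addition; the names are illustrative, and the raise stands in for what would be an ereport() in C, i.e. exactly the non-thread-safe machinery under discussion):

```python
# The "afternoon prototype" version of int4 addition is one machine
# instruction; a faithful version must detect overflow and report an
# error, and in the backend that error path goes through ereport(),
# which is not safe to run on a bare thread.
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1

def int4pl_unchecked(a, b):
    return a + b          # what the quick prototype would do

def int4pl_checked(a, b):
    result = a + b
    if result < INT4_MIN or result > INT4_MAX:
        # Stand-in for ereport(ERROR, ...) -- a longjmp through
        # machinery that is not reentrant.
        raise OverflowError("integer out of range")
    return result

if __name__ == "__main__":
    print(int4pl_checked(2**30, 2**30 - 1))   # still fits in int4
    try:
        int4pl_checked(2**30, 2**30)          # overflows int4
    except OverflowError as e:
        print("ERROR:", e)
```

Multiply this by every operator, every encoding conversion, and numeric as well as int4, and the gap between the prototype and a shippable feature is the whole point.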