Thread: Threaded PostgreSQL server
Are there any plans to merge the sources from the experimental threaded server and the forked server so that a compile switch could choose the model?
If someone wanted to submit appropriate patches for the v7.3 development tree, that merge cleanly, I can't see why this wouldn't be a good thing ... On Mon, 4 Feb 2002, Dann Corbit wrote: > Are there any plans to merge the sources from the experimental threaded > server and the forked server so that a compile switch could choose the > model?
I would love to see this happen but they are already quite different and drifting further apart every day. I am trying to integrate parts of the real PostgreSQL into threaded postgres as time permits. I think threaded postgres could serve as a vehicle for testing the relative value of using threads, but trying to merge patches would be a major task. I found an interesting marketing white paper that covers PostgreSQL, Illustra, Informix, DSA ( using threads ), and Datablade extensions. If nothing else, it shows that the PostgreSQL extension model can be used in a threaded environment. www.databaseassociates.com/pdf/infobj.pdf Myron Scott mkscott@sacadia.com On Mon, 4 Feb 2002, Marc G. Fournier wrote: > > If someone wanted to submit appropriate patches for the v7.3 development > tree, that merge cleanly, I can't see why this wouldn't be a good thing > ... > > > On Mon, 4 Feb 2002, Dann Corbit wrote: > > > Are there any plans to merge the sources from the experimental threaded > > server and the forked server so that a compile switch could choose the > > model?
I would have to contend that the two will never be merged into one source base. If the threaded server is done correctly, then many of the internal structures and logic will be radically different. I have to commend Mr. Scott for continuing on with this work when it was pretty obvious from previous discussions that this would not be "well received". On Mon, 4 Feb 2002 mkscott@sacadia.com wrote: > > > I would love to see this happen but they are already quite different and > drifting further apart every day. I am trying to integrate parts of the real > PostgreSQL into threaded postgres as time permits. > > I think threaded postgres could serve as a vehicle for testing the > relative value of using threads, but trying to merge patches would be a > major task. I found an interesting marketing white paper that covers > PostgreSQL, Illustra, Informix, DSA ( using threads ), and Datablade > extensions. If > nothing else, it shows that the PostgreSQL extension model can be used in > a threaded environment. > > www.databaseassociates.com/pdf/infobj.pdf > > Myron Scott > mkscott@sacadia.com > > > On Mon, 4 Feb 2002, Marc G. Fournier wrote: > > > > > If someone wanted to submit appropriate patches for the v7.3 development > > tree, that merge cleanly, I can't see why this wouldn't be a good thing > > ... > > > > > > On Mon, 4 Feb 2002, Dann Corbit wrote: > > > > > Are there any plans to merge the sources from the experimental threaded > > > server and the forked server so that a compile switch could choose the > > > model? > > > -- D. Hageman <dhageman@dracken.com>
> If someone wanted to submit appropriate patches for the v7.3 development > tree, that merge cleanly, I can't see why this wouldn't be a good thing > ... I thought that the one thread instead of one process per client model would only be an advantage for the "native Windows port" ? Imho a useful threaded model on unix would involve a separation of threads and clients. ( 1 CPU thread per physical CPU, several IO threads) But that would involve a complete redesign. Andreas > > Are there any plans to merge the sources from the experimental threaded > > server and the forked server so that a compile switch could choose the > > model?
"Marc G. Fournier" <scrappy@hub.org> writes: > If someone wanted to submit appropriate patches for the v7.3 development > tree, that merge cleanly, I can't see why this wouldn't be a good thing > ... I would resist it. I do not think we need the portability and reliability headaches that would come with it. Furthermore, an #ifdef'd implementation would be the worst of all possible worlds, as it would do major damage to readability of the code. regards, tom lane
Dann Corbit wrote: > > Are there any plans to merge the sources from the experimental threaded > server and the forked server so that a compile switch could choose the > model? Just a question, in order to enlighten my thought. Does the current experimental threaded server disable the multi-process model? Or does it *add* the functionality as a compile switch? (This would be the other way round from the one you pointed out.) I think it is important in order to evaluate resistance to going multithreaded. If they disabled the original method, I agree with Tom. If they *merged* both flawlessly, I would try to consider it for the current tree. Any comments? Regards, Haroldo.
On Tue, 5 Feb 2002, Haroldo Stenger wrote: > Dann Corbit wrote: > > > > Are there any plans to merge the sources from the experimental threaded > > server and the forked server so that a compile switch could choose the > > model? > > Just a question, in order to enlighten my thought. Does the current experimental > threaded server disable multi-process model? Or does it *add* the functionality > as a compile switch? (This would be the other way round as the one you pointed > out.) > > I think it is important as to evaluate resistance to go multithreading. > > If they disabled the original method, I agree with Tom. If they *merged* both > flawlessly, I would try to consider it for the current tree. > > Any comments? That's kinda what I was hoping ... is it something that could be seamlessly integrated to have minimal impact on the code itself ... even if there was some way of having a 'thread.c' vs 'non-thread.c' that could be link'd in, with wrapper functions? Then again, has anyone looked at the apache project? Apache2 has several "process models" ... prefork being one (like ours), or a 'worker', which is a prefork/threaded model where you can have n child processes, with m 'threads' inside of each ... not sure if something like that could be retrofit'd into what we have, but ... ?
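A minimal sketch of the 'thread.c' vs 'non-thread.c' idea raised above, assuming nothing about the real PostgreSQL or threaded-postgres sources. Callers go through one wrapper interface, and either a fork-based or a pthread-based implementation sits behind it. The names WorkerModel, run_forked, run_threaded and add_one are hypothetical; in a real build the two implementations would presumably live in separate files with only one linked in.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical unit of work: takes an int, returns a small int result. */
typedef int (*work_fn)(int arg);

/* Hypothetical wrapper interface hiding how the work is actually run. */
typedef struct WorkerModel {
    const char *name;
    /* Runs fn(arg) in a new worker, waits for it, returns the result (0..255). */
    int (*run)(work_fn fn, int arg);
} WorkerModel;

/* "non-thread.c": one child process per unit of work. */
static int run_forked(work_fn fn, int arg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(arg) & 0xff);      /* child reports its result as exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* "thread.c": one thread per unit of work. */
typedef struct ThreadCall {
    work_fn fn;
    int     arg;
    int     result;
} ThreadCall;

static void *thread_trampoline(void *p)
{
    ThreadCall *call = p;
    call->result = call->fn(call->arg);
    return NULL;
}

static int run_threaded(work_fn fn, int arg)
{
    ThreadCall call = { fn, arg, -1 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, thread_trampoline, &call) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return call.result & 0xff;      /* same value range as the forked model */
}

static const WorkerModel FORK_MODEL   = { "prefork",  run_forked };
static const WorkerModel THREAD_MODEL = { "threaded", run_threaded };

/* Example unit of work used to exercise both models. */
static int add_one(int x) { return x + 1; }
```

Code written against WorkerModel does not change when the model does: FORK_MODEL.run(add_one, 41) and THREAD_MODEL.run(add_one, 41) are called the same way. What this sketch leaves out is exactly the hard part discussed in the thread: shared state, which a forked child copies and a thread shares.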
-----Original Message----- From: Marc G. Fournier [mailto:scrappy@hub.org] Sent: Tuesday, February 05, 2002 11:37 AM To: Haroldo Stenger Cc: Dann Corbit; Tom Lane; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Threaded PostgreSQL server [snip] > That's kinda what I was hoping ... is it something that could be > seamlessly integrated to have minimal impact on the code itself ... even > if there was some way of having a 'thread.c' vs 'non-thread.c' that could > be link'd in, with wrapper functions? > Then again, has anyone looked at the apache project? Apache2 has several > "process models" ... prefork being one (like ours), or a 'worker', which > is a prefork/threaded model where you can have n child processes, with m > 'threads' inside of each ... not sure if something like that could be > retrofit'd into what we have, but ... ? It could be done, but it might be an effort. As an example the ACE project: http://www.cs.wustl.edu/~schmidt/ACE.html has a number of easily selected threading models. It is also portable to an enormous number of platforms (including all flavors of UNIX). However, it is C++ rather than C, and so that particular transition would probably be pretty traumatic if someone tried to use ACE as a toolset. But at least it does demonstrate that such a thing is feasible. As a "for instance" you can look at the Jaws web server (which is both open source and very much faster than the Apache server). It can easily be built with many different threading models.
> > On Tue, 5 Feb 2002, Haroldo Stenger wrote: > > > Just a question, in order to enlighten my thought. Does the current experimental > > threaded server disable multi-process model? Or does it *add* the functionality > > as a compile switch? (This would be the other way round as the one you pointed > > out.) > > Currently, exper. threaded postgres can have multiple processes using multiple threads with the same shared memory. There is no forking involved in the process though. Shared memory, mutexes, and conditional locks go global or private to the process based on a run-time flag. > > That's kinda what I was hoping ... is it something that could be > seamlessly integrated to have minimal impact on the code itself ... even > if there was some way of having a 'thread.c' vs 'non-thread.c' that could > be link'd in, with wrapper functions? > The first basic problem is that global variables are scattered throughout the source as well as some static stack variables. Hunting these down and finding a home for them is, in and of itself, a major task. For example, flex produces code that is not thread safe, so you have to modify that too. The current workaround in exper. threaded postgres is not pretty: one "environment" structure that holds all the normal postgres globals in thread local storage. This makes compile time choices impractical I think. Cheers, Myron mkscott@sacadia.com
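The "environment" structure in thread local storage described above could look roughly like the following. This is a sketch under stated assumptions, not the threaded-postgres code: Env, its two fields, get_env and the helper names are all hypothetical, and only the POSIX calls (pthread_key_create, pthread_once, pthread_getspecific, pthread_setspecific) are real APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure holding the former globals. */
typedef struct Env {
    int  query_depth;     /* illustrative former global */
    long pages_written;   /* illustrative former global */
} Env;

static pthread_key_t  env_key;
static pthread_once_t env_once = PTHREAD_ONCE_INIT;

/* Destructor run by the pthread library when a thread with an Env exits. */
static void env_destroy(void *p) { free(p); }

static void env_make_key(void) { pthread_key_create(&env_key, env_destroy); }

/* Returns the calling thread's Env, creating a zeroed one on first use. */
static Env *get_env(void)
{
    pthread_once(&env_once, env_make_key);

    Env *env = pthread_getspecific(env_key);
    if (env == NULL) {
        env = calloc(1, sizeof *env);
        if (env == NULL)
            abort();
        pthread_setspecific(env_key, env);
    }
    return env;
}

/* Thread body: bump the "global" and report the value this thread saw. */
static void *worker(void *out)
{
    get_env()->query_depth += 5;
    *(int *) out = get_env()->query_depth;
    return NULL;
}

/* Runs one worker thread and returns the query_depth it observed. */
static int depth_seen_by_new_thread(void)
{
    pthread_t tid;
    int seen = -1;

    if (pthread_create(&tid, NULL, worker, &seen) != 0)
        return -1;
    pthread_join(tid, NULL);
    return seen;
}
```

Each thread gets its own zero-initialized copy, so a value set by the calling thread is neither visible to nor disturbed by the worker. That isolation is what lets code that once relied on process-private globals run unchanged in several threads.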
On Tuesday 5 February 2002 20:36, Marc G. Fournier wrote: > Apache2 has several "process models" ... prefork being one (like ours), or a 'worker', which is a prefork/threaded model where you can have n child processes, with m 'threads' inside of each ... not sure if something like that could be retrofit'd into what we have, but ... ? Why not try to link Cygwin statically? Best regards, Jean-Michel POURE
On Tue, 5 Feb 2002 mkscott@sacadia.com wrote: > The first basic problem is that global variables are scattered > throughout the source as well as some static stack variables. Hunting > these down and finding a home for them is, in and of itself, a major > task. For example, flex produces code that is not thread safe, you have > to modify that too. The current work around in exper. threaded > postgres is not pretty, one "environment" structure that holds all the > normal postgres globals in thread local storage. This makes compile > time choices impractical I think. Okay, but this has been discussed in the past concerning threading ... the first work that would have to be done was 'cleaning the code' so that it was thread-safe ... Basically, if we were to look at moving *towards* a fork/thread model in the future, what can we learn and incorporate from the work already being done? How much of the work in the threaded server is cleaning up the code to be thread-safe, that would benefit the base code itself and start us down that path? Right now, from everything I've heard, making the code thread-safe is one big onerous task ... but if we were to start incorporating changes from the 'thread work' that is being done now, into the base server, and ppl start thinking thread-safe when they are coding new stuff, over time, this task becomes smaller ...
On Tue, Feb 05, 2002 at 03:36:41PM -0400, Marc G. Fournier wrote: > Then again, has anyone looked at the apache project? Apache2 has several > "process models" ... prefork being one (like ours), or a 'worker', which > is a prefork/threaded model where you can have n child processes, with m > 'threads' inside of each ... not sure if something like that could be > retrofit'd into what we have, but ... ? We could even use the nice Apache Portable Runtime, which is a platform-independent layer over threading/networking/shm/etc (there's a summary here: http://apr.apache.org/docs/apr/modules.html). This might improve PostgreSQL on non-UNIX platforms, namely Win32. However, I think using threads is only a good idea if it gets us a substantial performance increase. From what I've seen, that isn't the case; and even if the time to create a connection is a bottleneck, there are other, more conservative ways of improving it (e.g. pre-forking, persistent backends, and IIRC some work Tom Lane was doing to reduce backend startup time). And given the complexity and reduced reliability that threads bring, I think the only advantage would be buzzword-compliance -- which isn't a priority, personally. Cheers, Neil
nconway@klamath.dyndns.org (Neil Conway) writes: > However, I think using threads is only a good idea if it gets us a > substantial performance increase. From what I've seen, that isn't the > case; and even if the time to create a connection is a bottleneck, there > are other, more conservative ways of improving it (e.g. pre-forking, > persistent backends, and IIRC some work Tom Lane was doing to reduce > backend startup time). The one place where it could be a clear win would be in splitting single very large queries over multiple CPUs. This would probably require an even larger redesign of the whole system than moving to a query-per-thread rather than per-process model. I think "real" multi-master replication and clustering is a better goal in the short term... -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Doug McNaught wrote: > The one place where it could be a clear win would be in splitting > single very large queries over multiple CPUs. This would probably > require an even larger redesign of the whole system than moving to a > query-per-thread rather than per-process model. I think "real" > multi-master replication and clustering is a better goal in the short > term... Agreed. Though, starting to think & code thread safe would be nice too. Regards, Haroldo.
On Wed, 6 Feb 2002, Marc G. Fournier wrote: > Right now, from everything I've heard, making the code thread-safe is one > big onerous task ... but if we were to start incorporating changes from > the 'thread work' that is being done now, into the base server, and ppl > start thinking thread-safe when they are coding new stuff, over time, this > task becomes smaller ... > I agree, once the move is made to thread-safe it becomes much easier to maintain thread-safe code. I also very much like the idea of multiple thread/process models that could be chosen from. I think the question has always been the initial cost vs. benefit. The group has not seen much to be gained for the amount of initial work involved. After working with the code, I too felt it wasn't worth it. After revisiting the threaded code after a long break I now see some real benefits to threading. For example, I was able to incorporate Tom Lane's lazy_vacuum code to do relation clean up automatically when a threshold of page writes occurred. I was also able to use the freespace information to be shared among threads in the process without touching shared mem. As a result, a pgbench run with 20 clients and over 1,000,000 transactions maintained a more or less constant tps with manual vacuum commands and far less heap expansion. You can do this with processes (planned for 7.3 I think) but I think it was much easier with threads. Other things may open up with threads as well like Java stored procedures. Anyway, now I think it is worth it. Myron mkscott@sacadia.com
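To make the "clean up when a threshold of page writes occurred" idea concrete, here is a small sketch of a write counter shared by the threads of one process. It is an illustration only: WriteCounter, counter_note_write and run_writers are hypothetical names, and nothing here reflects how threaded postgres or lazy_vacuum actually track writes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-relation counter, visible to every thread in the process. */
typedef struct WriteCounter {
    pthread_mutex_t lock;
    long pages_written;   /* writes since the last cleanup was triggered */
    long threshold;       /* trigger a cleanup at this many writes */
} WriteCounter;

static void counter_init(WriteCounter *c, long threshold)
{
    pthread_mutex_init(&c->lock, NULL);
    c->pages_written = 0;
    c->threshold = threshold;
}

/*
 * Records one page write. Returns true once per `threshold` writes; the
 * caller that receives true is the one that should schedule the cleanup.
 */
static bool counter_note_write(WriteCounter *c)
{
    bool trigger = false;

    pthread_mutex_lock(&c->lock);
    if (++c->pages_written >= c->threshold) {
        c->pages_written = 0;
        trigger = true;
    }
    pthread_mutex_unlock(&c->lock);
    return trigger;
}

typedef struct WriterArgs {
    WriteCounter *counter;
    int writes;     /* how many page writes this thread simulates */
    int triggers;   /* how many times this thread was told to clean up */
} WriterArgs;

static void *writer(void *p)
{
    WriterArgs *a = p;
    for (int i = 0; i < a->writes; i++)
        if (counter_note_write(a->counter))
            a->triggers++;
    return NULL;
}

/* Starts nthreads writers and returns the total number of cleanup triggers. */
static int run_writers(int nthreads, int writes_each, long threshold)
{
    enum { MAX_THREADS = 16 };
    pthread_t    tids[MAX_THREADS];
    WriterArgs   args[MAX_THREADS];
    WriteCounter counter;
    int started = 0, total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    counter_init(&counter, threshold);

    for (; started < nthreads; started++) {
        args[started] = (WriterArgs){ &counter, writes_each, 0 };
        if (pthread_create(&tids[started], NULL, writer, &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].triggers;
    }
    pthread_mutex_destroy(&counter.lock);
    return started == nthreads ? total : -1;
}
```

Because the counter is ordinary process memory protected by a mutex, no shared-memory segment is involved, which is the convenience the message attributes to threads. With separate backend processes the same counter would have to live in shared memory.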
On Wed, 6 Feb 2002, Peter Eisentraut wrote: > Haroldo Stenger writes: > > > Though, starting to think & code thread safe would be nice too. > > The thing about thread-safeness is that it's only actually useful when > you're using threads. Otherwise it wastes everybody's time -- the > programmer's, the computer's, and the user's. The thing is, there are several areas where using threads would be a benefit, from what I've read on this list over the years ... as time goes on, less and less of the OSs in use don't have threads, so we have to start *somewhere* to work towards that sort of hybrid system ...
Haroldo Stenger writes: > Though, starting to think & code thread safe would be nice too. The thing about thread-safeness is that it's only actually useful when you're using threads. Otherwise it wastes everybody's time -- the programmer's, the computer's, and the user's. -- Peter Eisentraut peter_e@gmx.net
On Wed, 6 Feb 2002 mkscott@sacadia.com wrote: > After revisiting the threaded code after a long break I now see some > real benefits to threading. For example, I was able to incorporate Tom > Lane's lazy_vacuum code to do relation clean up automatically when a > threshold of page writes occurred. I was also able to use the freespace > information to be shared among threads in the process without touching > shared mem. As a result, a pgbench run with 20 clients and over > 1,000,000 transactions maintained a more or less constant tps with manual > vacuum commands and far less heap expansion. You can do this with > processes (planned for 7.3 I think) but I think it was much easier with > threads. Other things may open up with threads as well like Java stored > procedures. Anyway, now I think it is worth it. Are there code clean-ups that have gone into the thread'd code that could be incorporated into the existing code base that would start us down that path? For instance, based on my limited understanding of threaded servers, I believe that 'global variables' are generally considered "A Real Bad Thing" ... in one of your emails, you mentioned: "The first basic problem is that global variables are scattered throughout the source as well as some static stack variables. Hunting these down and finding a home for them is, in and of itself, a major task. For example, flex produces code that is not thread safe, you have to modify that too. The current work around in exper. threaded postgres is not pretty, one "environment" structure that holds all the normal postgres globals in thread local storage. This makes compile time choices impractical I think." Now, what is a 'clean' solution to this? Making sure that all variables are passed through to various functions, maybe through a struct construct? So, can we start there and work our way through the code? Start simple ... take one of the global(s), put it into the struct and take it out of global space and make sure that it's passed appropriately through all the required functions ... add in the next one, and do another trace? Someone, or a group of ppl, with thread knowledge needs to start this forward ... once the clean up begins, even without any thread code thrown in, it shouldn't be too difficult to keep it clean to go to 'the next step', no?
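The "start simple" step proposed above, moving one global into a struct and threading it through the call chain, might look like this. The sketch is hypothetical throughout: BackendContext, debug_level, log_message and run_query are invented for illustration and are not PostgreSQL identifiers.

```c
#include <assert.h>

/*
 * Before (hypothetical): file-level globals read implicitly by every function.
 *     static int DebugLevel;
 *     static int MessagesLogged;
 */

/* After: the former globals live in a context struct owned by the caller. */
typedef struct BackendContext {
    int debug_level;       /* was: static int DebugLevel */
    int messages_logged;   /* was: static int MessagesLogged */
} BackendContext;

/* Leaf function: takes the context instead of touching globals. */
static void log_message(BackendContext *ctx, int level, const char *msg)
{
    (void) msg;            /* a real version would write msg somewhere */
    if (level <= ctx->debug_level)
        ctx->messages_logged++;
}

/* Intermediate function: its only change is to pass ctx along. */
static void run_query(BackendContext *ctx, const char *sql)
{
    log_message(ctx, 1, "parsing");
    log_message(ctx, 2, sql);
    log_message(ctx, 3, "very verbose detail");
}
```

Two contexts with different settings can now be used side by side, one per connection or one per thread, without interfering with each other. The cost is visible too: every function between the owner of the context and the leaf that needs it gains a parameter, which is why doing this one variable at a time, with a trace after each, is a plausible way to keep the patches reviewable.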
Peter Eisentraut wrote: > > Haroldo Stenger writes: > > > Though, starting to think & code thread safe would be nice too. > > The thing about thread-safeness is that it's only actually useful when > you're using threads. Otherwise it wastes everybody's time -- the > programmer's, the computer's, and the user's. Yes I see. The scenario under which I see doing it to be useful is thinking of adding multi-threading for PG v 7.5 say, and preparing the road. But maybe it's a worthless effort. Many developers are pointing that out. Let's forget about threads for now. By the way, my original question about how integrated the multi-threading fork reached, remained unanswered. I will assume it went threading, dropping forever the original behaviour, so deciding me towards not considering threading a viable option (for now). Regards, Haroldo.
"Marc G. Fournier" wrote: > The thing is, there are several areas where using threads would be a > benefit, from what I've read on this list over the years ... as time goes > on, less and less of the OSs in use don't have threads, so we have to > start *somewhere* to work towards that sort of hybrid system ... Yes. But, maybe things like full-fledged replication, savepoints/nested transactions, out-of-transaction-scope cursors, and others must have priority over this; and that mutating PG thread safe, will slow down a 7.3 release a lot, something not wanted by many here. Let's make a pros/cons list of thread-related aspects here. We saw a lot of cons. Write some pros explicitly. We're not in a hurry anyway. Regards, Haroldo.
mkscott@sacadia.com wrote: > > On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > > > By the way, my original question about how integrated the multi-threading fork > > reached, remained unanswered. I will assume it went threading, dropping forever > > the original behaviour, so deciding me towards not considering threading a > > viable option (for now). > > Yes, you can use postmaster and fork for a connection...or at least you > could prior to some recent changes. I haven't tested it that way for > awhile but it should work. I find it very interesting. So you are telling us you were successful in keeping both functionalities? So why don't you tell us how much of an effort it was to convert the code to thread-safe? Just to compose a community view of the issue, and make a rational decision... Regards, Haroldo.
On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > By the way, my original question about how integrated the multi-threading fork > reached, remained unanswered. I will assume it went threading, dropping forever > the original behaviour, so deciding me towards not considering threading a > viable option (for now). Yes, you can use postmaster and fork for a connection...or at least you could prior to some recent changes. I haven't tested it that way for awhile but it should work. Myron mkscott@sacadia.com
On Wed, 6 Feb 2002, Haroldo Stenger wrote: > "Marc G. Fournier" wrote: > > The thing is, there are several areas where using threads would be a > > benefit, from what I've read on this list over the years ... as time goes > > on, less and less of the OSs in use don't have threads, so we have to > > start *somewhere* to work towards that sort of hybrid system ... > > Yes. > > But, maybe things like full-fledged replication, savepoints/nested > transactions, out-of-transaction-scope cursors, and others must have > priority over this; and If these are priorities for some, we do welcome patches from them to make it happen ... it is an open source project ... I am trying to encourage one person who has obviously spent a good deal of time on the whole threaded issue to start working at using his experience with PgSQL and Threading to see what, if anything, can be done to try and keep his work and ours from diverging too far ... > that mutating PG thread safe, will slow down a 7.3 release a lot, > something not wanted by many here. Depends on how it is handled ...
> Let's make a pros/cons list of thread-related aspects here. We saw a lot of > cons. Write some pros explicitly. We're not in a hurry anyway. I think in addition to pros/cons, an important question is: How has threading influenced other DBMS's? I know MySQL uses threading, at least in the development version; how much has it helped? Is the utility of a database based partly on the presence of threading? Take Oracle, MsSQL, and others; which have threading and which seem to gain from threading? I don't follow the other DB's as closely, so I don't know the answers. I suspect that looking at other databases will give us a clue about the magnitude of the pros, rather than just the areas of influence. Regards, Jeff
"Marc G. Fournier" wrote: > > On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > "Marc G. Fournier" wrote: > > > The thing is, there are several areas where using threads would be a > > > benefit, from what I've read on this list over the years ... as time goes > > > on, less and less of the OSs in use don't have threads, so we have to > > > start *somewhere* to work towards that sort of hybrid system ... > > > > Yes. > > > > But, maybe things like full-fledged replication, savepoints/nested > > transactions, out-of-transaction-scope cursors, and others must have > > priority over this; and > > If these are priorities for some, we do welcome patches from them to make > it happen ... it is an open source project ... I am trying to encourage > one person who has obviously spent a good deal of time on the whole > threaded issue to start working at using his experience with PgSQL and > Threading to see what, if anything, can be done to try and keep his work > and ours from diverging too far ... Yes, that was my very original thinking. We shouldn't waste programmers or code. But we're trying to form an idea of cost/benefit/risk. Let's go on with this discussion, basing it on the pros outlined by the threaded fork knowledge holders, right? Maybe they are tired, maybe they spent too much effort, and don't want to do it again. Should that be the case, at least let us obtain information from the *development process* of their work in order to measure the impact on current source tree with current programming force. > > > that mutating PG thread safe, will slow down a 7.3 release a lot, > > something not wanted by many here. > > Depends on how it is handled ... How do you see it not slowing down, when key developers said their view is that multithreading will pose a major obstacle? Are you envisioning any special approach not already talked about? Regards, Haroldo.
On Wed, 6 Feb 2002, Marc G. Fournier wrote: > Are there code clean-ups that have gone into the thread'd code that could > be incorporated into the existing code base that would start us down that > path? I don't think existing code is much help. So much has changed since 7.0.2 that the current threaded code is probably only good for investigating the benefits of threading and maybe some porting techniques. > For instance, based on my limited understanding of threaded servers, I > believe that 'global variables' are generally considered "A Real Bad > Thing" ... in one of your emails, you mentioned: > > "The first basic problem is that global variables are scattered throughout > the source ..." > > Now, what is a 'clean' solution to this? The current threaded postgres is messy because I just packed all the global variables, including those produced by flex, into a 5K structure. Every time threaded code needed a "global", it called a function to retrieve a pointer from thread local storage. When I profiled the code I saw way too many calls to grab the environment structure and I modified some hotspots to pass the structure down the call chain. Ideally, I think that the "environment" structure could be optimized for size and passed down the call chain to reduce the number of times thread local storage is accessed. This is also bad because when anyone working on a segment of code needs a global, they need to add it to the "environment" structure. I don't think this would be a good situation for code maintainers. > > Someone, or a group of ppl, with thread knowledge needs to start this > forward ... once the clean up begins, even without any thread code thrown > in, it shouldn't be too difficult to keep it clean to go to 'the next > step', no? > I came up with a process to find global variables in the code that became somewhat effective and could be applied to the current code. Someone else might have a better way of doing this though. Myron mkscott@sacadia.com
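The hotspot fix described above, fetching the environment once and passing it down instead of looking it up on every call, can be sketched as follows. This is an illustration under assumptions, not the threaded-postgres code: Env, get_env and the scan functions are hypothetical, and the tls_lookups field exists only so the sketch can count how often thread local storage is consulted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical environment holding former globals. */
typedef struct Env {
    long tuples_seen;   /* illustrative former global */
    long tls_lookups;   /* instrumentation for this sketch only */
} Env;

static pthread_key_t  env_key;
static pthread_once_t env_once = PTHREAD_ONCE_INIT;

static void env_make_key(void) { pthread_key_create(&env_key, free); }

/* Looks up (or lazily creates) the calling thread's Env and counts the lookup. */
static Env *get_env(void)
{
    pthread_once(&env_once, env_make_key);

    Env *env = pthread_getspecific(env_key);
    if (env == NULL) {
        env = calloc(1, sizeof *env);
        if (env == NULL)
            abort();
        pthread_setspecific(env_key, env);
    }
    env->tls_lookups++;
    return env;
}

/* Style 1: the hot function fetches the environment on every call. */
static void count_tuple_lookup(void) { get_env()->tuples_seen++; }

/* Style 2: the caller fetches once and passes the pointer down. */
static void count_tuple_passed(Env *env) { env->tuples_seen++; }

/* Returns how many TLS lookups the loop itself caused (style 1). */
static long scan_with_lookups(int ntuples)
{
    Env *env = get_env();
    long before = env->tls_lookups;

    for (int i = 0; i < ntuples; i++)
        count_tuple_lookup();
    return env->tls_lookups - before;
}

/* Returns how many TLS lookups the loop itself caused (style 2). */
static long scan_with_passing(int ntuples)
{
    Env *env = get_env();
    long before = env->tls_lookups;

    for (int i = 0; i < ntuples; i++)
        count_tuple_passed(env);
    return env->tls_lookups - before;
}
```

In style 1 the number of lookups grows with the number of tuples; in style 2 it stays constant. Whether that matters in practice depends on how expensive the platform's thread-local lookup is, which is presumably what the profiling mentioned in the message revealed.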
On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > Depends on how it is handled ... > > How do you see it not slowing down, when key developers said their view > is that multithreading will pose a major obstacle? Are you envisioning > any special approach not already talked about? Read my previous emails? To move *any part of PgSQL* to a threaded model (even one where each connection is still forked, but parts of each connection are threaded), the mess of global variables needs to be cleaned up ... that will be one of the "major obstacles" ... if someone with a knowledge of making code thread-safe were to submit patches (even very large ones) that start to clean this up, it could be broken down into more manageable chunks ... The second major obstacle that has been identified is cross-platform compatibility ... as I mentioned already, and another has also, Apache2 has their APR code that might help us reduce that obstacle to a more manageable level, since, I believe, the Apache license wouldn't restrict us from being able to use/distribute the code ... this is definitely something that we'd have to look into to make sure though ... The point is that nobody is even implying that this is a "for v7.3" project ... there have been several projects that have been initiated over the years that have straddled releases, and we have a lot of very good developers, and testers, that will make sure that any changes are "for the good" ...
On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > that mutating PG thread safe, will slow down a 7.3 release a lot, > > > something not wanted by many here. > > > > Depends on how it is handled ... > > How do you see it not slowing down, when key developers said their view is that > multithreading will pose a major obstacle? Are you envisioning any special > approach not already talked about? Excuse my butting in, but in large part we are talking about changing things like: if (PqSomeStaticOrGlobalVariable) { ... } to if (MyPort->PqSomeVariable) { ... } Converting to thread safety should not, at least for this kind of low hanging fruit, have any negative performance impact. And from my vantage point it takes out a whole lot of "where did that come from and who set it when?" kinda questions when reading the code. Of course I'm just getting my feet wet so feel free to correct my first impressions. Brian
On Wed, 6 Feb 2002, Brian Bruns wrote: > On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > > > that mutating PG thread safe, will slow down a 7.3 release a lot, > > > > something not wanted by many here. > > > > > > Depends on how it is handled ... > > > > How do you see it not slowing down, when key developers said their view is that > > multithreading will pose a major obstacle? Are you envisioning any special > > approach not already talked about? > > Excuse my butting in, but in large part we are talking about changing > things like: > > if (PqSomeStaticOrGlobalVariable) { ... } > > to > > if (MyPort->PqSomeVariable) { ... } > > converting to thread safety should not, at least for this kind of low > hanging fruit, have any negative performance impact. And from my vantage > point it takes out a whole lot of "where did that come from and who set it > when?" kinda questions when reading the code. Of course I'm just getting > my feet wet so feel free to correct my first impressions. This is one way that it could be accomplished ... I think one of the more proper ways would be to convert the Global variables to proper function calls ... a combination of the two would most likely be optimal ...
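One reading of "convert the global variables to proper function calls" is to hide each variable behind a getter and setter, so callers no longer depend on where the value is stored. The sketch below assumes a C11 compiler for _Thread_local; client_encoding and the accessor names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical former global. It is now private to this file, and because
 * of _Thread_local (C11) each thread gets its own copy. Swapping this line
 * for a plain static, or for a field in a per-connection struct, would not
 * change any caller.
 */
static _Thread_local int client_encoding = 0;

/* Accessors: the rest of the code calls these instead of naming the variable. */
static int  get_client_encoding(void)      { return client_encoding; }
static void set_client_encoding(int value) { client_encoding = value; }

/* Caller written against the accessors; it never sees the storage class. */
static int encoding_is_default(void)
{
    return get_client_encoding() == 0;
}
```

The design trade-off mirrors the one raised in the message: accessors keep call signatures unchanged and make the storage decision reversible, while the struct-pointer style makes the data flow explicit. Using accessors at module boundaries and a passed pointer in hot paths would be one way to combine the two.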
On Wed, 6 Feb 2002 mkscott@sacadia.com wrote: > I came up with a process to find global variables in the code that > became somewhat effective and could be applied to the current code. > Someone else might have a better way of doing this though. Is this something that could be added to the distribution similar to some of the other development tools? Is it a shell script?
Brian Bruns wrote: > > On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > > > that mutating PG thread safe, will slow down a 7.3 release a lot, > > > > something not wanted by many here. > > > > > > Depends on how it is handled ... > > > > How do you see it not slowing down, when key developers said their view is that > > multithreading will pose a major obstacle? Are you envisioning any special > > approach not already talked about? > > Excuse my butting in, but in large part we are talking about changing > things like: > > if (PqSomeStaticOrGlobalVariable) { ... } > > to > > if (MyPort->PqSomeVariable) { ... } > > converting to thread safety should not, at least for this kind of low > hanging fruit, have any negative performance impact. And from my vantage > point it takes out a whole lot of "where did that come from and who set it > when?" kinda questions when reading the code. Of course I'm just getting > my feet wet so feel free to correct my first impressions. Just that when I said "will slow down a 7.3 release a lot", I was referring to *the date of the release*, not its inherent performance, whether the code is multi-threaded or not. It was a software engineering sort of consideration. Regards, Haroldo.
Here I'll respectfully compile the opinions that I found of impact over a dicision: Revisited key developer opinion 1: Tom Lane wrote: > > If someone wanted to submit appropriate patches for the v7.3 development > > tree, that merge cleanly, I can't see why this wouldn't be a good thing > > I would resist it. I do not think we need the portability and > reliability headaches that would come with it. Furthermore, > an #ifdef'd implementation would be the worst of all possible > worlds, as it would do major damage to readability of the code. Revisited key developer opinion 2: Peter Eisentraut wrote: > > Though, starting to think & code thread safe would be nice too. > > The thing about thread-safeness is that it's only actually useful when > you're using threads. Otherwise it wastes everybody's time -- the > programmer's, the computer's, and the user's. So at least for Tom Lane and Peter E., threads are hard to implement. For Tom, we would enter a world of portability and reliability headaches. For Peter, unless we *want* threads, we don't have to start *now* coding thread safe. Please correct me if I'm wrong. Zeugswetter Andreas SB SD wrote: > > If someone wanted to submit appropriate patches for the v7.3 development > > tree, that merge cleanly, I can't see why this wouldn't be a good thing > > I thought that the one thread instead of one process per client model > would only be an advantage for the "native Windows port" ? > > Imho a useful threaded model on unix would involve a separation of threads > and clients. ( 1 CPU thread per physical CPU, several IO threads) > But that would involve a complete redesign. For Andreas, for a threaded PG to be useful under a Unix environment, a complete PG redesign would be needed. "Marc G. Fournier" wrote: > > On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > > Depends on how it is handled ... > > > > How do you see it not slowing down, when key developers said their view > > is that multithreading will pose a major obstacle? 
Are you envisioning > > any special approach not already talked about? > > Read my previous emails? To move *any part of PgSQL* to a threaded model > (even one where each connection is still forked, but parts of each > connection are threaded), the mess of global variables needs to be cleaned > up ... that will be one of the "major obstacles" ... if someone with a > knowledge of making code thread-safe were to submit patches (even very > large ones) that start to clean this up, it could be broken down into more > manageable chunks ... > Yes, I liked too the idea of multiple process, running multiple threads each, distributed under some wise criteria. I wonder if cleaning up the mess of global variables, seems not convenient from Peter's or Tom's point of view. Standard wisdom says globals should be avoided. In current PG's case, they should be reworked in a way or another. > The second major obstacle that has been identified is cross-platform > comapability ... as I mentioned already, and another has also, Apache2 has > their APR code that might help us reduce that obstacle to a more > manageable level, since, I believe, the Apache license wouldn't restrict > us to being able to use/distribute the code ... this is definitely > something that we'd have to look into to make sure though ... I agree with cross-polinization among open source projects. BTW, this practice should be encouraged, and not called "stealing", not even as a joke, as I've seen it called for example for the TCP/IP Linux stack code (99% sure this was the one module), which came from the *BSD projects, in its very first version. Also mentioning that BSD -> GPL was possible, but not the other way round; I don't mean to start a war or anything, just exposing facts. > The point is that nobody is even implying that this is a "for v7.3" > project ... 
there have been several projects that have been initiated over > the years that have straddled releases, and we have alot of very good > developers, and testers, that will make sure that any changes are "for the > good" ... Yes, I agree. If starting to think & code thread safe *now* proves *not* to be a waste of everybody's time, that's the path to follow. This very point is the one under technical examination, right? Regards, Haroldo.
On Wed, 6 Feb 2002, Jeff Davis wrote: > I think in addition to pros/cons, an important question is: > How has threading influenced other DBMS's? I know MySQL uses threading, at > least in the development version; how much has it helped? Is the utility of a I think threads were, or are, a big deal for Informix (now IBM) Dynamic Server. With a combination of multiple processes and threads it is able to spread a query among multiple processors and recruit more resources for complex queries. Myron
On Wed, 6 Feb 2002, Marc G. Fournier wrote: > > Is this something that could be added to the distribution similar to some > of the other development tools? Is it a shell script? > No, but I suppose it could and should be. I just used a combination of the commands nm and grep to find all the global symbols in the object files of each subsection, then went through the code and determined whether they needed to be moved. Myron
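The nm-and-grep step could be sketched roughly as a small C filter. This is not Myron's actual procedure, only an illustration that assumes GNU nm's default output format ("address, type letter, name") and its type letters for writable data (B/b, D/d, C, G/g, S/s); the function name `parse_data_symbol` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Inspect one line of default `nm` output, e.g.
 *   "0000000000000004 B MyGlobalCounter"
 * Returns 1 and copies the symbol name into `name` if the symbol is
 * writable data (global or file-static); returns 0 otherwise.
 * Assumes GNU nm type letters; other nm implementations may differ. */
int parse_data_symbol(const char *line, char *name, size_t name_size)
{
    const char *p = line;
    size_t n = 0;
    char type;

    assert(line != NULL && name != NULL && name_size > 0);

    while (isxdigit((unsigned char) *p))    /* address column */
        p++;
    if (p == line || *p != ' ')
        return 0;                           /* undefined symbols have no address */
    p++;

    type = *p;
    if (type == '\0' || strchr("BbDdCGgSs", type) == NULL)
        return 0;                           /* code, read-only data, etc. */
    p++;
    if (*p != ' ')
        return 0;
    p++;

    while (*p != '\0' && *p != '\n' && n + 1 < name_size)
        name[n++] = *p++;
    name[n] = '\0';
    return n > 0;
}
```

A driver that reads `nm` output line by line with `fgets` and prints every name this function accepts would produce the list of candidates; deciding which of them actually need to move still has to be done by reading the code, as described above.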
On Wed, 2002-02-06 at 23:00, mkscott@sacadia.com wrote: > > > On Wed, 6 Feb 2002, Marc G. Fournier wrote: > > > Right now, from everythign I've heard, making the code thread-safe is one > > big onerous task ... but if we were to start incorporating changes from > > the 'thread work' that is being done now, into the base server, and ppl > > start thinking thread-safe when they are coding new stuff, over time, this > > task becomes smaller ... > > > > I agree, once the move is made to thread-safe it becomes much easier to > maintain thread-safe code. I also very much like the idea of multiple > thread/process models that could be chosen from. I think the question has > always been the > inital cost vs. benefit. The group has not seen much to be gained for > the amount of initial work involved. After working with the code, I too > felt it wasn't worth it. > > After revisiting the threaded code after a long break I now see some real > benefits to threading. For example, I was able to incorporate Tom Lane's > lazy_vacuum code to do relation clean up automatically when a threshold of > page writes occurred. Could you please explain why it was easier to do with your threaded version than with the standard version? > I was also able to use the freespace information to > be shared among threads in the process without touching shared mem. As a > result, a pgbench run with 20 clients and over 1,000,000 > trasactions maintained a more or less constant tps with manual > vacuum commands and far less heap expansion. Do you mean that "it ran at more or less the same speed as when running concurrent manual VACUUMs"? Btw, have you tried comparing pgbench runs on the threaded model vs the forked model? IIRC your code can run both ways. > You can do this with processes (planned for 7.3 I think) but I > think it was much easier with threads. Other things may open up with > threads as well like Java stored procedures. Anyway, now I think it is > worth it.
In my experience any code cleanup will eventually pay off (if the project lives long enough :) --------- Hannu
On Thu, Feb 07, 2002 at 12:03:56PM +0200, Hannu Krosing wrote: > Btw, have you tried comparing pgbench runs on threaded model vs forked > model. IIRC your code can run both ways. It depends on the OS. For example, fork and thread creation are very similar on Linux. Maybe there can be some speed difference in locking and in access to shared memory? IMHO the threaded version has a problem with backend crashes (users' bugs in PLs, etc.). > > You can do this with processes (planned for 7.3 I think) but I > > think it was much easier with threads. Other things may open up with > > threads as well like Java stored procedures. Anyway, now I think it is > > worth it. Are all current PL interpreters thread safe? Karel -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
On Thu, 7 Feb 2002, Haroldo Stenger wrote: > > > Brian Bruns wrote: > > > > On Wed, 6 Feb 2002, Haroldo Stenger wrote: > > > > > > > that mutating PG thread safe, will slow down a 7.3 release a lot, > > > > > something not wanted by many here. > > > > > > > > Depends on how it is handled ... > > > > > > How do you see it not slowing down, when key developers said their view is that > > > multithreading will pose a major obstacle? Are you envisioning any special > > > approach not already talked about? > > > > Excuse my butting in, but it large part we are talking about changing > > things like: > > > > if (PqSomeStaticOrGlobalVariable) { ... } > > > > to > > > > if (MyPort->PqSomeVariable) { ... } > > > > converting to thread safety should not, at least for this kind of low > > hanging fruit, have any negative performance impact. And from my vantage > > point it takes out a whole lot of "where did that come from and who set it > > when?" kinda questions when reading the code. Of course I'm just getting > > my feet wet so feel free to correct my first impressions. > > Just that when I said "will slow down a 7.3 release a lot", I was referring to > *the date of the release*, not its inherent performance, the code to be > multi-threaded or not. It was a software engineering sort of consideration. Again, if we go at it as 'threaded for v7.3', then most probably ... but I would not allow that to happen, nor would any of the *core* developers ... what I am, and have been, advocating is starting down the 'thread-safe' path ... as has actually been discussed before, there are sections of PostgreSQL that could make use of threading without the whole system *being* threaded ... stuff that, right now, is done sequentially that could be done in parallel if threading were available ...
On Thu, 7 Feb 2002, Haroldo Stenger wrote: > Here I'll respectfully compile the opinions that I found of impact over a > dicision: > > Revisited key developer opinion 1: > > Tom Lane wrote: > > > If someone wanted to submit appropriate patches for the v7.3 development > > > tree, that merge cleanly, I can't see why this wouldn't be a good thing > > > > I would resist it. I do not think we need the portability and > > reliability headaches that would come with it. Furthermore, > > an #ifdef'd implementation would be the worst of all possible > > worlds, as it would do major damage to readability of the code. Put this into context ... I had suggested someone submit'ng #ifdef'd code that could implement threaded, not that someone submit'd code to clean up a mess that nobody *really* wants to clean up due to time and lack of visibility/glory *grin* > > Revisited key developer opinion 2: > > Peter Eisentraut wrote: > > > Though, starting to think & code thread safe would be nice too. > > > > The thing about thread-safeness is that it's only actually useful when > > you're using threads. Otherwise it wastes everybody's time -- the > > programmer's, the computer's, and the user's. > > > So at least for Tom Lane and Peter E., threads are hard to implement. > For Tom, we would enter a world of portability and reliability > headaches. For Peter, unless we *want* threads, we don't have to start > *now* coding thread safe. Please correct me if I'm wrong. yes and no ... Tom is/was looking at it from an 'implement it for all the systems we currently support' point of view, without looking at (and Tom, feel free to correct me if I'm wrong) what has been implemented outside of our project to simplify the portability and reliability issues associated with supporting both a fork and fork/thread model ... 
with the work that the Apache group has done in this regard, and the fact that their license is not restrictive, both issues may (or may not) be moot, but someone has to investigate that ... In Peter's case ... I'm sorry, but I was always taught in programming that "global variables should be avoided at all costs" ... right now, all I'm advocating *right now* is making our variables thread safe, which, from my understanding, means getting rid of the global variables ... not sure how that affects the users themselves, but, from a programmers standpoint, the 'time' is what the person cleaning the code has to put into it ... once its cleaned up, any new code or changes should just automatically be "global variables aren't permitted" Both Tom and Peter have better/more important things on their plates then to go through the code and clean up the global variables ... Eventually, I would like to see, where possible, threaded code put in so that each connection is *still* forked, but parts of the connection that could deal with more parralel processing making use of threads to speed it up ... > I wonder if cleaning up the mess of global variables, seems not > convenient from Peter's or Tom's point of view. Standard wisdom says > globals should be avoided. In current PG's case, they should be reworked > in a way or another. Correct, and that is what I am currently advocating ... if we get that cleaned up, so that 'threaded' is possible, nothing stops the next step being someone submit'ng a simple patch that uses threading to 'read from disk while processing what has been read in, as it is being read in' ... the point is, until we clean out the *time consuming, but relatively easy* anti-thread issues we have, even if that is over several releases, nothing else is going to happen cause "its too big of a job" ... what I would like to see is someone submitting large patches that clean the global variables, one global at a time ... 
I say large, because I would imagine that pretty much any global is going to hit a *large* number of files to remove it, and add it back in as an arg to functions ... I can't see anyone convincingly argue against such patches, since, IMHO, global variables are a remnant of when we took over the code from Berkeley; I can't see any of the core developers actually *approving* of them being there, except for the work involved in removing them ... :)
Haroldo Stenger wrote: <snip> > > I agree with cross-polinization among open source projects. BTW, this practice > should be encouraged, and not called "stealing", not even as a joke, as I've > seen it called for example for the TCP/IP Linux stack code (99% sure this was > the one module), which came from the *BSD projects, in its very first version. > Also mentioning that BSD -> GPL was possible, but not the other way round; I > don't mean to start a war or anything, just exposing facts. > > > The point is that nobody is even implying that this is a "for v7.3" > > project ... there have been several projects that have been initiated over > > the years that have straddled releases, and we have alot of very good > > developers, and testers, that will make sure that any changes are "for the > > good" ... > > Yes, I agree. If starting to think & code thread safe *now* proves *not* to be a > waste of everybody's time, that's the path to follow. This very point is the one > under technical examination, right? So, with this thought in mind of "starting to think & code thread safe", we should start putting together a set of reference guidlines, especially drawing on the experience of people whom have good, solid experience with threaded, multi-process, cross-platform coding. It should take into account the people who are reading it, may not be as experienced in this um... specialised area of coding too. We've identified "global variables" needing to be done in a better and more consistent way. So, what else do coders need to do when "thinking and coding thread safe", that we can make into a guidline for forthcoming PostgreSQL coding? :-) Regards and best wishes, Justin Clift > Regards, > Haroldo. > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. 
He told me to try to be in the first group; there was less competition there." - Indira Gandhi
> Again, if we go at it as 'threaded for v7.3', then most probably ... but I > would not allow that to happen, nor would any of the *core* developers ... > what I am, and have been, advocating is starting down the 'thread-safe' > path ... as has actually been discussed before, there are sections of > PostgreSQL that could make use of threading without the whole system > *being* threaded ... stuff that, right now, are done sequentially that > could be done in parralel if threading was available ... How about doing what Marc suggests and starting to move toward reentrant functions in postgres? This could be done by creating a global private memory area that is accessed much like shared memory is now, with a hash table setting aside memory for the various code subsections. We could put all the global variables there with little impact on current functionality and, if done right, on speed. I think I have a good idea as to where most of the "difficult" globals are and could start working on moving them once the global memory area was set up. We can worry about threads vs. processes later. Comments? Myron
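One way to picture the proposed private memory area is a small table keyed by subsection name that hands out a zeroed block per subsection. This is only a sketch under assumptions: `GlobalArea`, `global_area_get` and the fixed table size are hypothetical and are not taken from the threaded server's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define AREA_SLOTS 64    /* hypothetical fixed table size */

typedef struct AreaEntry
{
    const char *section;  /* subsection name; must outlive the area */
    void       *mem;      /* zero-initialised block for that subsection */
} AreaEntry;

/* One of these per process today; one per thread later on.
 * Must be zero-initialised before first use. */
typedef struct GlobalArea
{
    AreaEntry slots[AREA_SLOTS];
} GlobalArea;

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;

    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Return the block registered for `section`, allocating `size` zeroed
 * bytes on first use. Returns NULL if the table is full or allocation
 * fails. Uses open addressing with linear probing. */
void *global_area_get(GlobalArea *area, const char *section, size_t size)
{
    unsigned start, i;

    assert(area != NULL && section != NULL && size > 0);
    start = hash_name(section) % AREA_SLOTS;
    for (i = 0; i < AREA_SLOTS; i++)
    {
        AreaEntry *e = &area->slots[(start + i) % AREA_SLOTS];

        if (e->section == NULL)
        {
            e->mem = calloc(1, size);
            if (e->mem == NULL)
                return NULL;
            e->section = section;
            return e->mem;
        }
        if (strcmp(e->section, section) == 0)
            return e->mem;
    }
    return NULL;
}
```

A subsection would then keep its former globals in one struct and fetch it with something like `global_area_get(area, "storage", sizeof(StorageGlobals))`. The lookup could be cached in a local pointer on hot paths, which is where the "if done right, on speed" caveat comes in.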
Justin Clift wrote: > > Haroldo Stenger wrote: > <snip> > > > > I agree with cross-polinization among open source projects. BTW, this practice > > should be encouraged, and not called "stealing", not even as a joke, as I've > > seen it called for example for the TCP/IP Linux stack code (99% sure this was > > the one module), which came from the *BSD projects, in its very first version. > > Also mentioning that BSD -> GPL was possible, but not the other way round; I > > don't mean to start a war or anything, just exposing facts. > > > > > The point is that nobody is even implying that this is a "for v7.3" > > > project ... there have been several projects that have been initiated over > > > the years that have straddled releases, and we have alot of very good > > > developers, and testers, that will make sure that any changes are "for the > > > good" ... > > > > Yes, I agree. If starting to think & code thread safe *now* proves *not* to be a > > waste of everybody's time, that's the path to follow. This very point is the one > > under technical examination, right? > > So, with this thought in mind of "starting to think & code thread safe", > we should start putting together a set of reference guidlines, > especially drawing on the experience of people whom have good, solid > experience with threaded, multi-process, cross-platform coding. It > should take into account the people who are reading it, may not be as > experienced in this um... specialised area of coding too. > > We've identified "global variables" needing to be done in a better and > more consistent way. > > So, what else do coders need to do when "thinking and coding thread > safe", that we can make into a guidline for forthcoming PostgreSQL > coding? Going from a "process model" to a "threaded model" is a HUGE undertaking. In the process model, all data is assumed to be private, and shared data must be explicitly shared. 
In a threaded model all data is implicitly shared and private data must be explicitly made private. Do not underestimate what this means or how hard it is to convert one to the other. Also: Think of file handles. In a threaded version of PostgreSQL, all connections will be competing for file handles. I think the per-process limit in Linux is 1024. All threads will be competing for memory mapping. As systems get more and more RAM, on the x86 and other 32 bit machines, process space is limited to 2 to 3 gig. If you have 8 gig in your system, a single PostgreSQL process won't be able to use it. As I have said before, multithreading queries within a connection process would be pretty cool; on a low load server, this could make a big performance increase. But it may be easier to create a couple of I/O threads per connection process and devise some queuing mechanism for disk reads/writes. In essence, provide an asynchronous I/O system. This would give you some of the performance of multithreading a query, while not requiring a complete thread-safe implementation. I think threading connections is a VERY bad idea. I am dubious that the amount of work will result in a decent return on investment.
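The "implicitly shared, explicitly private" distinction in the message above can be shown with a short pthreads sketch. It assumes a C11 compiler (for `_Thread_local`) and POSIX threads; all names are hypothetical and nothing here comes from the PostgreSQL sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 1000
#define WORKERS    2

/* An ordinary global: implicitly shared by every thread in the process,
 * so every access has to be protected. */
static long shared_counter = 0;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Explicitly private: each thread gets its own copy. */
static _Thread_local long private_counter = 0;

static void *worker(void *arg)
{
    long *private_total = arg;
    int i;

    for (i = 0; i < ITERATIONS; i++)
    {
        pthread_mutex_lock(&shared_lock);
        shared_counter++;               /* shared: needs the lock */
        pthread_mutex_unlock(&shared_lock);

        private_counter++;              /* private: no lock needed */
    }
    *private_total = private_counter;
    return NULL;
}

/* Run the workers; returns 0 on success, -1 if a thread could not start. */
int run_demo(long *shared_total, long private_totals[WORKERS])
{
    pthread_t threads[WORKERS];
    int created, i;

    assert(shared_total != NULL && private_totals != NULL);
    shared_counter = 0;
    for (created = 0; created < WORKERS; created++)
        if (pthread_create(&threads[created], NULL, worker,
                           &private_totals[created]) != 0)
            break;
    for (i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    *shared_total = shared_counter;
    return created == WORKERS ? 0 : -1;
}
```

In the forked model the situation is reversed: every variable behaves like `private_counter` by default, and only what is placed in shared memory behaves like `shared_counter`. That is why existing backend code can use globals freely today.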
... > As I have said before, multithreading queries within a connection > process would be pretty cool, on a low load server, this could make a > big performance increase, but it may be easier to create a couple I/O > threads per connection process and devise some queuing mechanism for > disk reads/write. In essence provide an asynchronous I/O system. This > would give you the some of the performance of multithreading a query, > while not requiring a complete thread-safe implementation. The other use case would be a high load server with only one or a few connections (big queries, few clients); see below. > I think threading connections is a VERY bad idea. I am dubious that the > amount of work will result in a decent return on investment. Agreed. A subset area which *might* be a benefit for the use case above is to allow threading of subqueries, which might happen after the optimizer section of code. That is a (pretty big) fraction of the code, not all of it, and it would still continue the benefits of the process-per-client model while allowing a client to spread across multiple processors. The other area which could be exploited with restructuring to allow post-optimizer threading is distributed databases, where each of those subqueries could be rerouted to another server. A first cut would be to allow read-only distributed databases; that might demote the nomenclature for this to federated databases, but it is still an interesting capability. - Thomas
On Thu, 7 Feb 2002, mlw wrote: > > Going from a "process model" to a "threaded model" is a HUGE > undertaking. In the process model, all data is assumed to be private, > and shared data must be explicitly shared. In a threaded model all data > is implicitly shared and private data must be explicitly made private. > Do not under estimate what this means or how hard it is to convert one > to the other. Agreed. > > Also: > > Think of file handles. In a threaded version of postgreSQL, all > connections will be competing for file handles. I think the limit in > Linux is 1024. > Yes, but because the current file manager is built with three layers of abstraction (OS FD --> Postgres Vfd --> Postgres Storage Manager), it is possible to manage and configure this very nicely. In threaded postgres, each thread has its own storage manager, and the storage managers share Vfds up to a sharing maximum. This prevents too many threads from trying to seek on the same OS FD. The Vfds manage OS FD resources. > All threads will be competing for memory mapping. As systems get more > and more RAM, on the x86 and other 32 bit machines, process space is > limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL > won't be able to use it. > You should be able to set up several processes in shared memory for the db: 5 processes * 256 client threads per process = 1280 clients, or something like that. > As I have said before, multithreading queries within a connection > process would be pretty cool, on a low load server, I think this would be possible now if I knew how to spin out subqueries from the query tree. Myron mkscott@sacadia.com
On Thu, 2002-02-07 at 19:13, mlw wrote: > Justin Clift wrote: > > Also: > > Think of file handles. In a threaded version of postgreSQL, all > connections will be competing for file handles. I think the limit in > Linux is 1024. From what I've seen, we are more likely to hit the per-system file handle limit when all the separate forks open the same files over and over again, so as the number of processes grows we will be worse off than when using the same file handles for all connections in threaded mode. > I think threading connections is a VERY bad idea. I am dubious that the > amount of work will result in a decent return on investment. This whole thread started with the notion that this has already been done once, and the idea was to investigate what could be brought over to the main forked-only codebase (the threaded version could be forked at the same time). ---------------- Hannu
On Thu, 2002-02-07 at 12:49, Karel Zak wrote: > IMHO in thread version is problem with backend crash (user's bugs in > PL .etc). The current behaviour for crashing one backend is also "terminate all backends as something bad may have happened to shared memory". ---------- Hannu
On Thu, 7 Feb 2002, mlw wrote: > As I have said before, multithreading queries within a connection > process would be pretty cool, on a low load server, this could make a > big performance increase, but it may be easier to create a couple I/O > threads per connection process and devise some queuing mechanism for > disk reads/write. In essence provide an asynchronous I/O system. This > would give you the some of the performance of multithreading a query, > while not requiring a complete thread-safe implementation. > > I think threading connections is a VERY bad idea. I am dubious that the > amount of work will result in a decent return on investment. I don't believe anyone (or, at least I hope not) is advocating threading connections ... with systems getting more and more CPUs, and more and more RAM, what I'm advocating is looking at taking pieces from within the connection itself and threading those, to improve performance ... from what I can tell with Apache2 itself, there is no "thread only" model that they are advocating ... the closest is their 'worker' where you can have multiple connections threaded in multiple processes, so, in theory, you could limit to a large number of threads and a very low number of processes ...
On Thu, 7 Feb 2002, Marc G. Fournier wrote: > > I don't believe anyone (or, at least I hope not) is advocating threading > connections ... with systems getting more and more CPUs, and more and more > RAM, what I'm advocating is looking at taking pieces from within the > connection itself and threading those, to improve performance ... from > what I can tell with Apache2 itself, there is no "thread only" model that > they are advocating ... the closest is their 'worker' where you can have > multiple connections threaded in multiple processes, so, in theory, you > could limit to a large number of threads and a very low number of > processes ... Making postgres functions thread-safe increases the flexibility of the codebase. Whether it is threading connections or sub-queries, increasing processor utilization, or some other unforeseen optimization, having reentrant and thread-safe code leaves the door open for new ideas. Yes, writing reentrant code can be restrictive and a little more complex, but not by much; the big work is the upfront cost of porting. I have done it once and gained a great deal on projects that I am working on. Myron mkscott@sacadia.com
On Thu, 7 Feb 2002, mlw wrote: > > Going from a "process model" to a "threaded model" is a HUGE > undertaking. In the process model, all data is assumed to be private, > and shared data must be explicitly shared. In a threaded model all data > is implicitly shared and private data must be explicitly made private. > Do not under estimate what this means or how hard it is to convert one > to the other. > I agree with the first and last sentence ... the rest of the paragraph is ... well, we argued this before - look in the archives. > Also: > > Think of file handles. In a threaded version of postgreSQL, all > connections will be competing for file handles. I think the limit in > Linux is 1024. Depends on how it is done. > All threads will be competing for memory mapping. As systems get more > and more RAM, on the x86 and other 32 bit machines, process space is > limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL > won't be able to use it. Depends on how it is done. > I think threading connections is a VERY bad idea. I am dubious that the > amount of work will result in a decent return on investment. Depends on how it is done. We should be careful not to assume that threading postgresql instantly equates to threading connections. That is only *ONE* possible type of threading architecture one could choose. Making broad generalized statements doesn't accomplish anything in this debate ... instead be more focused with your comments so one can make heads or tails out of them. -- //========================================================\\ || D. Hageman <dhageman@dracken.com> || \\========================================================//
"D. Hageman" wrote: > > On Thu, 7 Feb 2002, mlw wrote: > > > > Going from a "process model" to a "threaded model" is a HUGE > > undertaking. In the process model, all data is assumed to be private, > > and shared data must be explicitly shared. In a threaded model all data > > is implicitly shared and private data must be explicitly made private. > > Do not under estimate what this means or how hard it is to convert one > > to the other. > > > > I agree with the first and last sentance ... the rest of the paragraph is > ... well we argued this before - look in the archives. yes, I know. > > > Also: > > > > Think of file handles. In a threaded version of postgreSQL, all > > connections will be competing for file handles. I think the limit in > > Linux is 1024. > > Depends on how it is done. How does it depend? If you have one process with multiple threads, you will bump up against the process limit of file handles. > > > All threads will be competing for memory mapping. As systems get more > > and more RAM, on the x86 and other 32 bit machines, process space is > > limited to 2 to 3 gig. If you have 8 gig in your system, PostgreSQL > > won't be able to use it. > > Depends on how it is done. Again, How does it depend? If you have one process, there is a limit to the amount of memory it can access. 3gig (2gig on older Windows) of process space it is a classic limitation to x86 operating systems. > > > I think threading connections is a VERY bad idea. I am dubious that the > > amount of work will result in a decent return on investment. > > Depends on how it is done. We should be careful to assume that threading > postgresql instantly equates to threading connections. That is only *ONE* > possible type of threading architecture one could choose. Making broad > generalized statements doesn't accomplish anything in this debate ... > instead be more focused with your comments so one can make heads or tails > out of them. 
There are, AFAIK, two reasons to thread PostgreSQL: (1) Run the multiple connections each in their own thread, with the assumption that this is more efficient for [n] reasons. (2) Run a single query across multiple threads, thus parallelizing the query engine. There is a mutant of this as well: (1a) You could have multiple processes each with [n] connection threads. As far as PostgreSQL is concerned, I am dubious that (1) or (1a) will provide any real benefit for the amount of work required to accomplish it. Work on "pre-forking" would be FAR more productive. The idea of parallelizing queries could be very worthwhile. However, that being said, creating a set of I/O threads that get blocks from disk devices asynchronously may be enough, with a very limited amount of work. I guess all I am saying is that a person's time is really the only limited resource. Tom, Bruce, Marc, Peter and everyone else have a limited amount of time. If I could influence how those guys spend their time, I would hope they spent time working on improving the functionality of PostgreSQL, not the tedium of making it thread safe.
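The per-process file handle ceiling argued about in this message can be checked rather than guessed. A minimal sketch using POSIX `getrlimit`; the helper names and the "descriptors per connection" figure are hypothetical, chosen only to illustrate the arithmetic.

```c
#include <assert.h>
#include <sys/resource.h>

/* Soft per-process limit on open file descriptors.
 * Returns -1 if it cannot be determined, -2 if there is no limit. */
long long open_file_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -2;
    return (long long) rl.rlim_cur;
}

/* Rough upper bound on connection threads in ONE process if every
 * connection keeps `fds_per_connection` descriptors open at once.
 * The per-connection figure is an assumption supplied by the caller. */
long long max_connection_threads(long long fd_limit,
                                 long long fds_per_connection)
{
    assert(fd_limit >= 0 && fds_per_connection > 0);
    return fd_limit / fds_per_connection;
}
```

For example, with a limit of 1024 and an assumed 8 descriptors held per connection, one process could serve at most 128 connection threads before hitting the limit. Sharing descriptors between threads, as described for the Vfd layer earlier in the thread, or spreading threads over several processes as in variant (1a), changes that bound, which is presumably what "depends on how it is done" refers to.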
On Thu, 7 Feb 2002, mlw wrote: <SNIP a bunch of crap that will hopefully be implicitly explained and understood by the comments below> > There are, AFAIK two reasons to thread PostgreSQL: > > (1) Run the multiple connections in their own thread with the assumption > that this is more efficient for [n] reasons. > (2) Run a single query across multiple threads, thus parallelizing the > query engine. (3) Parallelize housekeeping (for example vacuums) of the database. I think they are going to call this processes or something, slated for the next version? (4) Replication (5) Referential integrity cleanups (6) EXOTIC FEATURES: crossdb Oh yeah ... and we might be able to drop the whole startup time section from the TODO list. It all depends on how one wants to implement threads in postgresql. Then again ... maybe an endeavor like this would be more appropriately forked off and pursued as a separate project (as has kind of already been done). > I guess all I am saying, is that a person's time is really the only > limited resource. Tom, Bruce, Marc, Peter and everyone else have a > limited amount of time. If I could influence how those guys spend their > time, I would hope they spent time working on improving the > functionality of PostgreSQL, not the tedium of making it thread safe. The people that do the biggest amount of coding should definitely code what they feel is the best to work on - NO one is arguing that. If a few of them want to assist in this endeavor then they should do that as well. Most importantly - we shouldn't belittle the efforts of those that do see the vision of how this could be beneficial in the long run. My point is that I see people spending more time complaining than it would take to make up a list of coding practices to follow for future work that will make the postgresql code base better. (Come on ... the first thing a programmer is taught is that global variables are BAD).
-- //========================================================\\ || D. Hageman <dhageman@dracken.com> || \\========================================================//
"D. Hageman" <dhageman@dracken.com> writes: > (Come on ... the first thing a programmer is > taught is that global variables are BAD). Reality check time: I don't believe there are very many gratuitously-static variables in the backend. Most of the ones I can think of offhand are associated with data structures that are actually global, or at least would be of interest to more than one thread. (For example, the catcache/relcache data structures are referenced from static variables. You would very likely want these caches to be shared across as many threads as possible. The data structures associated with configuration variables would need to be shared by all threads executing on behalf of a particular client connection. Etc.) So the hard part of making the code "thread safe" is figuring out what we want to do with potentially-sharable data structures: can they be shared, if so across what scope, and what sort of locking penalty will we pay for sharing them? Maybe I'm missing something, but I don't think that a "coding practices" document will do much of anything to improve our threading situation. It might be worth having on other grounds, but not that one. regards, tom lane
"D. Hageman" <dhageman@dracken.com> writes: > (3) Parallelize house keeping (for example vacuums) of the database. I > think they are going to call this processes or something slated for the > next version? > > (4) Replication > > (5) Referential Integritity cleanups > > (6) EXOTIC FEATURES: crossdb I fail to see how threads are required for any of these. They could just as well be done with a separate process(es) in the current model. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
On 7 Feb 2002, Doug McNaught wrote: > "D. Hageman" <dhageman@dracken.com> writes: > > > (3) Parallelize house keeping (for example vacuums) of the database. I > > think they are going to call this processes or something slated for the > > next version? > > > > (4) Replication > > > > (5) Referential Integrity cleanups > > > > (6) EXOTIC FEATURES: crossdb > > I fail to see how threads are required for any of these. They could > just as well be done with a separate process(es) in the current model. > Oh, I didn't realize the conversation was about what threads were "required" for. My mistake ... *cough* *cough* -- //========================================================\\ || D. Hageman <dhageman@dracken.com> || \\========================================================//
On Thu, 7 Feb 2002, Tom Lane wrote: > > Maybe I'm missing something, but I don't think that a "coding practices" > document will do much of anything to improve our threading situation. > It might be worth having on other grounds, but not that one. > You aren't missing anything. A document of coding practices with points on using thread-safe functions etc. isn't going to revolutionize anything. However, it has the potential to be the best way to begin, and to soften the cries of the luddites (which is the biggest problem at the moment). -- //========================================================\\ || D. Hageman <dhageman@dracken.com> || \\========================================================//
On Thu, 7 Feb 2002, mlw wrote: > How does it depend? If you have one process with multiple threads, you > will bump up against the process limit of file handles. So? Use an OS that doesn't impose such limits, or lets you increase them? > Again, how does it depend? If you have one process, there is a limit to > the amount of memory it can access. A 3 gig (2 gig on older Windows) > process space is a classic limitation of x86 operating systems. But, we aren't talking about one *big* process with many threads ... we are talking about several processes that make use of threads to speed up various tasks ... kinda like programming in C for 99% of a project, but going to assembly for stuff that could use that little bit of a boost ... > I guess all I am saying, is that a person's time is really the only > limited resource. Tom, Bruce, Marc, Peter and everyone else have a > limited amount of time. If I could influence how those guys spend their > time, I would hope they spent time working on improving the > functionality of PostgreSQL, not the tedium of making it thread safe. Except that, as several ppl have pointed out, that 'tedium' could result in functionality that we really don't have right now ... right now, with a "non-threaded, single process per connection" model, you really aren't making *as efficient use* of a multi-CPU environment ... how many queries spend a good deal of time sitting in an I/O wait state because they have to wait until all the data is read from the drive before they can start processing? Take a large database/application, on a Quad+ server, where you don't have *a lot* of queries happening, but those that do are *very* large ... that large query is currently stuck on the one CPU while the other 3+ CPUs are sitting idle ... etc, etc ... there is functionality that 'working around' in a non-threaded environment would be more tedious than doing the code clean up itself, and, most likely, not nearly as efficient as it could be ...
The first step has to be taken *sometime*, and best to encourage it while we have ppl around that have the *knowledge* to take it ... god, I can remember when doing the code cleanups to get configure integrated into our build process (there was a time where configure didn't exist) was a tedious process, but how many ppl out there could imagine us without it?
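On the file-handle point in this message: on POSIX systems the per-process descriptor limit is a resource limit that a process can usually raise up to its hard limit. The sketch below uses the standard `getrlimit`/`setrlimit` calls with `RLIMIT_NOFILE`; the helper name `raise_fd_limit` is hypothetical, and how high the hard limit goes depends on the OS and its configuration.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft limit on open file descriptors
 * toward `wanted`, capped at the hard limit. Returns the soft limit in
 * effect afterwards, or 0 if the limits could not be read or changed. */
rlim_t raise_fd_limit(rlim_t wanted)
{
    struct rlimit rl;
    rlim_t        target = wanted;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;

    /* An unprivileged process cannot go above the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
        target = rl.rlim_max;

    /* Only ever raise the soft limit; never lower it. */
    if (target > rl.rlim_cur) {
        rl.rlim_cur = target;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return 0;
    }
    return rl.rlim_cur;
}
```

A threaded server could call something like this once at startup and size its file-descriptor pool from the returned value rather than assuming a fixed 1024.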
On Thu, 7 Feb 2002 mkscott@sacadia.com wrote: > Making postgres functions thread-safe increases the flexibility of the > codebase. Whether threading connections, sub-queries, increasing > processor utilization, or some other unforeseen optimization, having > reentrant and thread-safe code leaves the door open for new ideas. Yes, > writing reentrant code can be restrictive and a little more complex, > but not much; the big work is the upfront cost of porting. I have done > it once and gained a great deal on projects that I am working > on. Would you be willing to take what you've learnt and work with the current CVS tree towards making her thread-safe? Even small steps regularly taken bring us closer to being able to use even *some* threading in the backend ...
At 04:39 PM 07-02-2002 -0500, mlw wrote: > >There are, AFAIK two reasons to thread PostgreSQL: > >(1) Run the multiple connections in their own thread with the assumption >that this is more efficient for [n] reasons. >(2) Run a single query across multiple threads, thus parallelizing the >query engine. > >There is a mutant of this as well: (1a) You could have multiple >processes each with [n] connection threads. > >As far as PostgreSQL is concerned, I am dubious that (1) or (1a) will >provide any real benefit for the amount of work required to accomplish >it. Work on "pre-forking" would be FAR more productive. > >The idea of parallelizing queries could be very worth while. However, >that being said, creating a set of I/O threads that get blocks from disk >devices asynchronously, my be enough with a very limited amount of work. 2) seems to be the only good argument for threads so far. 1) may only be true on certain O/Ses. That said, are those large single queries typically CPU bound or IO bound or neither? If they are IO bound then given my limited understanding it is not easy to see how spreading the query over additional CPUs is going to help. I suggest that work on clustering postgresql may result in a more scalable general solution than threaded postgresql. Looks to be more difficult, but the benefits seem more tangible. Cheerio, Link.
On Thu, 7 Feb 2002, Marc G. Fournier wrote: > > Would you be willing to take what you've learnt and work with the current CVS > tree towards making her thread-safe? Even small steps regularly taken > bring us closer to being able to use even *some* threading in the backend > ... > I can definitely take a stab at it. Maybe I can make a test case with some globals that are accessed often and submit some patches to see what people think. Can I send them to you? Myron mkscott@sacadia.com
> I can definitely take a stab aat it. Maybe I can make a test case with > some globals that are accessed often submit some patches to see what > people think. Can I send them to you? Maybe we should assign someone (or a team) to be the 'thread strike force'. Their job is to (at their leisure) tidy up various parts of the source code in such a way that they should not affect other parts. This should be done during the release cycle, so there is plenty of time to test their changes. Then, once the whole source tree has had its stylistic improvements, it would become easier to switch to a threaded/mpm model... Chris
On Thu, 7 Feb 2002 mkscott@sacadia.com wrote: > > > > On Thu, 7 Feb 2002, Marc G. Fournier wrote: > > > > > Would you be willing to take what you've learnt and work with the current CVS > > tree towards making her thread-safe? Even small steps regularly taken > > bring us closer to being able to use even *some* threading in the backend > > ... > > > > I can definitely take a stab at it. Maybe I can make a test case with > some globals that are accessed often and submit some patches to see what > people think. Can I send them to you? Send them through to pgsql-patches@postgresql.org ... since we are right at the start of the development cycle for v7.3, things should be a lot easier ... pretty much expect to send them in, have them reviewed and commented upon by various developers as to how this should be done this way, and that shouldn't be done that way, and have to re-submit ... :)
On Fri, 8 Feb 2002, Christopher Kings-Lynne wrote: > > I can definitely take a stab aat it. Maybe I can make a test case with > > some globals that are accessed often submit some patches to see what > > people think. Can I send them to you? > > Maybe we should assign someone (or a team) to be the 'thread strike force'. > Their job is to (at their leisure) tidy up various parts of the source code > in such a way that they should not affect other parts. This should be done > during the release cycle, so there is plenty of time to test their changes. > > Then, once the whole source tree has had its stylistic improvements, it > would become easier to switch to a threaded/mpm model... Woo hoo, he caught up with the thread *grin* *poke* Yes, this is exactly what we've been discussing, while some have been trying to tangent off onto side threads ...
"Marc G. Fournier" wrote: > <snip> > > Woo hoo, he caught up with the thread *grin* *poke* > > Yes, this is exactly what we've been discussing, while some have been > trying to tangent off onto side threads ... I feel this would benefit from some kind of PostgreSQL specific guide for new coders to follow. Doesn't have to be overdone, but it should at least give people an idea of what stuff to keep in mind when coding. ??? Regards and best wishes, Justin Clift -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
<mkscott@sacadia.com> writes: > I can definitely take a stab aat it. Maybe I can make a test case with > some globals that are accessed often submit some patches to see what > people think. Can I send them to you? I have a sneaking feeling that what you are going to come up with is a multi-megabyte patch to convert CurrentMemoryContext into a non-global, which will require changing the parameter list of damn near every routine in the backend. Personally I will vote for rejecting such a patch, as it will uglify the code (and break nearly all existing user-written extension functions) far more than is justified by what it accomplishes: exactly zero, in terms of near-term usefulness. I think what's more interesting to discuss at this stage is the considerations I alluded to before: what are we going to do with the caches and other potentially-sharable datastructures? Without a credible design for those issues, there is no point in sweating the small-but-annoying stuff. regards, tom lane
On Fri, Feb 08, 2002 at 11:17:51AM -0500, Tom Lane wrote: > <mkscott@sacadia.com> writes: > > I can definitely take a stab at it. Maybe I can make a test case with > > some globals that are accessed often and submit some patches to see what > > people think. Can I send them to you? > > I have a sneaking feeling that what you are going to come up with is a > multi-megabyte patch to convert CurrentMemoryContext into a non-global, > which will require changing the parameter list of damn near every > routine in the backend. Sorry, I have not been following this discussion closely, but since you are talking about PostgreSQL memory management and threads, I have a note. Dan Horak and I have worked on the Mape project (http://mape.jcu.cz) for a year, and we have already ported PostgreSQL's good memory management into a threaded daemon. It works very well and it is a transparent solution: you do not have to rewrite routines that use MemoryContextSwitchTo or palloc() and other stuff, because everything is based on thread-specific contexts (see the man page for pthread_key_create). With this solution you do not have to change too many things in the current code. Karel -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
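The thread-specific-context idea Karel mentions can be sketched roughly as follows. This is not the Mape code and not PostgreSQL's implementation: `MemoryContextData` here is a toy stand-in, `GetCurrentMemoryContext` and `report_initial_context` are hypothetical helpers, and only the `pthread_key_create`, `pthread_once`, `pthread_getspecific`, and `pthread_setspecific` calls are real POSIX API. The sketch shows that callers keep the familiar "install new context, get old one back" contract while each thread sees its own current context.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for a memory context; the real structure is far richer. */
typedef struct MemoryContextData {
    const char *name;
} MemoryContextData;
typedef MemoryContextData *MemoryContext;

static pthread_key_t  current_context_key;
static pthread_once_t current_context_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, however many threads race to use the key. */
static void make_context_key(void)
{
    /* No destructor: in this sketch contexts are owned by the caller. */
    (void) pthread_key_create(&current_context_key, NULL);
}

/* Hypothetical accessor: the calling thread's current context,
 * or NULL if this thread has not installed one yet. */
MemoryContext GetCurrentMemoryContext(void)
{
    pthread_once(&current_context_once, make_context_key);
    return (MemoryContext) pthread_getspecific(current_context_key);
}

/* Same calling convention as before: install `context`, return the old one.
 * Existing callers would not need a new parameter. */
MemoryContext MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = GetCurrentMemoryContext();

    pthread_setspecific(current_context_key, context);
    return old;
}

/* Demo thread body: reports the context a freshly started thread sees. */
void *report_initial_context(void *unused)
{
    (void) unused;
    return GetCurrentMemoryContext();
}
```

A new thread started with `pthread_create(&t, NULL, report_initial_context, NULL)` sees NULL until it installs its own context, regardless of what the creating thread has switched to.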
On Fri, 8 Feb 2002, Tom Lane wrote: > I have a sneaking feeling that what you are going to come up with is a > multi-megabyte patch to convert CurrentMemoryContext into a non-global, > which will require changing the parameter list of damn near every > routine in the backend. While working with 7.0.2, I changed the call signature on only about 10 functions. In the MemoryContext example, MemoryContextSwitchTo(<Any>MemoryContext) turned into MemoryContextSwitchTo(GetEnv()-><Any>MemoryContext). You may be able to do this with a #define. When I profiled the code, this actually had very little impact on CPU resources. There were some hotspots where it made more sense to pass the global environment to the function, but the list is small. > > Personally I will vote for rejecting such a patch, as it will uglify the > code (and break nearly all existing user-written extension functions) > far more than is justified by what it accomplishes: exactly zero, in > terms of near-term usefulness. I don't think that user functions need be broken. As long as they use palloc, a recompile may be all that is needed. > > I think what's more interesting to discuss at this stage is the > considerations I alluded to before: what are we going to do with the > caches and other potentially-sharable datastructures? Without a > credible design for those issues, there is no point in sweating the > small-but-annoying stuff. As far as caches go, I punted on sharing. Controlling access to the cache hash tables looked like a lot of work and I thought the contention for this resource would be high. So I had each thread build separate cache structures. The one difference was that I had the original cache build occur from memory rather than from the file pg_internal.init. So when the first thread for a particular db is built, the cache structures are built in system memory and copied into the appropriate MemoryContext. Each subsequent cache for the db is copied from main memory at thread build.
One place where sharing worked great was the file manager. I modified md.c to share Vfds. I made the maximum number of threads that could share one Vfd configurable so that the number of Vfds created and the contention for those Vfds could be balanced. It seems obvious to me that we need to tread slowly and softly into this area, so I promise I will not spend a ton of time mangling the whole CVS tree; that, most definitely, would be a waste of everybody's time. I think I can find an example area that will be a small patch and submit it for review. Hopefully this can get the ball rolling. Myron mkscott@sacadia.com
<mkscott@sacadia.com> writes: > On Fri, 8 Feb 2002, Tom Lane wrote: >> I have a sneaking feeling that what you are going to come up with is a >> multi-megabyte patch to convert CurrentMemoryContext into a non-global, >> which will require changing the parameter list of damn near every >> routine in the backend. > While working with 7.0.2, I changed the call signature on only about 10 > functions. In the MemoryContext example, > MemoryContextSwitchTo(<Any>MemoryContext) turned into > MemoryContextSwitchTo(GetEnv()-><Any>MemoryContext). You may be able > to do this with a #define. Oh, I see. Okay, if we can hide the messiness inside #define's then it might not be as bad as I was expecting. That'd also allow the overhead to be compiled away when we didn't need/want thread support, which'd be even nicer. regards, tom lane
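The #define idea discussed here might look roughly like the sketch below. Everything in it is an assumption made for illustration: the build switch `THREADED_BACKEND`, the `Env` struct and its field name, and the use of C11 `_Thread_local` to back `GetEnv()` are all hypothetical (the threaded port described above may well have done this differently), and `MemoryContextData` is a toy stand-in. What the sketch shows is that code written against the name `CurrentMemoryContext` compiles unchanged in both modes, and that the non-threaded build keeps a plain global with no added overhead.

```c
#include <stddef.h>

/* Toy stand-in for a memory context. */
typedef struct MemoryContextData {
    const char *name;
} MemoryContextData;
typedef MemoryContextData *MemoryContext;

#ifdef THREADED_BACKEND   /* hypothetical compile-time switch */

/* Per-thread environment holding what used to be global variables. */
typedef struct Env {
    MemoryContext current_memory_context;
} Env;

static _Thread_local Env thread_env;   /* C11 thread-local storage */

static Env *GetEnv(void)
{
    return &thread_env;
}

/* Existing code keeps using the old name; the macro redirects it. */
#define CurrentMemoryContext (GetEnv()->current_memory_context)

#else   /* non-threaded build: an ordinary global, as before */

static MemoryContext CurrentMemoryContext = NULL;

#endif

/* Identical source in both builds: install `context`, return the old one. */
MemoryContext MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    CurrentMemoryContext = context;
    return old;
}
```

Compiling with `-DTHREADED_BACKEND` selects the per-thread variant; without it the macro layer disappears entirely, which is the "overhead compiled away" property mentioned above.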