Thread: getting rid of "thread fork emulation" in pgbench?
Hello hackers, This is take 2, I already suggested this 2 years ago... There is a "thread fork emulation" hack which allows pgbench to be compiled without threads by emulating them with forks, obviously with limited capabilities. This feature is triggered by configuring postgresql with --disable-thread-safety. I'm not aware of platforms which would still benefit from this feature (we are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this thread emulation is a real pain for maintaining and extending it: some features cannot be implemented, or must be implemented twice, or must be implemented in a heavy way because it cannot be assumed to be in a threaded implementation. My question is: would someone still object strongly to the removal of this "thread fork emulation" hack in pgbench? That would mean arguing that it is more important than a simpler code base for pgbench, which would help both its maintenance and extension, and must be kept despite these costs, hence the "strongly". Tom's opinion, 2 years ago, was to object strongly: http://www.postgresql.org/message-id/24846.1372352618@sss.pgh.pa.us "I would object strongly to that, as it would represent a significant movement of the goalposts on what is required to build Postgres at all, ie platforms on which --enable-thread-safety is unavailable or expensive would be out in the cold. Perhaps that set is approaching empty, but a project that's still standardized on C89 has little business making such a choice IMO." However, about not providing a full feature under thread emulation: "Perhaps that's worth doing. I agree with Fabien that full support of this feature in the process model is more trouble than it's worth, though, and I wouldn't scream loudly if we just didn't support it. --disable-thread-safety doesn't have to be entirely penalty-free." Robert Haas did also object strongly: http://www.postgresql.org/message-id/CA+TgmoaJawAM0A4N1RnsndFhTOYYbwnt+7N7XWLcW=n-uDC79Q@mail.gmail.com "I don't believe that to be an acceptable restriction. We generally require features to work on all platforms we support. We have made occasional compromises there, but generally only when the restriction is fundamental to the platform rather than for developer convenience." So the underlying question is, does Tom and Robert's opinion have changed from 2 years ago? If not, I would really like to know at least *one* current platform for which --disable-thread-safety is required, just to know why I waste time over such things. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: > There is a "thread fork emulation" hack which allows pgbench to be > compiled without threads by emulating them with forks, obviously with > limited capabilities. This feature is triggered by configuring postgresql > with --disable-thread-safety. > I'm not aware of platforms which would still benefit from this feature (we > are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this > thread emulation is a real pain for maintaining and extending it: some > features cannot be implemented, or must be implemented twice, or must be > implemented in a heavy way because it cannot be assumed to be in a > threaded implementation. > My question is: would someone still object strongly to the removal of this > "thread fork emulation" hack in pgbench? By my count, the pthread-emulation code amounts to about 175 lines out of the nearly 4000 in pgbench.c. That's not an undue burden IMO (it's not, for comparison's sake, very much more than the amount of WIN32-specific code in that file). And what will you do instead? It would be fine I think for pgbench to not allow --jobs different from 1 on a threadless platform, but not for it to fail to work altogether. It looks to me like allowing it to compile without that code would take nearly as much effort/mess as what's there now. > So the underlying question is, does Tom and Robert's opinion have changed > from 2 years ago? Not mine. regards, tom lane
Hello Tom, >> My question is: would someone still object strongly to the removal of this >> "thread fork emulation" hack in pgbench? > > By my count, the pthread-emulation code amounts to about 175 lines out of > the nearly 4000 in pgbench.c. That's not an undue burden IMO (it's not, > for comparison's sake, very much more than the amount of WIN32-specific > code in that file). From my point of view, the burden is how things can be implemented if you have to assume "no threads". In the discussion at hand which motivates this renewed request, the question is whether external merge sorts implying rescanning and reprinting files over and over should be implemented in pgbench so as to recombine log files. Now if we have threads, it is simpler (if the overhead is reasonable) that threads share a file handle and just generate one file, so there is no merging. That is not possible with processes at least for the aggregated logs because the thread data must be combined, or would require that pgbench use shared memory and so on, but then it is really a lot of bother for a limited feature. The alternative is that the feature would not be available if no thread, but then people object to that, on principle. > And what will you do instead? It would be fine I think for pgbench to > not allow --jobs different from 1 on a threadless platform, but not for > it to fail to work altogether. Sure. No thread really means working with only one thread. > It looks to me like allowing it to compile without that code would take > nearly as much effort/mess as what's there now. My motivation is to simplify how things are done by simply assuming that threads are available and can share data, esp for counters. >> So the underlying question is, does Tom and Robert's opinion have changed >> from 2 years ago? > > Not mine. Ok! I'll ask again in 2017, and probably again in 2019:-) -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> And what will you do instead? It would be fine I think for pgbench to >> not allow --jobs different from 1 on a threadless platform, but not for >> it to fail to work altogether. > Sure. No thread really means working with only one thread. >> It looks to me like allowing it to compile without that code would take >> nearly as much effort/mess as what's there now. > My motivation is to simplify how things are done by simply assuming that > threads are available and can share data, esp for counters. pgbench does not get to "assume that threads are available", at least not as long as the project as a whole supports --disable-thread-safety. As I said, I wouldn't have a problem with restricting the --jobs setting on threadless platforms, which seems like it would fix your problem since you wouldn't need to have more than one process involved. regards, tom lane
> As I said, I wouldn't have a problem with restricting the --jobs setting > on threadless platforms, which seems like it would fix your problem since > you wouldn't need to have more than one process involved. Indeed! So I undestand that "thread fork emulation" could eventually go from your point of view, provided that pgbench compiles and runs with one job. I'll try to submit a patch to that effect, maybe to June CF. -- Fabien.
On Sun, Mar 29, 2015 at 1:49 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Now if we have threads, it is simpler (if the overhead is reasonable) that > threads share a file handle and just generate one file, so there is no > merging. I really, really wish you'd stop arguing against the patch to allow merging of pgbench logs in this basis. There may or may not be other reasons to reject that patch, but arguing that we can or should reject it because of the hypothetical possibility that sharing a file handle will work out just isn't right. People who work hard to develop a patch that does something useful don't deserve to have it kicked to the curb because someone can imagine a development path that would eventually obsolete it. That patch deserves to be evaluated on its own merits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > I really, really wish you'd stop arguing against the patch to allow > merging of pgbench logs in this basis. Hmmm. I'm lost. I thought that discussing how to best implement a feature was part of reviewing a patch. > There may or may not be other reasons to reject that patch, but arguing > that we can or should reject it because of the hypothetical possibility > that sharing a file handle will work out just isn't right. I must say that I do not understand why discussing options, and actually implementing and testing them for hours to see the performance impact, is bad, but it will save time next time. As for pgbench thread forlk emulation induced complexity, I already paid for some stuff I contributed to pgbench, so I feel I'm entitled to rant about it:-) > People who work hard to develop a patch I know. I also spent quite some time on testing alternate implementations on this one. > that does something useful don't deserve to have it kicked to the curb > because someone can imagine a development path that would eventually > obsolete it. That patch deserves to be evaluated on its own merits. Sure. I did that in the first post. The code is heavy and awkward, include copy-pastes, the merge sorts (there are 2 of thems) complexity are bad, the different log formats are hard coded and must be updated in several place if modified or extended, ... IMO this would be best outside of pgbench in a script, but this means that the script must be given indication about the file format which depends on the options used to run pgbench, which makes it awkward as well. So the current status is between "rejected" and "returned with feedback", make your pick. Now I think that the feature provides value and I am trying to help salvage it. I can do other things. -- Fabien.
On Mon, Mar 30, 2015 at 2:00 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> I really, really wish you'd stop arguing against the patch to allow >> merging of pgbench logs in this basis. > > Hmmm. I'm lost. I thought that discussing how to best implement a feature > was part of reviewing a patch. Of course it is. But the feature you are talking about and the feature Tomas is talking about are two different features. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company