Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Let's make PostgreSQL multi-threaded |
Date | |
Msg-id | CA+TgmoZKrgkd+jEbRRpOYoG14Ue9GLWTH2kKH_Yhac3s6Ofemg@mail.gmail.com |
In response to | Re: Let's make PostgreSQL multi-threaded (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Let's make PostgreSQL multi-threaded
Re: Let's make PostgreSQL multi-threaded Re: Let's make PostgreSQL multi-threaded |
List | pgsql-hackers |
On Tue, Jun 6, 2023 at 10:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that if we were building this system from scratch today,
> we'd probably choose thread-per-session not process-per-session.
> But the costs of getting to that from where we are will be enormous.
> I seriously doubt that the net benefits could justify that work,
> no matter how long you want to look forward. It's not really
> significantly different from "let's rewrite the server in
> C++/Rust/$latest_hotness".

Well, I don't know, I think that's a bunch of things that are not all
the same. Rewriting the server in a whole different programming
language would be a massive effort. I can't really see anyone
volunteering to rewrite a million lines of C (or whatever we've got)
in Rust, and I'm not sure who would use the result if they did, or
why. We could, perhaps, allow new source files to be written in Rust
while keeping old ones written in C, but then every hacker has to
know two languages, and having code written in both languages
manipulating the same data structures would probably be a recipe for
confusion and bugs. It's hard to believe that the upsides would be
worth the pain. Maybe transition to C++ would be easier, or maybe it
wouldn't, I'm not sure. But from my point of view, the issue here is
simply that stop-the-world-and-change-everything is not a viable way
forward for a project the size of PostgreSQL, but incremental changes
are potentially acceptable if the benefits outweigh the drawbacks.

So what are the costs, exactly, of transition to a threaded model? It
seems to me that there's basically one problem: global variables.
Sure, there's a bunch of stuff around process management that would
likely have to be revised in some way, but that's not that much code
and wouldn't have that much impact on unrelated development.
However, the project's widespread and often gratuitous use of global
variables would have to be addressed in some way, and I think that
will pretty much inevitably involve touching all of those global
variable declarations in some way.

Now, if we can get away with simply marking all of those
thread-local, then it's of the same general flavor as PGDLLIMPORT. I
am aware that you think that PGDLLIMPORT markings are ugly as sin,
and these would be more widespread since they'd have to be applied to
literally every global variable, including file-local ones. However,
it's hard to imagine that adding such markings would cause PostgreSQL
development to grind to a halt. It would cause minor rebasing pain
and that's about it. I hope that we'd have some tool that would make
the build fail if any markings are missing and everybody would be
annoyed until they finished rebasing all of their WIP patches and
then that would just be how things are. It's not *lovely* but it
doesn't sound that bad either.

In my mind, the bigger question is how much further than that do you
have to go? I think I remember a previous conversation with Andres
where he opined that thread-local variables are "really expensive"
(and I apologize in advance if I'm mis-remembering this). Now, Andres
is not a man who accepts a tax on performance of any size without a
fight, so his "really expensive" might turn out to resemble my
"pretty cheap." However, if widespread use of TLS is too expensive
and we have to start rewriting code to not depend on global
variables, that's going to be more of a problem. If we can get by
with doing such rewrites only in performance-critical places, it
still might not be too bad. Personally, I think the degree of
dependence that PostgreSQL has on global variables is pretty
excessive and I don't think that a certain amount of refactoring to
reduce it would be a bad thing.
If it turns into an infinite series of hastily-written patches to
rejigger every source file we have, though, then I'm not really on
board with that.

Heikki mentions the idea of having a central Session object and just
passing that around. I have a hard time believing that's going to
work out nicely. First, it's not extensible. Right now, if you need a
bit of additional session-local state, you just declare a variable
and you're all set. That's not a perfect system and does cause some
problems, but we can't go from there to a system where it's
impossible to add session-local state without hacking core. Second,
we will be sad if session.h ends up #including every other header
file that defines a data structure anywhere in the backend. Or at
least I'll be sad. I'm not actually against the idea of having some
kind of session object that we pass around, but I think it either
needs to be limited to a relatively small set of well-defined things,
or else it needs to be designed in some kind of extensible way that
doesn't require it to know the full details of every sort of object
that's being used as session-local state anywhere in the system. I
haven't really seen any convincing design ideas around this yet.

But I think jumping to the conclusion that the migration path here is
akin to rewriting the whole code base in Rust is jumping too far. I
do see some problems here that I don't know how to solve, but that's
nowhere near in the same category as

find . -name '*.c' -exec rm {} \;

--
Robert Haas
EDB: http://www.enterprisedb.com
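
To make the "just mark every global thread-local" idea from the message concrete, here is a minimal sketch in C11 with POSIX threads. It is an illustration, not anything proposed in the thread: the macro name PG_THREAD_LOCAL, the variable nest_level, and the functions session_main, run_two_sessions and caller_nest_level are all hypothetical, and the sketch says nothing about what TLS access costs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical marker, similar in flavor to PGDLLIMPORT. This name is an
 * assumption made for the example, not an existing PostgreSQL macro.
 */
#define PG_THREAD_LOCAL _Thread_local

/*
 * A file-local "global". Without the marker there would be one copy per
 * process; with it, every thread gets its own copy, initialized to 0.
 */
static PG_THREAD_LOCAL int nest_level = 0;

/* Stand-in for a session: bump the thread's private copy *arg times. */
static void *
session_main(void *arg)
{
    int *slot = (int *) arg;
    int  bumps = *slot;

    for (int i = 0; i < bumps; i++)
        nest_level++;

    *slot = nest_level;     /* report this thread's private value */
    return NULL;
}

/*
 * Run two "sessions" as threads. On return, *result_a and *result_b hold
 * the value each thread saw in its own copy of nest_level.
 * Returns 0 on success, -1 if a thread could not be created.
 */
int
run_two_sessions(int *result_a, int *result_b)
{
    pthread_t a;
    pthread_t b;

    assert(result_a != NULL && result_b != NULL);

    *result_a = 3;          /* first session increments 3 times */
    *result_b = 5;          /* second session increments 5 times */

    if (pthread_create(&a, NULL, session_main, result_a) != 0)
        return -1;
    if (pthread_create(&b, NULL, session_main, result_b) != 0)
    {
        pthread_join(a, NULL);
        return -1;
    }

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

/* The calling thread's copy, which the two sessions never touch. */
int
caller_nest_level(void)
{
    return nest_level;
}
```

Each session sees only its own increments, and the caller's copy stays at zero, which is the behavior existing code relies on today when every session is a separate process. The only change to the declaration is the added marker, which is why the rebasing cost described above would be mechanical.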