Thread: Parallel threads in query
Hi,
I've tried porting some of the PostGIS algorithms to utilize multiple cores via OpenMP so they return faster.
The question is: what's the best policy to allocate cores so we can play nice with the rest of Postgres?
What I'd like to see is some function that I can call to get the number of threads I'm allowed to run, that will also advise the rest of Postgres not to use them, and a function to return the cores (or do it automatically at the end of the query). Is there infrastructure for that?
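Purely to illustrate what I mean - none of this exists today, and all the names here are made up:

    /* Hypothetical interface, sketching the request; nothing like this
     * exists in Postgres today. */

    /* Ask for up to max_wanted cores; returns how many were granted
     * and advises the rest of Postgres not to use them. */
    extern int  AcquireQueryCores(int max_wanted);

    /* Hand the cores back (or have the executor do this automatically
     * at the end of the query). */
    extern void ReleaseQueryCores(int ncores);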
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
> The question is: what's the best policy to allocate cores so we can
> play nice with the rest of Postgres?

> What I'd like to see is some function that I can call to get the number
> of threads I'm allowed to run, that will also advise the rest of Postgres
> not to use them, and a function to return the cores (or do it
> automatically at the end of the query). Is there infrastructure for that?

There is not, because we do not use or support multiple threads inside
a Postgres backend, and have no intention of doing so any time soon.
There is a huge amount of non-thread-safe infrastructure there, and if
you call any of it from other than the main thread, bad things will
happen. You might be able to make this work if the threaded stuff is
kept rigidly separate from any core Postgres code, but it'll be
error-prone.

			regards, tom lane
On Wed, Oct 31, 2018 at 09:07:43AM -1000, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
>
> I've tried porting some of the PostGIS algorithms to utilize multiple
> cores via OpenMP so they return faster.

Great!

> The question is: what's the best policy to allocate cores so we can
> play nice with the rest of Postgres?

As Tom mentioned, the PostgreSQL backend is not yet threaded and is not
likely to be any time in the next couple of years. There has been at
least one attempt recently to start this work.

https://www.postgresql.org/message-id/9defcb14-a918-13fe-4b80-a0b02ff85527%40postgrespro.ru

> What I'd like to see is some function that I can call to get the number
> of threads I'm allowed to run, that will also advise the rest of Postgres
> not to use them, and a function to return the cores (or do it
> automatically at the end of the query). Is there infrastructure for that?

Not really, as above.

In the case of GIS queries, you may be able to send a large-enough
payload of work to make it worthwhile to do some kind of IPC or even a
(short, high-bandwidth, low-latency) network hop to communicate with a
separate OpenMP GIS server. So long as the threads don't directly
interact with the backend, you could do this safely.

Best,
David.

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 31.10.2018 22:07, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
>
> I've tried porting some of the PostGIS algorithms to utilize multiple
> cores via OpenMP so they return faster.
>
> The question is: what's the best policy to allocate cores so we can
> play nice with the rest of Postgres?
>
> What I'd like to see is some function that I can call to get the number
> of threads I'm allowed to run, that will also advise the rest of Postgres
> not to use them, and a function to return the cores (or do it
> automatically at the end of the query). Is there infrastructure for that?

I do not completely understand which PostGIS algorithms you are going
to make parallel, so maybe you should first clarify that. There are
three options for performing parallel execution of a single query in
Postgres now:

1. Use the existing Postgres parallel capabilities. For example, if
there is some expensive function f() which you are going to execute
concurrently, then you do not need to do anything: a parallel seq scan
will do it for you. You can configure an arbitrary number of parallel
workers and so control the level of concurrency. The restrictions of
the current Postgres parallel query processing implementation are that:
- parallel workers are started for each query
- it is necessary to serialize and pass a lot of things from the
  coordinator to the parallel workers
- in the case of a seq scan, workers will compete for pages to scan, so
  the effective number of workers should be < 10, while the most
  powerful modern servers have hundreds of CPU cores.

2. Implement your own parallel plan nodes using the existing Postgres
parallel infrastructure. This approach has the best chance of being
committed to Postgres core, but its disadvantages are mostly the same
as in 1). Exchange of data between different processes is much more
complex and expensive than access to common memory in the case of
threads. Most likely you will have to use the shared message queue and
dynamic shared memory implemented in Postgres specifically for the
interaction of parallel workers.

3. Use multithreading to provide concurrent execution of your
particular algorithm (spawn threads within the backend). You should be
very careful with this approach, because Postgres code is not thread
safe, so you should not try to execute any subplan or call any Postgres
functions in a thread (unless you are 100% sure they are thread safe).
This approach may be easy to implement and provide better performance
than 1), but please notice its limitations. I have used this approach
in my IMCS extension (In-Memory-Columnar-Store).

You can look at the pg_strom extension as an example of providing
parallel query execution (in this case using the parallel capabilities
of video cards).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
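To make option 3 concrete, a minimal sketch: it assumes the main backend
thread has already copied the inputs into plain C arrays, and the worker
threads do pure C math, never calling back into Postgres
(compute_distances is an invented example):

    #include <math.h>
    #include <omp.h>

    /*
     * Option 3 sketch: only the main backend thread talks to Postgres
     * (palloc, elog, SPI, ...); worker threads touch plain C buffers only.
     */
    static void
    compute_distances(const double *xs, const double *ys, int n, double *out)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
        {
            /* pure C math - no Postgres calls from worker threads */
            out[i] = sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
        }
    }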
On Wed, Oct 31, 2018 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
>> The question is: what's the best policy to allocate cores so we can
>> play nice with the rest of Postgres?
>
> There is not, because we do not use or support multiple threads inside
> a Postgres backend, and have no intention of doing so any time soon.
As a practical matter though, if we're multi-threading a heavy PostGIS function, presumably simply grabbing *every* core is not a recommended or friendly practice. My finger-in-the-wind guess would be that the value of max_parallel_workers_per_gather would be the most reasonable value to use to limit the number of cores a parallel PostGIS function should use. Does that make sense?
P
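A sketch of what using that value could look like in a PostGIS C
function, assuming max_parallel_workers_per_gather is picked up from
optimizer/cost.h (its home as of PG 11; the declaration moved in later
releases) and the thread count is handed to OpenMP:

    #include "postgres.h"
    #include "optimizer/cost.h"  /* max_parallel_workers_per_gather */

    #include <omp.h>

    /*
     * Sketch: cap the OpenMP team at the planner's per-gather worker
     * limit. With the GUC at 0 (parallelism disabled) still run one
     * thread.
     */
    static void
    set_postgis_thread_count(void)
    {
        omp_set_num_threads(Max(1, max_parallel_workers_per_gather));
    }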
On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
>>> The question is: what's the best policy to allocate cores so we can
>>> play nice with the rest of Postgres?
>>
>> There is not, because we do not use or support multiple threads inside
>> a Postgres backend, and have no intention of doing so any time soon.
>
> As a practical matter though, if we're multi-threading a heavy PostGIS
> function, presumably simply grabbing *every* core is not a recommended or
> friendly practice. My finger-in-the-wind guess would be that the value
> of max_parallel_workers_per_gather would be the most reasonable value to
> use to limit the number of cores a parallel PostGIS function should use.
> Does that make sense?

I'm not sure that's a good approximation. Postgres' infrastructure
prevents every query from using max_parallel_workers_per_gather
processes due to the global max_worker_processes limit. I think you
probably would want something very very roughly approximating a global
limit - otherwise you'll either need to set the per-process limit way
too low, or overwhelm machines with context switches.

Greetings,

Andres Freund
On 11/01/2018 06:15 PM, Andres Freund wrote:
> On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
>> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
>>>> The question is: what's the best policy to allocate cores so we can
>>>> play nice with the rest of Postgres?
>>>
>>> There is not, because we do not use or support multiple threads inside
>>> a Postgres backend, and have no intention of doing so any time soon.
>>
>> As a practical matter though, if we're multi-threading a heavy PostGIS
>> function, presumably simply grabbing *every* core is not a recommended or
>> friendly practice. My finger-in-the-wind guess would be that the value
>> of max_parallel_workers_per_gather would be the most reasonable value to
>> use to limit the number of cores a parallel PostGIS function should use.
>> Does that make sense?
>
> I'm not sure that's a good approximation. Postgres' infrastructure
> prevents every query from using max_parallel_workers_per_gather
> processes due to the global max_worker_processes limit. I think you
> probably would want something very very roughly approximating a global
> limit - otherwise you'll either need to set the per-process limit way
> too low, or overwhelm machines with context switches.

Yeah. Without a global limit it would be fairly trivial to create way
too many threads - say when a query gets parallelized, and each worker
creates a bunch of private threads. And then a bunch of such queries
executed concurrently, and it's getting bad pretty fast.

In theory, simulating such global limit should be possible using a bit
of shared memory for the current total, per-process counter and probably
some simple abort handling (say, just like contrib/openssl does using
ResourceOwner).

A better solution might be to start a bgworker managing a connection
pool and forward the requests to it using IPC (and enforce the thread
count limit there).

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
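A rough sketch of that shared counter, assuming the pg_atomic_uint32 has
been placed in shared memory at startup (say via RequestAddinShmemSpace()
plus ShmemInitStruct() from a shmem_startup_hook); thread_budget,
acquire_threads and release_threads are invented names:

    #include "postgres.h"
    #include "port/atomics.h"

    /* Lives in shared memory; initialized once (e.g. to a new GUC)
     * in shmem_startup_hook. */
    static pg_atomic_uint32 *thread_budget;

    /* Try to take up to `want` threads from the global budget;
     * returns how many we actually got (possibly 0). */
    static int
    acquire_threads(int want)
    {
        uint32 cur = pg_atomic_read_u32(thread_budget);

        for (;;)
        {
            uint32 take = Min((uint32) want, cur);

            if (take == 0)
                return 0;
            /* on failure, cur is refreshed with the latest value; retry */
            if (pg_atomic_compare_exchange_u32(thread_budget, &cur,
                                               cur - take))
                return (int) take;
        }
    }

    /* Give threads back; must also run from abort handling so that an
     * erroring backend cannot leak budget. */
    static void
    release_threads(int n)
    {
        pg_atomic_fetch_add_u32(thread_budget, (uint32) n);
    }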
Hi,

On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).

Right. I don't think you even need something resowner like, given that
anything using threads better make it absolutely absolutely impossible
that an error can escape.

> A better solution might be to start a bgworker managing a connection
> pool and forward the requests to it using IPC (and enforce the thread
> count limit there).

That doesn't really seem feasible for cases like this - after all, you'd
only use threads to work on individual rows if you wanted to parallelize
relatively granular per-row work or such. Adding cross-process IPC seems
like it'd make that perform badly.

Greetings,

Andres Freund
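Concretely, "no error can escape" means something like the sketch below:
worker threads never call ereport()/elog() (which would longjmp in the
wrong thread); they only record failure, and the main thread throws after
the join. do_one_item is a hypothetical per-row worker.

    #include "postgres.h"

    #include <omp.h>

    extern int do_one_item(double *item);  /* hypothetical per-row worker */

    static void
    parallel_work(double *data, int n)
    {
        bool        failed = false;

        #pragma omp parallel for
        for (int i = 0; i < n; i++)
        {
            /* no ereport()/elog() here: it would longjmp out of a thread */
            if (do_one_item(&data[i]) != 0)
            {
                #pragma omp atomic write
                failed = true;
            }
        }

        /* all threads have joined; the main thread may now throw safely */
        if (failed)
            ereport(ERROR,
                    (errmsg("parallel computation failed")));
    }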
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).
I would expect that this limit is already available, and it's the parallel worker limit. Basically, when starting a new thread, I would like to somehow consume a part of the parallel worker limit - a thread is a kind of parallel worker, from the user's perspective. If I have 4 cores and Postgres has already started 4 parallel workers, I don't really want to start 4 threads for each of them, or 4 for one of them and 1 for each of the rest, which can happen if I manage that separately from the parallel worker limit.
IPC and co - that's another question, and out of scope for this one. Since OpenMP allows writing multithreaded code by just adding #pragma directives around loops, I don't want to reinvent that part of the infrastructure.
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On 11/01/2018 07:40 PM, Andres Freund wrote:
> Hi,
>
> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>> In theory, simulating such global limit should be possible using a bit
>> of shared memory for the current total, per-process counter and probably
>> some simple abort handling (say, just like contrib/openssl does using
>> ResourceOwner).
>
> Right. I don't think you even need something resowner like, given that
> anything using threads better make it absolutely absolutely impossible
> that an error can escape.

True. Still, I wonder if the process can die in a way that would fail
to update the counter.

>> A better solution might be to start a bgworker managing a connection
>> pool and forward the requests to it using IPC (and enforce the thread
>> count limit there).
>
> That doesn't really seem feasible for cases like this - after all, you'd
> only use threads to work on individual rows if you wanted to parallelize
> relatively granular per-row work or such. Adding cross-process IPC seems
> like it'd make that perform badly.

I think that very much depends on how expensive the tasks handled by
the threads are. It may still be cheaper than a reasonable IPC, and if
you don't create/destroy threads, that also saves quite a bit of time.

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/01/2018 07:43 PM, Darafei "Komяpa" Praliaskouski wrote:
>> In theory, simulating such global limit should be possible using a bit
>> of shared memory for the current total, per-process counter and probably
>> some simple abort handling (say, just like contrib/openssl does using
>> ResourceOwner).
>
> I would expect that this limit is already available, and it's the
> parallel worker limit. Basically, when starting a new thread, I would
> like to somehow consume a part of the parallel worker limit - a thread
> is a kind of parallel worker, from the user's perspective. If I have 4
> cores and Postgres has already started 4 parallel workers, I don't
> really want to start 4 threads for each of them, or 4 for one of them
> and 1 for each of the rest, which can happen if I manage that
> separately from the parallel worker limit.

Well, PostgreSQL does that, but only for the process-based parallelism.
It has no idea about threads, so it can't work out of the box. Also,
the max_worker_processes limit determines various shared memory we need
to manage those processes, so it's really not about threads.

If you need something like that for threads, feel free to do that, but
I'd strongly suggest using a separate counter (perhaps using m_w_p as
an initial value).

> IPC and co - that's another question, and out of scope for this one.
> Since OpenMP allows writing multithreaded code by just adding #pragma
> directives around loops, I don't want to reinvent that part of the
> infrastructure.

Maybe. I don't know OpenMP that well, so I can't really say if that's a
good idea or not.

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
> On 11/01/2018 07:40 PM, Andres Freund wrote:
>> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>>> In theory, simulating such global limit should be possible using a bit
>>> of shared memory for the current total, per-process counter and probably
>>> some simple abort handling (say, just like contrib/openssl does using
>>> ResourceOwner).
>>
>> Right. I don't think you even need something resowner like, given that
>> anything using threads better make it absolutely absolutely impossible
>> that an error can escape.
>
> True. Still, I wonder if the process can die in a way that would fail
> to update the counter.

You'd better make that case a panic restart.

>>> A better solution might be to start a bgworker managing a connection
>>> pool and forward the requests to it using IPC (and enforce the thread
>>> count limit there).
>>
>> That doesn't really seem feasible for cases like this - after all, you'd
>> only use threads to work on individual rows if you wanted to parallelize
>> relatively granular per-row work or such. Adding cross-process IPC seems
>> like it'd make that perform badly.
>
> I think that very much depends on how expensive the tasks handled by
> the threads are. It may still be cheaper than a reasonable IPC, and if
> you don't create/destroy threads, that also saves quite a bit of time.

I'm not following. How can you have a pool *and* threads? Those seem to
be contradictory in PG's architecture? You need full blown IPC with your
proposal afaict?

Greetings,

Andres Freund
On 11/01/2018 07:50 PM, Andres Freund wrote:
> Hi,
>
> On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
>> On 11/01/2018 07:40 PM, Andres Freund wrote:
>>> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>>>> In theory, simulating such global limit should be possible using a bit
>>>> of shared memory for the current total, per-process counter and
>>>> probably some simple abort handling (say, just like contrib/openssl
>>>> does using ResourceOwner).
>>>
>>> Right. I don't think you even need something resowner like, given that
>>> anything using threads better make it absolutely absolutely impossible
>>> that an error can escape.
>>
>> True. Still, I wonder if the process can die in a way that would fail
>> to update the counter.
>
> You'd better make that case a panic restart.
>
>>>> A better solution might be to start a bgworker managing a connection
>>>> pool and forward the requests to it using IPC (and enforce the thread
>>>> count limit there).
>>>
>>> That doesn't really seem feasible for cases like this - after all, you'd
>>> only use threads to work on individual rows if you wanted to parallelize
>>> relatively granular per-row work or such. Adding cross-process IPC seems
>>> like it'd make that perform badly.
>>
>> I think that very much depends on how expensive the tasks handled by
>> the threads are. It may still be cheaper than a reasonable IPC, and if
>> you don't create/destroy threads, that also saves quite a bit of time.
>
> I'm not following. How can you have a pool *and* threads? Those seem to
> be contradictory in PG's architecture? You need full blown IPC with your
> proposal afaict?

My suggestion was to create a bgworker, which would then internally
allocate and manage a pool of threads. It could then open some sort of
IPC (say, as dumb as a unix socket). The backends could then send
requests to it, and it would respond to them. Not sure why/how this
would contradict PG's architecture?

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
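The bgworker half of that could be registered roughly like this (a
sketch: pool_main, the library name geom_pool, and the socket handling
are invented/elided):

    #include "postgres.h"
    #include "fmgr.h"
    #include "postmaster/bgworker.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);
    PGDLLEXPORT void pool_main(Datum main_arg);

    /* Register the pool manager at shared_preload_libraries time. */
    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = 10;   /* restart after 10 s if it dies */
        snprintf(worker.bgw_name, BGW_MAXLEN, "geometry thread pool");
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "geom_pool");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "pool_main");
        RegisterBackgroundWorker(&worker);
    }

    void
    pool_main(Datum main_arg)
    {
        /*
         * Sketch: spin up N worker threads here, then accept requests on
         * a unix socket and dispatch them to the pool, enforcing the
         * global thread-count limit in this one process.
         */
    }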
On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
>>> I think that very much depends on how expensive the tasks handled by
>>> the threads are. It may still be cheaper than a reasonable IPC, and if
>>> you don't create/destroy threads, that also saves quite a bit of time.
>>
>> I'm not following. How can you have a pool *and* threads? Those seem to
>> be contradictory in PG's architecture? You need full blown IPC with your
>> proposal afaict?
>
> My suggestion was to create a bgworker, which would then internally
> allocate and manage a pool of threads. It could then open some sort of
> IPC (say, as dumb as a unix socket). The backends could then send
> requests to it, and it would respond to them. Not sure why/how this
> would contradict PG's architecture?

Because you said "faster than reasonable IPC" - which to me implies that
you don't do full blown IPC. Which using threads in a bgworker is very
strongly implying. What you're proposing strongly implies multiple
context switches just to process a few results. Even before, but
especially after, spectre that's an expensive proposition.

Greetings,

Andres Freund
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.
To give some idea of what it could be:
a)
PostGIS has an ST_ClusterKMeans window function. It collects all geometries passed to it into memory, re-packs them into a more compact buffer, and starts a loop that goes over it several (let's say 10..100) times. Then it spits out the assigned cluster number for each of the input rows.
It's all great when you need to calculate KMeans for 200-50000 rows, but for a million input rows even a hundred passes on a single core are painful.
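The shape of a single pass is exactly what OpenMP's parallel for
handles. A sketch of the assignment step, with the points and centers
already packed into plain arrays (assign_clusters is an invented name):

    #include <float.h>
    #include <omp.h>

    /* One k-means pass: assign each point to its nearest center. */
    static void
    assign_clusters(const double *px, const double *py, int npoints,
                    const double *cx, const double *cy, int k, int *cluster)
    {
        #pragma omp parallel for
        for (int i = 0; i < npoints; i++)
        {
            double best = DBL_MAX;

            for (int c = 0; c < k; c++)
            {
                double dx = px[i] - cx[c];
                double dy = py[i] - cy[c];
                double d = dx * dx + dy * dy;

                if (d < best)
                {
                    best = d;
                    cluster[i] = c;
                }
            }
        }
    }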
b)
PostGIS has an ST_Subdivide function. It takes a single row of geometry (usually super-large, like a continent or the whole of Russia) and splits it into many rows that have simpler shapes, by performing a horizontal or vertical split recursively. Since it's a tree traversal, it can be parallelized efficiently, with one task following the right subpart of the geometry and another following the left.
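A sketch of how that traversal maps onto OpenMP tasks. Geometry,
ResultList, vertex_count(), split_geometry() and append_result() are
hypothetical stand-ins for the real PostGIS machinery, and
append_result() would need its own locking:

    #include <omp.h>

    typedef struct Geometry Geometry;       /* hypothetical stand-ins */
    typedef struct ResultList ResultList;

    extern int  vertex_count(const Geometry *g);
    extern void split_geometry(Geometry *g, Geometry **l, Geometry **r);
    extern void append_result(ResultList *out, Geometry *g); /* needs lock */

    static void
    subdivide_recurse(Geometry *geom, int max_vertices, ResultList *out)
    {
        Geometry   *left;
        Geometry   *right;

        if (vertex_count(geom) <= max_vertices)
        {
            append_result(out, geom);
            return;
        }

        split_geometry(geom, &left, &right); /* horizontal/vertical split */

        /* one task follows the left part, another the right */
        #pragma omp task
        subdivide_recurse(left, max_vertices, out);
        #pragma omp task
        subdivide_recurse(right, max_vertices, out);
        #pragma omp taskwait
    }

    void
    subdivide_parallel(Geometry *geom, int max_vertices, ResultList *out)
    {
        /* create the thread team once; the root call is a single task */
        #pragma omp parallel
        #pragma omp single
        subdivide_recurse(geom, max_vertices, out);
    }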
Both seem to be a standard thing for OpenMP, which has compiler support in GCC, clang, and MSVC. For an overview of how it works, have a look here:
So, do I understand correctly that I need to start a parallel worker that does nothing for each thread I launch to consume the parallel worker limit?
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Hi,

On 2018-11-01 09:17:56 -1000, Darafei "Komяpa" Praliaskouski wrote:
> So, do I understand correctly that I need to start a parallel worker that
> does nothing for each thread I launch to consume the parallel worker limit?

No, I don't think that'd be reasonable. I think what we're saying is
that there's no way to reasonably use the parallel worker limit as the
limitation for what you're trying to do. You need custom infrastructure.

Greetings,

Andres Freund
On 11/01/2018 08:03 PM, Andres Freund wrote:
> On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
>>>> I think that very much depends on how expensive the tasks handled by
>>>> the threads are. It may still be cheaper than a reasonable IPC, and if
>>>> you don't create/destroy threads, that also saves quite a bit of time.
>>>
>>> I'm not following. How can you have a pool *and* threads? Those seem to
>>> be contradictory in PG's architecture? You need full blown IPC with your
>>> proposal afaict?
>>
>> My suggestion was to create a bgworker, which would then internally
>> allocate and manage a pool of threads. It could then open some sort of
>> IPC (say, as dumb as a unix socket). The backends could then send
>> requests to it, and it would respond to them. Not sure why/how this
>> would contradict PG's architecture?
>
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.

Gah! I meant to write "faster with reasonable IPC" - i.e. faster/cheaper
than a solution that would create threads ad-hoc.

My assumption is that the tasks are fairly large, and may take quite a
bit of time to process (say, a couple of seconds?). In which case the
extra context switches are not a major issue. But maybe I'm wrong.

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services