Re: Parallel threads in query - Mailing list pgsql-hackers

From Darafei "Komяpa" Praliaskouski
Subject Re: Parallel threads in query
Date
Msg-id CAC8Q8tKRMRTBSDqaD5NEsm7HtAX2F7B0YJsZOQt1pFiF8nzOPg@mail.gmail.com
In response to Re: Parallel threads in query  (Andres Freund <andres@anarazel.de>)
Responses Re: Parallel threads in query  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

Because you said "faster than reasonable IPC" - which to me implies that
you don't do full blown IPC. Which using threads in a bgworker is very
strongly implying. What you're proposing strongly implies multiple
context switches just to process a few results. Even before, but
especially after, spectre that's an expensive proposition.


To have some idea of what it could be:

a)
PostGIS has the ST_ClusterKMeans window function. It collects all geometries passed to it into memory, repacks them into a more compact buffer, and then runs a loop over that buffer several (say 10 to 100) times. Finally it emits the assigned cluster number for each input row.

That is fine when you need to calculate KMeans for 200 to 50,000 rows, but for a million input rows even a hundred passes on a single core are painful.
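
To make the repeated pass concrete, here is a minimal sketch of one k-means assignment pass split across threads with an OpenMP `parallel for`. It is not the PostGIS implementation: the `point2d` struct and the `assign_clusters` function are hypothetical names invented for this illustration, and plain 2D points stand in for geometries.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical compact point type; PostGIS's real buffer layout differs. */
typedef struct { double x, y; } point2d;

/*
 * One k-means assignment pass: store the index of the nearest center
 * for every point in cluster[]. Returns how many points changed cluster,
 * so the caller can stop iterating once this reaches zero.
 */
long
assign_clusters(const point2d *pts, long n,
                const point2d *centers, int k, int *cluster)
{
    long changed = 0;
    long i;

    assert(k > 0);

    /* Each iteration writes only cluster[i]; the counter is a reduction. */
#pragma omp parallel for reduction(+:changed) schedule(static)
    for (i = 0; i < n; i++)
    {
        int    best = 0;
        double best_d = DBL_MAX;
        int    c;

        for (c = 0; c < k; c++)
        {
            double dx = pts[i].x - centers[c].x;
            double dy = pts[i].y - centers[c].y;
            double d = dx * dx + dy * dy;   /* squared distance is enough */

            if (d < best_d)
            {
                best_d = d;
                best = c;
            }
        }
        if (cluster[i] != best)
        {
            cluster[i] = best;
            changed++;
        }
    }
    return changed;
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result, so the threaded path stays optional.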

b) 
PostGIS has the ST_Subdivide function. It takes a single geometry row (usually very large, like a continent or the whole of Russia) and splits it into many rows with simpler shapes by recursively performing horizontal or vertical splits. Since this is a tree traversal, it can be parallelized efficiently: one task follows the right part of the geometry while another follows the left.
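
As a rough sketch of that recursion with OpenMP tasks (which need OpenMP 3.0 or later), the code below only counts the pieces produced by halving a vertex count until each piece is small enough. It is not how ST_Subdivide is implemented: `subdivide_count`, `subdivide_parallel`, and the `TASK_CUTOFF` threshold are hypothetical names and values chosen for illustration, and the real function splits actual geometry rather than a number.

```c
#include <assert.h>

/* Assumed cutoff: below this size, spawning a task costs more than it saves. */
#define TASK_CUTOFF 10000

/*
 * Recursively halve a piece of nverts vertices until every piece has at
 * most max_verts vertices; return the number of resulting pieces.
 */
long
subdivide_count(long nverts, long max_verts)
{
    long left = 0;
    long right = 0;

    assert(max_verts > 0);

    if (nverts <= max_verts)
        return 1;               /* leaf: this piece is simple enough */

    /* One task follows the left half, another follows the right half. */
#pragma omp task shared(left) if(nverts > TASK_CUTOFF)
    left = subdivide_count(nverts / 2, max_verts);

#pragma omp task shared(right) if(nverts > TASK_CUTOFF)
    right = subdivide_count(nverts - nverts / 2, max_verts);

    /* Wait for both children before combining their results. */
#pragma omp taskwait
    return left + right;
}

/* Entry point: one thread seeds the task tree, the team executes it. */
long
subdivide_parallel(long nverts, long max_verts)
{
    long result = 0;

#pragma omp parallel
#pragma omp single
    result = subdivide_count(nverts, max_verts);

    return result;
}
```

Without OpenMP enabled the pragmas are ignored and this degrades to ordinary recursion, which is also what the `if` clause approximates for small pieces.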

Both look like standard use cases for OpenMP, which has compiler support in GCC, clang, and MSVC. For an overview of how it works, have a look here:

So, do I understand correctly that for each thread I launch, I need to start a parallel worker that does nothing, just to consume the parallel worker limit?
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
