Noah Misch <noah@leadboat.com> writes:
> Each worker needs to make SnapshotNow visibility decisions coherent with the
> master. For sorting, this allows us to look up comparison functions, even
> when the current transaction created or modified those functions. This will
> also be an essential building block for any parallelism project that consults
> user tables. Implementing this means copying the subtransaction stack and the
> combocid hash to each worker.
> [ ... and GUC settings, and who knows what else ... ]

This approach seems to me likely to guarantee that the startup
overhead for any parallel sort is so large that only fantastically
enormous sorts will come out ahead.

I think you need to restrict the problem space enough that the
worker startup cost can be trimmed to something reasonable. One
obvious suggestion is to forbid the workers from doing any database
access of their own at all; the parent would have to do any required
catalog lookups (for sort functions etc.) before forking the children.

We should also seriously consider relying on fork() and
copy-on-write semantics to launch worker subprocesses, instead of
explicitly copying so much state over to them. Yes, this would
foreclose ever having parallel query on Windows, but that's okay
with me (hm, now where did I put my asbestos longjohns ...)

Both of these lines of thought suggest that the workers should *not*
be full-fledged backends.
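To make the shape of that concrete, here is a rough standalone sketch
of the fork()-based idea. It is not PostgreSQL code, and every name in
it (parallel_sort, lookup_comparator, cmp_int) is hypothetical. The
parent resolves the comparison function once, standing in for the
catalog lookup, then forks workers that inherit that pointer through
copy-on-write and do nothing but sort their slice of an anonymous
shared mapping. The parent then merges the sorted runs. The sketch
assumes a POSIX system with MAP_ANONYMOUS and sorts plain ints.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*sort_cmp) (const void *, const void *);

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Hypothetical stand-in for the catalog lookup done by the parent. */
static sort_cmp
lookup_comparator(void)
{
    return cmp_int;
}

/* Sorts data[0..n) in place; returns 0 on success, -1 on failure. */
int
parallel_sort(int *data, size_t n, int nworkers)
{
    assert(nworkers > 0);
    if (n == 0)
        return 0;

    /* Resolve everything the workers need *before* forking. */
    sort_cmp cmp = lookup_comparator();
    size_t chunk = (n + (size_t) nworkers - 1) / (size_t) nworkers;
    int nstarted = 0;
    int ok = 1;

    /* Workers write their sorted runs here; the parent reads them back. */
    int *shared = mmap(NULL, n * sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    pid_t *pids = malloc((size_t) nworkers * sizeof(pid_t));
    size_t *pos = calloc((size_t) nworkers, sizeof(size_t));
    if (pids == NULL || pos == NULL)
    {
        free(pids);
        free(pos);
        munmap(shared, n * sizeof(int));
        return -1;
    }
    memcpy(shared, data, n * sizeof(int));

    for (int w = 0; w < nworkers && (size_t) w * chunk < n; w++)
    {
        size_t lo = (size_t) w * chunk;
        size_t len = (n - lo < chunk) ? n - lo : chunk;
        pid_t pid = fork();

        if (pid < 0)
        {
            ok = 0;
            break;
        }
        if (pid == 0)
        {
            /* Worker: no lookups of its own, only inherited state. */
            qsort(shared + lo, len, sizeof(int), cmp);
            _exit(0);
        }
        pids[nstarted++] = pid;
    }

    for (int w = 0; w < nstarted; w++)
    {
        int status;

        if (waitpid(pids[w], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    /* Parent merges the sorted runs back into the caller's array. */
    for (size_t i = 0; ok && i < n; i++)
    {
        int best = -1;

        for (int r = 0; r < nstarted; r++)
        {
            size_t lo = (size_t) r * chunk;
            size_t len = (n - lo < chunk) ? n - lo : chunk;

            if (pos[r] < len &&
                (best < 0 ||
                 cmp(&shared[lo + pos[r]],
                     &shared[(size_t) best * chunk + pos[best]]) < 0))
                best = r;
        }
        data[i] = shared[(size_t) best * chunk + pos[best]];
        pos[best]++;
    }

    free(pids);
    free(pos);
    munmap(shared, n * sizeof(int));
    return ok ? 0 : -1;
}
```

The only per-worker startup work here is the fork() itself. Nothing is
serialized or copied explicitly, which is the property being argued
for. A real implementation would also have to handle error reporting
and interrupts, which this sketch ignores.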
regards, tom lane