Re: parallel mode and parallel contexts - Mailing list pgsql-hackers

From Andres Freund
Subject Re: parallel mode and parallel contexts
Date
Msg-id 20150208002027.GH9201@alap3.anarazel.de
Whole thread Raw
In response to Re: parallel mode and parallel contexts  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: parallel mode and parallel contexts  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,


On 2015-02-06 22:43:21 -0500, Robert Haas wrote:
> Here's v4, with that fixed and a few more tweaks.

If you attached files generated with 'git format-patch' I could directly
apply then with the commit message and such. All at once if it's
mutliple patches, as individual commits. On nontrivial patches it's nice
to see the commit message(s) along the diff(s).

Observations:
* Some tailing whitespace in the readme. Very nice otherwise.

* Don't like CreateParallelContextForExtension much as a function name - I don't think we normally equate the fact that
codeis located in a loadable library with the extension mechanism.
 

* StaticAssertStmt(BUFFERALIGN(PARALLEL_ERROR_QUEUE_SIZE) ...) what's that about?

* the plain shm calculations intentionally use mul_size/add_size to deal with overflows. On 32bit it doesn't seem
impossible,but unlikely to overflow size_t.
 

* I'd s/very // i "We might be running in a very short-lived memory context.". Or replace it with "too".

* In +LaunchParallelWorkers, does it make sense trying to start workers if one failed? ISTM that's likely to not be
helpful.I.e. it should just break; after the first failure.
 

* +WaitForParallelWorkersToFinish says that it waits for workers to exit cleanly. To me that's ambiguous. How about
"fully"?

* ParallelMain restores libraries before GUC state. Given that librararies can, and actually somewhat frequently do,
inspectGUCs on load, it seems better to do it the other way round? You argue "We want to do this before restoring GUCs,
becausethe libraries might define custom variables.", but I don't buy that. It's completely normal for namespaced GUCs
tobe present before a library is loaded? Especially as you then go and process_session_preload_libraries() after
settingthe GUCs.
 

* Should ParallelMain maybe enter the parallel context before loading user defined libraries? It's far from absurd to
executecode touching the database on library initialization...
 

* rename ParallelMain to ParallelWorkerMain?

* I think restoring snapshots needs to fudge the worker's PGXACT->xmin to be the minimum of the top transaction id and
thesnapshots. Otherwise if the initiating process dies badly (e.g. because postmaster died) the workers will continue
towork, while other processes may remove things. Also, you don't seem to prohibit popping the active snapshot (should
thatbe prohibitted maybe?) which might bump the initiator's xmin horizon.
 

* Comment about 'signal_handler' in +HandleParallelMessageInterrupt is outdated.

* Is it really a good idea to overlay Z/ReadyForQuery with 'this worker exited cleanly'? Shouldn't it at least be
T/Terminate?I'm generally not sure it's wise to use this faux FE/BE protocol here...
 

* HandlingParallelMessages looks dead.

* ParallelErrorContext has the wrong comment.

* ParallelErrorContext() provides the worker's pid in the context message. I guess that's so it's easier to interpret
whensent to the initiator? It'll look odd when logged by the failing process.
 

* We now have pretty much the same handle_sigterm in a bunch of places. Can't we get rid of those? You actually
probablycan just use die().
 

* The comments in xact.c above XactTopTransactionId pretty much assume that the reader knows that that is about
parallelstuff.
 

* I'm a bit confused by the fact that Start/CommitTransactionCommand() emit a generic elog when encountering a
PARALLEL_INPROGRESS,whereas ReleaseSavepoint()/RollbackTo has a special message. Shouldn't it be pretty much impossible
tohit a ReleaseSavepoint() with a parallel transaction in progress? We'd have had to end the previous transaction
commandwhile parallel stuff was in progress - right? (Internal/ReleaseCurrentSubTransaction is different, that's used
incode)
 

* Why are you deviating from the sorting order used in other places for ParallelCurrentXids? That seems really wierd,
especiallyas we use something else a couple lines down. Especially as you actually seem to send stuff in xidComparator
order?

* I don't think skipping AtEOXact_Namespace() entirely if parallel is a good idea. That function already does some
otherstuff than cleaning up temp tables. I think you should pass in parallel and do the skip in there.
 

* Start/DndParallelWorkerTransaction assert the current state, whereas the rest of the file FATALs in that case. I
thinkit'd actually be good to be conservative and do the same in this case.
 

* You're manually passing function names to PreventCommandIfParallelMode() in a fair number of cases. I'd either try
andkeep the names consistent with what the functions are actually called at the sql level (adding the types in the
parens)or just use PG_FUNCNAME_MACRO to keep them correct.
 

* Wait. You now copy all held relation held "as is" to the standby? I quite doubt that's a good idea, and it's not my
readingof the conclusions made in the group locking thread. At the very least this part needs to be extensively
documented.And while LockAcquireExtended() refers to src/backend/access/transam/README.parallel for details I don't see
anythingpertinent in there. And the function header sounds like the only difference is the HS logging - not mentioning
thatit essentially disables lock queuing entirely.
 
 This seems unsafe (e.g. consider if the initiating backend died and somebody else acquired the lock, possible e.g. if
postmasterdied) and not even remotely enough discussed. I think this should be removed from the patch for now.
 

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: 9.6 Feature help requested: Inclusion Constraints
Next
From: Andreas Karlsson
Date:
Subject: Re: PATCH: Reducing lock strength of trigger and foreign key DDL