I got around to looking at this today, and what I find is that the problem is that exec_stmt_return_query() uses a portal (i.e. a cursor) to read the results of the query. That seemed like a good idea, back in the late bronze age, because it allowed plpgsql to fetch the query results a few rows at a time and not risk blowing out memory with a huge SPI result. However, the parallel-query infrastructure refuses to parallelize when the query is being read via a cursor.
I think that the latter restriction is probably sane, because we don't want to suspend execution of a parallel query while we've got worker processes waiting. And there might be some implementation restrictions lurking under it too --- that's not a part of the code I know in any detail.
However, there's no fundamental reason why exec_stmt_return_query has to use a cursor. It's going to run the query to completion immediately anyway, and shove all the result rows into a tuplestore. What we lack is a way to get the SPI query to pass its results directly to a tuplestore, without the SPITupleTable intermediary. (Note that the tuplestore can spill a large result to disk, whereas SPITupleTable can't do that.)
So, attached is a draft patch to enable that. By getting rid of the intermediate SPITupleTable, this should improve the performance of RETURN QUERY somewhat even without considering the possibility of parallelizing the source query. I've not tried to measure that though. I've also not looked for other places that could use this new infrastructure, but there may well be some.
One thing I'm not totally pleased about with this is adding another SPI interface routine using the old parameter-values API (that is, null flags as char ' '/'n'). That was the path of least resistance given the other moving parts in pl_exec.c and spi.c, but maybe we should try to modernize that before we set it in stone.
Another thing standing between this patch and committability is suitable additions to the SPI documentation. But I saw no value in writing that before the previous point is settled.
I will go add this to the next commitfest (for v14), but I wonder if we should try to squeeze it into v13? This isn't the only complaint we've gotten about non-parallelizability of RETURN QUERY.