The sort node on q3 takes almost 12 seconds, making the query run on 68 if I had set enough work_mem to make it all in memory.
The third one thinks it is going find 3454539 output rows. If it run in parallel, it thinks it will be passing lots of rows up from the parallel workers, and charges a high price (parallel_tuple_cost = 0.1) for doing so. So you can try lowering parallel_tuple_cost, or figuring out why the estimate is so bad.