Now the question is, where should the code that does all of this live? postgres_fdw? Some new, sharding-specific FDW? In core? I don't know for sure, but what I do know is that we could make a lot of progress over where we are today by just improving postgres_fdw, and I don't think those improvements are even all that difficult. If we decide we need to implement something new, it's going to be a huge project that will take years to complete, with uncertain results. I'd rather have a postgres_fdw-based implementation that is imperfect and can't handle some kinds of queries in 9.6 than a promise that by 9.9 we'll have something really great that handles MPP perfectly.
Distributed shuffles (Map/Reduce) are hard. When we looked at using FDWs for pg_shard, we thought that Map/Reduce would require a comprehensive revamp of the FDW APIs.
For Citus, a second part of the question concerns our role as FDW writers. We implemented cstore_fdw, json_fdw, and mongo_fdw, and these wrappers don't benefit from even the simple join pushdown that doesn't require Map/Reduce.