I have been trying for months to solve a performance issue with PostgreSQL. I can provide all the technical details needed.
SYSTEM CONFIGURATION
Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID controller configured in this way:
two Intel Xeon E5-2640 v2 @ 2 GHz
PostgreSQL 9.4 (updated to the latest available version)
My personal low-cost, low-profile development machine is a Mac mini configured in this way:
one Intel i7 @ 2.2 GHz
PostgreSQL 9.0.13 (the original built-in shipped with OS X Server)
Using such different versions of PostgreSQL seems like a recipe for frustration.
Here are two benchmarks generated using pg_test_fsync:
This is unlikely to be important for the type of workload you describe. Fsyncs are the bottleneck for many short transactions, but not often the bottleneck for very large transactions.
What collation is used for both databases? Perhaps the T420 is using a much slower collation.
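Something like the following, run on both machines, should show which collation each database was created with. This is only a sketch: it assumes you are connected to the database in question, and it relies on the `datcollate` and `datctype` columns of `pg_database`, which exist in both 9.0 and 9.4.

```sql
-- Collation and character classification of the current database.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- The same setting as seen by the current session.
SHOW lc_collate;
```

If one machine reports "C" (or "POSIX") and the other a locale such as en_US.UTF-8, sorts and comparisons on text columns are not doing the same amount of work on the two systems.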
How can you sort 2,951,191 rows but then materialize 4,458,971 rows out of that? I've never seen that before. (Or, in the other plan, put 2,951,191 rows into the sort from the CTE but get 4,458,971 out of the sort?)