Compared to the numeric type, decimal64 arithmetic is about 2x faster and decimal128 is about 1.5x faster. However, the cast between decimal and float4/8 is implemented rather naively and is slow. As always, it depends on the workload: decimal may take more or less space, and may be slower if casts are performed frequently.
Are you able to share the processor vendor, and perhaps some other specs, of the machine you obtained these results on?