3) stadistinct - This is quite problematic. We only have the per-child estimates, and it's not clear if there's any overlap. For now I've just summed them up, because that's safer / similar to what we do for gather merge paths etc. Maybe we could improve this by estimating the overlap somehow (e.g. from MCV lists / histograms). But honestly, I doubt estimates based on a tiny sample of each child are any better. I suppose we could introduce a column option determining how to combine ndistinct (similar to how we can override n_distinct itself).
4) MCV - It's trivial to build a new "parent" MCV list, although it may be too large (in which case we cut it at the statistics target, and copy the remaining bits to the histogram).
I think there is one way to address the problem of computing the MCV and ndistinct statistics. For each partition, we compute the density of the sample distribution and store it, for example, in a separate statistics slot. Then, when merging statistics, we sum the densities of all partitions as functions and get a new density for the parent. From that merged density we can determine which values are most common and which values are distinct.
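
To make the idea a bit more concrete, here is a rough Python sketch, not a proposal for the actual implementation. It assumes each partition's density is available as a simple value -> frequency map together with the partition's row count, and that the densities are weighted by row count before summing (my assumption; a plain sum would not add up to 1). The names merge_densities and parent_mcv are made up for illustration and do not correspond to anything in the PostgreSQL source.

```python
from collections import defaultdict


def merge_densities(partitions):
    """Merge per-partition densities into one parent density.

    partitions: list of (row_count, {value: frequency}) pairs, where the
    frequencies are fractions of that partition's rows (hypothetical format).
    """
    total_rows = sum(rows for rows, _ in partitions)
    if total_rows == 0:
        return {}
    merged = defaultdict(float)
    for rows, density in partitions:
        weight = rows / total_rows  # assumption: weight by partition size
        for value, freq in density.items():
            merged[value] += weight * freq
    return dict(merged)


def parent_mcv(merged, stats_target):
    """Split the merged density into an MCV list and the remainder.

    The remainder stands for the values that would go to the histogram
    once the list is cut at the statistics target.
    """
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return ranked[:stats_target], ranked[stats_target:]


# Two partitions with overlapping values (made-up numbers).
partitions = [
    (1000, {"a": 0.5, "b": 0.25}),
    (3000, {"b": 0.5, "c": 0.25}),
]
merged = merge_densities(partitions)
mcv, rest = parent_mcv(merged, stats_target=2)
tracked_distinct = len(merged)  # distinct values seen in any density
```

With these inputs the value "b" appears in both partitions, so its merged frequency is the weighted sum of both contributions and it is counted once, not twice, among the tracked distinct values. This only covers values that are present in the stored densities; values outside them would still need a separate estimate.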