> Where analyze does systematically fall down is with databases over 500GB
> in size, but that's not a function of d_s_t but rather of our tiny
> sample size.
The problem is n_distinct. For that, Josh is right: we *would* need a sample size proportional to the whole data set, which in practice would require scanning the whole table (and having a technique for summarizing the results in a nearly constant-sized data structure).
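For what it's worth, "nearly constant sized" summaries of distinct counts do exist outside PostgreSQL: cardinality sketches such as HyperLogLog keep a fixed array of small registers regardless of how many rows are scanned. Purely as an illustrative sketch (not anything ANALYZE actually does), a minimal HyperLogLog might look like:

```python
import hashlib
import math

class HyperLogLog:
    """Constant-memory distinct-count sketch: 2**p small registers,
    independent of how many values are added."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p            # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Hash to 64 bits; first p bits choose a register, the rest
        # contribute the "rank" (position of the leftmost 1-bit).
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        # Standard HLL estimator with linear-counting correction
        # for small cardinalities.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw
```

With p=12 that is 4096 registers (a few KB) and a typical relative error around 2%, no matter how large the table is; the catch, of course, is that you still have to feed every row through `add()`, i.e. scan the whole table.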