> Would it be possible to look at a much larger number of samples during
> analyze, then look at the variation in those to generate a reasonable
> number of pg_statistic "samples" to represent our estimate of the actual
> distribution? More datapoints for tables where the planner might benefit
> from it, fewer where it wouldn't.
Maybe it would be possible to record somewhere the percentage of
occurrence of the most common value (in the OP's case, about 3%). If we
know that even the most common value is below the index use threshold,
then every other value must be below it too, so a quick decision can be
taken to use the index without even looking at the value...
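
To make the idea concrete, here is a rough sketch in Python. It is not
how the planner works today, only an illustration of the shortcut. The
names and the 5% cutoff are made up for the example; the real planner
compares cost estimates rather than using one fixed threshold.

```python
# Hypothetical cutoff: below this fraction of the table, assume an index
# scan wins. The 5% figure is an assumption for illustration only.
INDEX_USE_THRESHOLD = 0.05


def can_skip_value_lookup(most_common_fraction: float,
                          threshold: float = INDEX_USE_THRESHOLD) -> bool:
    """Return True if the index can be chosen without inspecting the value.

    most_common_fraction is the share of rows held by the single most
    common value. No other value can match more rows than that, so if it
    is under the threshold, every value is under the threshold.
    """
    if not 0.0 <= most_common_fraction <= 1.0:
        raise ValueError("most_common_fraction must be between 0 and 1")
    return most_common_fraction < threshold


def choose_plan(most_common_fraction: float,
                threshold: float = INDEX_USE_THRESHOLD) -> str:
    """Pick the shortcut or fall back to a per-value estimate."""
    if can_skip_value_lookup(most_common_fraction, threshold):
        return "index scan"
    return "check per-value statistics"
```

With the OP's figure, choose_plan(0.03) takes the shortcut, while a
column whose most common value covers 40% of the rows would still need
the per-value check.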