On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Taking a closer look at acquire_sample_rows(), I think it would be
> good if table AM implementation would care about block-level (or
> whatever-level) sampling. So that acquire_sample_rows() just fetches
> tuples one-by-one from table AM implementation without any care about
> blocks. Possible table_beginscan_analyze() could take an argument of
> target number of tuples, then those tuples are just fetches with
> table_scan_analyze_next_tuple(). What do you think?
Andres is the expert here, but FWIW, that plan seems reasonable to me.
One downside is that every block-based tableam is going to end up with
a very similar implementation, which is kind of something I don't like
about the tableam API in general: if you want to make something that
is basically heap plus a little bit of special sauce, you have to copy
a mountain of code. Right now we don't really care about that problem,
because we don't have any other tableams in core, but if we ever do, I
think we're going to find ourselves very unhappy with that aspect of
things. But maybe now is not the time to start worrying. That problem
isn't unique to analyze, and giving out-of-core tableams the
flexibility to do what they want is better than not.
--
Robert Haas
EDB: http://www.enterprisedb.com