Hello,
On Thu, Dec 5, 2019 at 11:14 AM Pengzhou Tang <ptang@pivotal.io> wrote:
>
> While hacking on Zedstore, we needed more accurate statistics for zedstore tables, and we
> ran into some restrictions:
> 1) acquire_sample_rows() always uses RelationGetNumberOfBlocks() to generate the sampling block
> numbers. This is unfriendly to zedstore, which wants to use logical block numbers, and probably
> also to non-block-oriented table AMs.
> 2) The columns of a zedstore table are stored separately, so the columns of a row have different physical positions.
> The tid of a tuple does not reflect a physical position in zedstore, so the correlation statistic is incorrect.
> 3) RelOptInfo->pages is not correct for Zedstore if we access only a subset of the columns, which makes
> the estimated IO cost much higher than the actual cost.
>
> For 1) and 2), we propose to extend the existing ANALYZE-scan table AM routines in patch
> "0001-ANALYZE-tableam-API-change.patch", which adds three more APIs:
> scan_analyze_beginscan(), scan_analyze_sample_tuple(), scan_analyze_endscan(). This is more
> convenient and lets table AMs take control of every step of row sampling. Meanwhile,
> with the new structure named "AcquireSampleContext", we can acquire extra info (e.g. physical position,
> physical size) besides the actual column values.
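
To check my understanding of the three-callback flow, here is a toy model in plain C.
It is only a sketch and does not use the patch's real signatures: SampleContext,
AnalyzeRoutine, toy_am and acquire_sample() are names I made up for illustration, and
the "AM" just strides over logical row ids while accumulating a pretend disk size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AcquireSampleContext; fields are assumptions. */
typedef struct SampleContext
{
    long total_rows;   /* logical rows known to the AM */
    int  target;       /* number of sample rows requested */
    int  nsampled;     /* rows handed out so far */
    long next_pos;     /* AM-private cursor (a logical position) */
    long disksize;     /* extra info collected alongside the rows */
} SampleContext;

/* Hypothetical callback table mirroring begin / sample_tuple / end. */
typedef struct AnalyzeRoutine
{
    void (*beginscan)(SampleContext *ctx, long total_rows, int target);
    int  (*sample_tuple)(SampleContext *ctx, long *row_out); /* 0 = done */
    void (*endscan)(SampleContext *ctx);
} AnalyzeRoutine;

static void toy_begin(SampleContext *ctx, long total_rows, int target)
{
    ctx->total_rows = total_rows;
    ctx->target = target;
    ctx->nsampled = 0;
    ctx->next_pos = 0;
    ctx->disksize = 0;
}

static int toy_sample(SampleContext *ctx, long *row_out)
{
    long stride;

    if (ctx->nsampled >= ctx->target || ctx->next_pos >= ctx->total_rows)
        return 0;
    stride = ctx->total_rows / ctx->target;
    if (stride < 1)
        stride = 1;
    *row_out = ctx->next_pos;   /* the AM decides which row comes next */
    ctx->next_pos += stride;
    ctx->nsampled++;
    ctx->disksize += 8;         /* pretend each sampled value takes 8 bytes */
    return 1;
}

static void toy_end(SampleContext *ctx)
{
    (void) ctx;                 /* nothing to release in the toy AM */
}

static const AnalyzeRoutine toy_am = { toy_begin, toy_sample, toy_end };

/* Driver: knows nothing about blocks, only calls the three callbacks. */
static int acquire_sample(const AnalyzeRoutine *am, long total_rows,
                          int target, long *rows, long *disksize_out)
{
    SampleContext ctx;
    int n = 0;

    assert(am != NULL && rows != NULL && disksize_out != NULL);
    am->beginscan(&ctx, total_rows, target);
    while (n < target && am->sample_tuple(&ctx, &rows[n]))
        n++;
    am->endscan(&ctx);
    *disksize_out = ctx.disksize;
    return n;
}
```

The point of the sketch is that the driver never computes a block number itself, so an
AM with logical positions can plug in without pretending to be block-oriented.
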
>
> For 3), we hope to have a mechanism similar to RelOptInfo->rows, which is calculated as
> (RelOptInfo->tuples * Selectivity): we can calculate RelOptInfo->pages with a page selectivity that
> is based on the selected zedstore columns. 0002-Planner-can-estimate-the-pages-based-on-the-columns-.patch
> shows one idea: add `stadiskfrac` to pg_statistic and have the planner use it to estimate
> RelOptInfo->pages.
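
If I read this correctly, the arithmetic would be roughly as below. This is a standalone
sketch, not code from the 0002 patch: estimate_pages() and its parameters are hypothetical,
and I am assuming that each column's stadiskfrac is its share of the table's on-disk size,
so that the shares of the selected columns can simply be summed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate the pages touched when only some columns are read.
 * diskfrac[i] is the assumed per-column disk fraction (stadiskfrac),
 * cols[] holds the indexes of the selected columns.
 */
static double estimate_pages(double total_pages, const double *diskfrac,
                             const int *cols, int ncols)
{
    double frac = 0.0;
    int i;

    assert(total_pages >= 0.0 && diskfrac != NULL);
    for (i = 0; i < ncols; i++)
        frac += diskfrac[cols[i]];

    /* Guard against fractions that do not sum cleanly to at most 1. */
    if (frac > 1.0)
        frac = 1.0;
    if (frac < 0.0)
        frac = 0.0;

    /* Same shape as rows = tuples * selectivity. */
    return total_pages * frac;
}
```

For example, with 1000 pages and fractions {0.5, 0.25, 0.25}, selecting only the second
column gives 250 pages, and selecting the first and third gives 750.
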
>
> 0003-ZedStore-use-extended-ANAlYZE-API.patch is attached only to show how Zedstore uses the
> previous patches to:
> 1. use logical block ids to acquire the sample rows.
> 2. acquire sample rows from only the specified column c1; this is used when the user analyzes the table
> on specific columns only, e.g. "analyze zs (c1)".
> 3. provide extra disksize info during ANALYZE; ANALYZE then computes the
> physical fraction statistic of each column, and the planner uses it to estimate the IO cost based on
> the selected columns.

I couldn't find an entry for this patchset in the next commitfest.
Could you register it so that it won't be forgotten?