Re: [Proposal] Extend TableAM routines for ANALYZE scan - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: [Proposal] Extend TableAM routines for ANALYZE scan
Date
Msg-id CAOBaU_Z4fRRvzwMMprAe8fHSXu2LMwMET2O4WNrwUZRKozf02A@mail.gmail.com
In response to [Proposal] Extend TableAM routines for ANALYZE scan  (Pengzhou Tang <ptang@pivotal.io>)
List pgsql-hackers
Hello,

On Thu, Dec 5, 2019 at 11:14 AM Pengzhou Tang <ptang@pivotal.io> wrote:
>
> While hacking on Zedstore, we needed more accurate statistics for zedstore tables and
> ran into some restrictions:
> 1) acquire_sample_rows() always uses RelationGetNumberOfBlocks to generate sampling block
>     numbers. This is unfriendly to zedstore, which wants to use logical block numbers, and might
>     also be unfriendly to non-block-oriented table AMs.
> 2) The columns of a zedstore table are stored separately, so the columns of a row have different
>     physical positions. The tid in a tuple is invalid for zedstore, which means the correlation
>     statistic is incorrect for zedstore.
> 3) RelOptInfo->pages is not correct for Zedstore if we only access part of the columns, which makes
>    the estimated IO cost much higher than the actual cost.
>
> For 1) and 2), we propose to extend the existing ANALYZE-scan table AM routines in patch
> "0001-ANALYZE-tableam-API-change.patch", which adds three more APIs:
> scan_analyze_beginscan(), scan_analyze_sample_tuple(), scan_analyze_endscan(). This is
> more convenient, and table AMs can take more control of every step of sampling rows. Meanwhile,
> with the new structure named "AcquireSampleContext", we can acquire extra info (e.g. physical position,
> physical size) in addition to the actual column values.
>
> For 3), we hope to have a mechanism similar to RelOptInfo->rows, which is calculated from
>  (RelOptInfo->tuples * Selectivity): we can calculate RelOptInfo->pages with a page selectivity that
> is based on the selected zedstore columns.  0002-Planner-can-estimate-the-pages-based-on-the-columns-.patch
> shows one idea: adding `stadiskfrac` to pg_statistic and having the planner use it to estimate
> RelOptInfo->pages.
>
> 0003-ZedStore-use-extended-ANAlYZE-API.patch is attached only to show how Zedstore uses the
> previous patches to:
> 1. use logical block ids to acquire the sample rows.
> 2. acquire sample rows from only the specified column c1; this is used when the user analyzes a table
>     on specified columns only, e.g. "analyze zs (c1)".
> 3. during ANALYZE, have the zedstore table AM provide extra disksize info, so that ANALYZE computes the
>     physical fraction statistic of each column and the planner uses it to estimate the IO cost based on
>     the selected columns.

I couldn't find an entry for that patchset in the next commitfest.
Could you register it so that it won't be forgotten?
