Thread: PoC Refactor AM analyse API
Hello all!
I suggest refactoring the analyze AM API, as it is too heap-specific at the moment. The problem was inspired by Greenplum’s analyze improvement for its append-optimized row and column AMs with variable-size compressed blocks.
Currently we do analyze in two steps.
1. Sample fixed-size blocks with algorithm S from Knuth (BlockSampler).
2. Collect tuples into a reservoir with algorithm Z from Vitter.
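For reference, the reservoir idea in step 2 can be sketched as below. This is the simpler Algorithm R rather than Vitter's Algorithm Z (which produces the same uniform sample but computes skip lengths so it draws far fewer random numbers); the function name is mine, not PostgreSQL's.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Plain reservoir sampling (Algorithm R). Vitter's Algorithm Z, which
 * analyze actually uses, yields the same distribution but skips ahead
 * instead of drawing a random number per row. Illustrative only.
 */
static void
reservoir_sample(const int *rows, int nrows, int *sample, int targrows)
{
    for (int i = 0; i < nrows; i++)
    {
        if (i < targrows)
            sample[i] = rows[i];        /* fill the reservoir first */
        else
        {
            int j = rand() % (i + 1);   /* uniform index in [0, i] */

            if (j < targrows)
                sample[j] = rows[i];    /* keep row i with prob targrows/(i+1) */
        }
    }
}
```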
So this doesn’t work, for example, for AMs using variable-size physical blocks. They need weighted random sampling (WRS) algorithms like A-Chao, or logical blocks to which algorithm S can be applied (and they have a problem with RelationGetNumberOfBlocks(), which estimates the physical number of blocks). Another problem is with columns: they are not passed to the analyze begin-scan, so column stores can’t benefit from their storage layout on ANALYZE t (col).
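A minimal sketch of the A-Chao weighted reservoir scheme mentioned above, where a block's inclusion probability is proportional to its weight (say, its tuple count or uncompressed size). All names here are hypothetical, and the over-weight corner case (k * w exceeding the running weight sum) is ignored for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Weighted reservoir sampling after Chao: each incoming item replaces a
 * uniformly chosen reservoir slot with probability k * w_i / W, where W
 * is the running sum of weights. Assumes no single weight exceeds W / k.
 */
static void
chao_sample(const int *items, const double *weights, int nitems,
            int *sample, int k)
{
    double      W = 0.0;

    for (int i = 0; i < nitems; i++)
    {
        W += weights[i];
        if (i < k)
            sample[i] = items[i];       /* fill the reservoir first */
        else
        {
            double      p = k * weights[i] / W;

            if ((double) rand() / RAND_MAX < p)
                sample[rand() % k] = items[i];  /* evict a uniform slot */
        }
    }
}
```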
The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function: table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it could do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs, and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, as not all AMs are block-oriented.
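Based on the description above, the merged callback might look roughly like this; the struct and argument names are a guess reconstructed from this thread, not the patch itself.

```c
#include <assert.h>

/* Stand-ins for the real PostgreSQL types, declared opaquely. */
struct Relation;
struct VacAttrStats;
typedef unsigned long HeapTupleSketch;

/*
 * One callback replaces the next_block/next_tuple pair: the AM fills
 * rows[] with up to targrows sampled tuples, reports live/dead row
 * estimates for the relation, and returns the number of rows collected.
 * A block-oriented AM can still use BlockSampler internally.
 */
typedef struct TableAmAnalyzeSketch
{
    int         (*acquire_sample_rows) (struct Relation *rel,
                                        int elevel,
                                        int natts,
                                        struct VacAttrStats **stats,
                                        HeapTupleSketch *rows,
                                        int targrows,
                                        double *totalrows,
                                        double *totaldeadrows);
} TableAmAnalyzeSketch;
```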
Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
Attachment
It seems that my mailing client set the wrong MIME type for the attached patch, and it was filtered by the web archive. So I attach the patch again for the web archive.

> On Dec 7, 2020, at 23:23, Denis Smirnov <sd@arenadata.io> wrote:
>
> [...]
>
> <am-analyze-1.patch>

Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
Attachment
Hi Denis!

> On Dec 7, 2020, at 18:23, Denis Smirnov <sd@arenadata.io> wrote:
>
> [...]
>
> The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function: table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it could do something else.

Just a few random notes about the idea.

Heap pages are not of a fixed size, when measured in tuple count. A comment in the code describes it:

 * Although every row has an equal chance of ending up in the final
 * sample, this sampling method is not perfect: not every possible
 * sample has an equal chance of being selected. For large relations
 * the number of different blocks represented by the sample tends to be
 * too small. We can live with that for now. Improvements are welcome.

The current implementation provides a framework with shared code, though this framework is only suitable for block-of-tuples AMs, and it has statistical downsides when the tuple count per block varies too much. Maybe we can just provide a richer API? To have both: avoid copying code and provide flexibility.

Best regards, Andrey Borodin.
Andrey, thanks for your feedback!

I agree that AMs with fixed-size blocks can have much alike code in acquire_sample_rows() (though it is not a rule). But there are several points about the current master sampling.

* It is not perfect - AM developers may want to improve it with other sampling algorithms.
* It is designed with a big influence of the heap AM - for example, RelationGetNumberOfBlocks() returns uint32 while other AMs can have a bigger number of blocks.
* heapam_acquire_sample_rows() is a small function - I don't think it is a big trouble for any AM developer to write something alike.
* Some AMs may have single-level sampling (only algorithm Z from Vitter, for example) - why not?

As a result we get a single and clear method to acquire rows for statistics. If we don’t modify but rather extend the current API (for example, in the manner it is done for FDW), the code becomes more complicated and difficult to understand.

> On Dec 8, 2020, at 18:42, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> [...]
>
> Maybe we can just provide a richer API? To have both: avoid copying code and provide flexibility.
> On Dec 8, 2020, at 16:44, Denis Smirnov <sd@arenadata.io> wrote:
>
> [...]
>
> As a result we get a single and clear method to acquire rows for statistics. If we don’t modify but rather extend the current API (for example, in the manner it is done for FDW), the code becomes more complicated and difficult to understand.

This makes sense. The purpose of the API is to provide a flexible abstraction, and the current table_scan_analyze_next_block()/table_scan_analyze_next_tuple() API assumes too much about the AM implementation.

But why do you pass int natts and VacAttrStats **stats to acquire_sample_rows()? Is it of any use? It seems to break the abstraction too.

Best regards, Andrey Borodin.
> But why do you pass int natts and VacAttrStats **stats to acquire_sample_rows()? Is it of any use? It seems to break the abstraction too.

Yes, it is really a kludge that should be discussed. The main problem is that we don’t pass projection information to the analyze scan (analyze begin scan relies only on relation information during initialization). As a result we can’t benefit from column AMs during «analyze t(col)» and consume data only from the target columns. These parameters were added to solve this problem.

Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
On 30/12/2020 11:12, Denis Smirnov wrote:
>> But why do you pass int natts and VacAttrStats **stats to
>> acquire_sample_rows()? Is it of any use? It seems to break
>> abstraction too.
>
> Yes, it is really a kludge that should be discussed. The main problem
> is that we don’t pass projection information to the analyze scan
> (analyze begin scan relies only on relation information during
> initialization). As a result we can’t benefit from column AMs during
> «analyze t(col)» and consume data only from the target columns.
> These parameters were added to solve this problem.

The documentation needs to be updated accordingly, see AcquireSampleRowsFunc in fdwhandler.sgml.

This part of the patch, adding the list of columns being analyzed, seems a bit unfinished. I'd suggest leaving that out for now, and adding it as part of the "Table AM modifications to accept column projection lists" patch that's also in this commitfest [1].

> This suggestion also ... removes PROGRESS_ANALYZE_BLOCKS_TOTAL and
> PROGRESS_ANALYZE_BLOCKS_DONE definitions as not all AMs can be
> block-oriented.

We shouldn't just remove it, a progress indicator is nice. It's true that not all AMs are block-oriented, but those that are can still use these. Perhaps we can add other PROGRESS_ANALYZE_* states for non-block-oriented AMs, but that can wait until there is a concrete use for it.

> static int
> acquire_sample_rows(Relation onerel, int elevel,
>                     HeapTuple *rows, int targrows,
>                     double *totalrows, double *totaldeadrows)
> {
>     int             numrows = 0;    /* # rows now in reservoir */
>     TableScanDesc   scan;
>
>     Assert(targrows > 0);
>
>     scan = table_beginscan_analyze(onerel);
>
>     numrows = table_acquire_sample_rows(scan, elevel,
>                                         natts, stats,
>                                         vac_strategy, rows,
>                                         targrows, totalrows,
>                                         totaldeadrows);
>
>     table_endscan(scan);
>
>     /*
>      * If we didn't find as many tuples as we wanted then we're done. No sort
>      * is needed, since they're already in order.
>      *
>      * Otherwise we need to sort the collected tuples by position
>      * (itempointer). It's not worth worrying about corner cases where the
>      * tuples are already sorted.
>      */
>     if (numrows == targrows)
>         qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
>
>     return numrows;
> }

Perhaps it would be better to move the qsort() into heapam_acquire_sample_rows(), and document that the acquire_sample_rows() AM function must return the tuples in 'ctid' order. In a generic API, it seems like a shady assumption that they must be in order if we didn't find as many rows as we wanted. Or always call qsort(); if the tuples are already in order, that should be pretty quick.

The table_beginscan_analyze() call seems a bit pointless now. Let's remove it, and pass the Relation to table_acquire_sample_rows() directly.

> /*
>  * This callback needs to fill reservour with sample rows during analyze
>  * scan.
>  */
> int         (*acquire_sample_rows) (TableScanDesc scan,

The "reservoir" is related to the block sampler, but that's just an implementation detail of the function. Perhaps something like "Acquire a sample of rows from the table, for ANALYZE". And explain the arguments here, or in table_acquire_sample_rows().

Overall, I like where this patch is going.

[1] https://commitfest.postgresql.org/31/2922/

- Heikki
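The 'ctid' ordering contract discussed here boils down to comparing (block, offset) pairs. A stand-in sketch (SampleRow is a fake struct; the real code compares the ItemPointers of HeapTuples):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SampleRow
{
    unsigned int block;         /* BlockNumber in the real code */
    unsigned int offset;        /* OffsetNumber in the real code */
} SampleRow;

/* qsort comparator producing physical (ctid) order. */
static int
compare_sample_rows(const void *a, const void *b)
{
    const SampleRow *ra = (const SampleRow *) a;
    const SampleRow *rb = (const SampleRow *) b;

    if (ra->block != rb->block)
        return (ra->block < rb->block) ? -1 : 1;
    if (ra->offset != rb->offset)
        return (ra->offset < rb->offset) ? -1 : 1;
    return 0;
}
```

Under the "always call qsort()" alternative, an AM would unconditionally sort its sample with such a comparator before returning; on already-ordered input that pass is cheap.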
Thanks for your review, Heikki.

I have made the changes you have requested.

1. All modifications interconnected with column projection were reverted (they should be added in https://commitfest.postgresql.org/31/2922 if the current patch would be merged one day).
2. I have returned the PROGRESS_ANALYZE_* states.
3. qsort() was moved into heapam_acquire_sample_rows(). Also a comment was added that the acquire_sample_rows() AM function must return the tuples in physical table order.
4. table_beginscan_analyze() was removed as a redundant function.
5. The acquire_sample_rows() comment about the reservoir was changed.

Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
Attachment
Hi,
+ *totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
Is the above equivalent to:
+ *totalrows = ceil((liverows / bs.m) * totalblocks);
For compare_rows(), it seems the computation of oa and ob can be delayed until ba == bb (i.e., after the first two if statements).
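The suggested reordering, sketched with a fake tuple-id type standing in for ItemPointer (in the real comparator the offsets come from ItemPointerGetOffsetNumber(), so deferring them skips that extraction on the common unequal-block path):

```c
#include <assert.h>

typedef struct FakeTid
{
    unsigned int block;
    unsigned int offset;
} FakeTid;

static int
compare_rows_delayed(const FakeTid *a, const FakeTid *b)
{
    unsigned int ba = a->block;
    unsigned int bb = b->block;

    if (ba < bb)
        return -1;
    if (ba > bb)
        return 1;

    /* Blocks are equal: only now extract and compare the offsets. */
    {
        unsigned int oa = a->offset;
        unsigned int ob = b->offset;

        if (oa < ob)
            return -1;
        if (oa > ob)
            return 1;
        return 0;
    }
}
```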
Cheers
On Thu, Feb 18, 2021 at 6:06 PM Denis Smirnov <sd@arenadata.io> wrote:
> [...]
Hello, Zhihong.

Thanks for your comments.

1. I am afraid that the equivalence of "floor(val + 0.5)" to "ceil(val)" is incorrect: "floor(2.1 + 0.5)" returns 2 while "ceil(2.1)" returns 3. We can’t use the replacement you have suggested.

2. >> For compare_rows(), it seems the computation of oa and ob can be delayed until ba == bb (after the first two if statements).

I have checked some examples of ASM code generated by different compilers with -O2/-O3 flags on https://gcc.godbolt.org and didn’t see any big benefit in the resulting CPU instructions. You can check the attached example below yourself.

Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
Attachment
Denis:
Thanks for considering my suggestion.
For #1, I didn't take your example into account. Thanks for pointing that out.
Cheers
On Thu, Feb 18, 2021 at 11:59 PM Denis Smirnov <sd@arenadata.io> wrote:
> [...]
On Fri, Feb 19, 2021 at 12:06:12PM +1000, Denis Smirnov wrote:
> Thanks for your review, Heikki.
>
> [...]

Hi Denis,

This doesn't apply anymore because of c6fc50c, can you resubmit a new patch?

Please note that the patch must be submitted with a .patch extension instead of .txt; that way the CI at http://commitfest.cputube.org/ will be able to execute automatic tests on it.

Regards,

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
On Wed, Sep 08, 2021 at 10:06:25AM -0500, Jaime Casanova wrote:
> This doesn't apply anymore because of c6fc50c, can you resubmit a new
> patch?

Activity has stalled here, so I have marked the entry as RwF in the CF app.
--
Michael
Attachment
I think this patch should be totally redesigned and removed from the upcoming CF. The problem is that vanilla PostgreSQL has a single storage manager implementation, which works with fixed-size blocks, and the current patch didn’t take this aspect into account. We should first decide whether PostgreSQL needs the ability to implement custom storage managers with variable-size blocks and custom block buffers (or without any, for OLAP). Only after that should we move on to variable-size-block analyze.
Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Bldg. 3, Block 1 Skladochnaya St. Moscow, 127018