Re: PoC Refactor AM analyse API - Mailing list pgsql-hackers

From Denis Smirnov
Subject Re: PoC Refactor AM analyse API
Date
Msg-id 94A92C72-E6CD-410F-8557-6CDEE04C28B8@arenadata.io
In response to PoC Refactor AM analyse API  (Смирнов Денис <sd@arenadata.io>)
List pgsql-hackers
It seems that my mail client set the wrong MIME type for the attached patch, so it was filtered out by the web archive.
I am attaching the patch again for the web archive.



> On 7 Dec 2020, at 23:23, Смирнов Денис <sd@arenadata.io> wrote:
>
> Hello all!
>
> I suggest refactoring the analyze AM API, as it is too heap-specific at the moment. The problem was inspired by Greenplum’s analyze improvement for append-optimized row and column AMs with variable-size compressed blocks.
> Currently we do analyze in two steps:
>
> 1. Sample fixed-size blocks with Algorithm S from Knuth (the BlockSampler functions).
> 2. Collect tuples into a reservoir with Algorithm Z from Vitter.
>
> This doesn’t work for AMs that use variable-size physical blocks, for example. They need weighted random sampling (WRS) algorithms like A-Chao, or logical blocks in order to follow S-Knuth (and then have a problem with RelationGetNumberOfBlocks() estimating a physical number of blocks). Another problem is with columns: they are not passed to the analyze begin scan, so the AM can’t benefit from column storage during ANALYZE TABLE (COL).
>
> The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function: table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it can do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, as not all AMs are block-oriented.
>
> <am-analyze-1.patch>
>
>
>
> Best regards,
> Denis Smirnov | Developer
> sd@arenadata.io
> Arenadata | Godovikova 9-17, Moscow 129085 Russia
>
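
For readers of the archive, the two-step scheme described in the quoted message can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the names select_blocks, reservoir_sample and acquire_sample_rows are hypothetical, blocks are modelled as plain lists of rows, and the reservoir step uses the simple Algorithm R rather than Vitter's Algorithm Z (which yields the same kind of uniform sample but skips ahead to save random draws).

```python
import random


def select_blocks(total_blocks, sample_size, rng):
    """Knuth's Algorithm S (selection sampling): pick sample_size block
    numbers out of total_blocks, in ascending order."""
    chosen = []
    for block_no in range(total_blocks):
        needed = sample_size - len(chosen)
        if needed <= 0:
            break
        remaining = total_blocks - block_no
        # Take this block with probability needed / remaining.
        if rng.random() * remaining < needed:
            chosen.append(block_no)
    return chosen


def reservoir_sample(rows, capacity, rng):
    """Algorithm R: keep a uniform sample of at most capacity rows
    from a stream whose length is not known in advance."""
    reservoir = []
    for seen, row in enumerate(rows, start=1):
        if len(reservoir) < capacity:
            reservoir.append(row)
        else:
            # Replace a random slot with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                reservoir[slot] = row
    return reservoir


def acquire_sample_rows(blocks, target_rows, rng):
    """Step 1: sample block numbers. Step 2: feed the rows of the
    chosen blocks through the reservoir."""
    block_nos = select_blocks(len(blocks), target_rows, rng)
    rows = (row for b in block_nos for row in blocks[b])
    return reservoir_sample(rows, target_rows, rng)


if __name__ == "__main__":
    # 100 hypothetical fixed-size blocks of 10 rows each.
    blocks = [[(b, i) for i in range(10)] for b in range(100)]
    print(acquire_sample_rows(blocks, 30, random.Random(0))[:5])
```

Step 1 only needs the total number of blocks up front, which is why this scheme assumes fixed-size blocks that can be addressed by number.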

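The A-Chao algorithm mentioned in the quoted message, for AMs with variable-size blocks, could look roughly like the sketch below. Again this is an illustration in Python and not part of the patch: chao_weighted_sample is a hypothetical name, the weight of a block is assumed to be something like its size or tuple count, and the sketch assumes no single weight is so dominant that the acceptance probability would exceed 1 (the full algorithm handles that case separately).

```python
import random


def chao_weighted_sample(items, k, rng):
    """Weighted reservoir sampling in the style of A-Chao.

    items is an iterable of (value, weight) pairs with weight > 0.
    Returns at most k values; heavier items are more likely to be kept.
    """
    reservoir = []
    weight_sum = 0.0
    for value, weight in items:
        weight_sum += weight
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(value)
            continue
        # Accept the new item with probability k * weight / weight_sum,
        # then overwrite a uniformly chosen slot.
        p = k * weight / weight_sum
        if rng.random() < p:
            reservoir[rng.randrange(k)] = value
    return reservoir


if __name__ == "__main__":
    # Hypothetical variable-size blocks: (block id, size in bytes).
    blocks = [(block_id, 1000 + 500 * (block_id % 7)) for block_id in range(50)]
    print(chao_weighted_sample(blocks, 5, random.Random(1)))
```

Unlike Algorithm S, this does not need the number of blocks in advance, only a running sum of weights, so it avoids relying on a physical block count.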

Attachment
