Home > mailing lists

Re: Merging statistics from children instead of re-sampling everything - Mailing list pgsql-hackers

From	Andrey V. Lepikhov
Subject	Re: Merging statistics from children instead of re-sampling everything
Date	February 11, 2022 04:29:38
Msg-id	90e400a2-9afa-15c6-8c3c-bac9f68cb9f3@postgrespro.ru Whole thread Raw
In response to	Re: Merging statistics from children instead of re-sampling everything (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Merging statistics from children instead of re-sampling everything
List	pgsql-hackers

Tree view

On 2/11/22 03:37, Tomas Vondra wrote:
> That being said, this thread was not really about foreign partitions,
> but about re-analyzing inheritance trees in general. And sampling
> foreign partitions doesn't really solve that - we'll still do the
> sampling over and over.
IMO, to solve the problem we should do two things:
1. Avoid repeatable partition scans in the case inheritance tree.
2. Avoid to re-analyze everything in the case of active changes in small 
subset of partitions.

For (1) i can imagine a solution like multiplexing: on the stage of 
defining which relations to scan, group them and prepare parameters of 
scanning to make multiple samples in one shot.
It looks like we need a separate logic for analysis of partitioned 
tables - we should form and cache samples on each partition before an 
analysis.
It requires a prototype to understand complexity of such solution and 
can be done separately from (2).

Task (2) is more difficult to solve. Here we can store samples from each 
partition in values[] field of pg_statistic or in specific table which 
stores a 'most probable values' snapshot of each table.
Most difficult problem here, as you mentioned, is ndistinct value. Is it 
possible to store not exactly calculated value of ndistinct, but an 
'expected value', based on analysis of samples and histograms on 
partitions? Such value can solve also a problem of estimation of a SETOP 
result grouping (joining of them, etc), where we have statistics only on 
sources of the union.

-- 
regards,
Andrey Lepikhov
Postgres Professional

pgsql-hackers by date:

From: Noah Misch
Date: 11 February 2022, 03:46:39
Subject: Re: List of all* PostgreSQL EXTENSIONs in the world

From: Bharath Rupireddy
Date: 11 February 2022, 04:46:27
Subject: Re: Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?

Re: Merging statistics from children instead of re-sampling everything - Mailing list pgsql-hackers

Previous

Next