Re: Merging statistics from children instead of re-sampling everything - Mailing list pgsql-hackers

From	Andrey Lepikhov
Subject	Re: Merging statistics from children instead of re-sampling everything
Date	February 10, 2022 11:50:31
Msg-id	bdb0bea2-a0da-1f1d-5c92-96ff90c198eb@postgrespro.ru
In response to	Re: Merging statistics from children instead of re-sampling everything (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Merging statistics from children instead of re-sampling everything
List	pgsql-hackers

On 21/1/2022 01:25, Tomas Vondra wrote:
> But I don't have a very good idea what to do about statistics that we
> can't really merge. For some types of statistics it's rather tricky to
> reasonably merge the results - ndistinct is a simple example, although
> we could work around that by building and merging hyperloglog counters.
I think that, as a first step in this direction, we can reduce the number
of tuples pulled. We don't really need to pull all tuples from a remote
server: to construct a reservoir, it is enough to pull only a sample of
tuples. The reservoir method needs only a few arguments to return a
sample as if the tuples had been read locally. Also, to fetch such parts
of the sample asynchronously, we can get the size of each partition in a
preliminary step of the analysis.
In my opinion, even this solution can drastically reduce the severity of
the problem.
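
To illustrate the idea, here is a minimal sketch in Python. It is not taken from any patch and is not how ANALYZE or postgres_fdw is implemented. It assumes the partition sizes are already known from the preliminary step, splits the target sample size across partitions the way a uniform draw over all rows would, and lets each partition return only its share via plain reservoir sampling. The names reservoir_sample, allocate_sample and pull_merged_sample are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate, islice


def reservoir_sample(rows, k, rng):
    """Uniform sample of up to k rows from an iterable (Algorithm R)."""
    it = iter(rows)
    sample = list(islice(it, k))          # fill the reservoir first
    for seen, row in enumerate(it, start=k + 1):
        j = rng.randrange(seen)           # row is kept with probability k/seen
        if j < k:
            sample[j] = row
    return sample


def allocate_sample(partition_sizes, target, rng):
    """Decide how many sample rows each partition should contribute.

    Draws `target` distinct global row positions uniformly and counts how
    many fall into each partition, so the per-partition quotas follow the
    same distribution as a uniform sample over all rows.
    """
    total = sum(partition_sizes)
    target = min(target, total)
    bounds = list(accumulate(partition_sizes))   # cumulative end positions
    quotas = [0] * len(partition_sizes)
    for pos in rng.sample(range(total), target):
        quotas[bisect_right(bounds, pos)] += 1
    return quotas


def pull_merged_sample(partitions, partition_sizes, target, rng):
    """Assumption: each partition (e.g. a remote server) can run the
    reservoir step itself, so only `quota` rows cross the network."""
    quotas = allocate_sample(partition_sizes, target, rng)
    merged = []
    for rows, quota in zip(partitions, quotas):
        merged.extend(reservoir_sample(rows, quota, rng))
    return merged
```

Since the quotas depend only on the partition sizes, the per-partition requests are independent of each other and could be issued asynchronously.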

-- 
regards,
Andrey Lepikhov
Postgres Professional
