
Re: multivariate statistics (v25) - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: multivariate statistics (v25)
Date	April 5, 2017 12:41:29
Msg-id	a80cbb70-ea48-0367-9a40-a5cb6484046e@2ndquadrant.com
In response to	Re: [HACKERS] multivariate statistics (v25) (Alvaro Herrera <alvherre@2ndquadrant.com>)
List	pgsql-hackers



On 04/05/2017 08:41 AM, Sven R. Kunze wrote:
> Thanks Tomas and David for hacking on this patch.
> 
> On 04.04.2017 20:19, Tomas Vondra wrote:
>> I'm not sure we still need the min_group_size when evaluating
>> dependencies. It was meant to deal with 'noisy' data, but I think that
>> after switching to the 'degree' it might actually be a bad idea.
>>
>> Consider this:
>>
>>     create table t (a int, b int);
>>     insert into t select 1, 1 from generate_series(1, 10000) s(i);
>>     insert into t select i, i from generate_series(2, 20000) s(i);
>>     create statistics s with (dependencies) on (a,b) from t;
>>     analyze t;
>>
>>     select stadependencies from pg_statistic_ext ;
>>                   stadependencies
>>     --------------------------------------------
>>      [{1 => 2 : 0.333344}, {2 => 1 : 0.333344}]
>>     (1 row)
>>
>> So the degree of the dependency is just ~0.333 although it's obviously 
>> a perfect dependency, i.e. knowledge of 'a' determines 'b'. The 
>> reason is that we discard 2/3 of rows, because those groups are only a 
>> single row each, except for the one large group (1/3 of rows).
> 
> Just so I can follow the comments better: is "dependency" roughly the
> same as what statisticians mean by "conditional probability"?
> 

No, it's closer to a 'functional dependency' as used in relational normal forms.
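
To make the difference concrete, here is a rough Python sketch of how the 'degree' from the quoted example could be computed. It is only an illustration, not the code from the patch: the function name dependency_degree, the threshold value 3, and the assumption that every row of the table ends up in the sample are all mine. Rows are grouped by 'a', a group counts as supporting the dependency a => b if all its rows share a single 'b', and groups smaller than min_group_size are ignored.

```python
from collections import defaultdict


def dependency_degree(rows, min_group_size=1):
    """Fraction of rows that sit in groups supporting a => b.

    A group (all rows sharing one value of a) supports the dependency
    when it contains exactly one distinct b and has at least
    min_group_size rows. This is an illustrative model, not the
    patch's actual implementation.
    """
    if not rows:
        return 0.0

    # Collect the b values seen for each a.
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)

    supporting = 0
    for b_values in groups.values():
        if len(b_values) >= min_group_size and len(set(b_values)) == 1:
            supporting += len(b_values)

    return supporting / len(rows)


# Same data as the quoted SQL: 10000 rows of (1, 1) plus one row (i, i)
# for each i in 2..20000, i.e. 29999 rows in total.
rows = [(1, 1)] * 10000 + [(i, i) for i in range(2, 20001)]

# Hypothetical threshold of 3: every single-row group is discarded.
degree_with_threshold = dependency_degree(rows, min_group_size=3)

# No threshold: every group counts, so the dependency is perfect.
degree_without_threshold = dependency_degree(rows, min_group_size=1)

print(round(degree_with_threshold, 6))
print(degree_without_threshold)
```

Under these assumptions only the 10000-row group survives the threshold, so the degree is 10000/29999, which rounds to the 0.333344 shown in the quoted output, while dropping the threshold gives 1.0. A conditional probability would instead be a statement about one particular pair of values, such as P(b = 1 | a = 1).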


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
