Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date	January 11, 2019 21:18:32
Msg-id	a0aa7043-26cd-5473-a9d4-ef3bd035f132@2ndquadrant.com
In response to	Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses	Re: [HACKERS] PATCH: multivariate histograms and MCV lists
List	pgsql-hackers

On 1/10/19 4:20 PM, Dean Rasheed wrote:
> ...
>
> So perhaps what we should do for multivariate stats is simply use the
> relative standard error approach (i.e., reuse the patch in [2] with a
> 20% RSE cutoff). That had a lot of testing at the time, against a wide
> range of data distributions, and proved to be very good, not to
> mention being very simple.
> 
> That approach would encompass both groups more and less common than
> the base frequency, because it relies entirely on the group appearing
> enough times in the sample to infer that any errors on the resulting
> estimates will be reasonably well controlled. It wouldn't actually
> look at the base frequency at all in deciding which items to keep.
> 
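
For reference, here is a sketch of how a 20% RSE cutoff turns into a minimum sample count. It assumes simple random sampling without replacement. Under that assumption, the sample count k of a group with true frequency p, when n rows are sampled from a table of N rows, is hypergeometric with mean np and variance np(1-p)(N-n)/(N-1). The sketch then estimates p by k/n. The symbols k and p are introduced here for illustration only.

```latex
\mathrm{RSE}^2 = \frac{np(1-p)\,\frac{N-n}{N-1}}{(np)^2}
              = \frac{(1-p)(N-n)}{np(N-1)} \le 0.2^2 = 0.04

% substitute p = k/n
\frac{(n-k)(N-n)}{nk(N-1)} \le 0.04
\;\Longleftrightarrow\;
n(N-n) \le k\,\bigl[(N-n) + 0.04\,n(N-1)\bigr]
\;\Longleftrightarrow\;
k \ge \frac{n(N-n)}{N-n+0.04\,n(N-1)} = \text{mincount}
```

The only input to the resulting bound is how often the group appears in the sample. Nothing about its base frequency enters.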

I've been looking at this approach today, and I'm a bit puzzled. That
patch essentially uses the RSE to compute mincount like this:

    mincount = n*(N-n) / (N-n+0.04*n*(N-1))

and then includes all items whose sample count reaches this threshold.
How could that handle items significantly less common than the base
frequency?
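
Below is a minimal Python sketch of the thresholding described above. It rests on a few stated assumptions:

- `rse_mincount` and `select_items` are hypothetical helper names, not PostgreSQL functions.
- Items are represented as (values, sample_count, base_frequency) tuples.
- An item is kept when its sample count is at least mincount.

The base frequency is carried along but never consulted, which is what prompts the question.

```python
from typing import List, Sequence, Tuple

# Hypothetical item representation: (column values, count in sample, base frequency)
Item = Tuple[Sequence[object], int, float]


def rse_mincount(n: int, N: float, max_rse: float = 0.2) -> float:
    """Minimum sample count implied by a relative-standard-error cutoff.

    n is the number of sampled rows, N the (estimated) total row count.
    With max_rse = 0.2, max_rse**2 is the 0.04 in
        mincount = n*(N-n) / (N-n+0.04*n*(N-1))
    """
    numer = n * (N - n)
    denom = (N - n) + max_rse ** 2 * n * (N - 1)
    if denom == 0:
        # Only possible when n == N == 1: the whole table was sampled,
        # so there is no sampling error to control.
        return 0.0
    return numer / denom


def select_items(items: List[Item], n: int, N: float,
                 max_rse: float = 0.2) -> List[Item]:
    """Keep items whose sample count reaches the RSE-derived threshold.

    The base frequency (item[2]) is never looked at: an item is kept or
    dropped purely on how often it appeared in the sample.
    """
    mincount = rse_mincount(n, N, max_rse)
    return [item for item in items if item[1] >= mincount]


if __name__ == "__main__":
    n, N = 30_000, 1_000_000  # hypothetical sample size and table size
    print(rse_mincount(n, N))
    items = [
        (("a", 1), 500, 0.001),  # frequent in the sample: kept
        (("b", 2), 10, 0.05),    # base frequency predicts ~1500 of 30,000, seen 10 times: dropped
        (("c", 3), 30, 0.0001),  # above the threshold: kept
    ]
    print([item[0] for item in select_items(items, n, N)])
```

With these made-up numbers, the threshold comes out at roughly 24. Group b is therefore discarded, even though it is far less common than its base frequency would suggest. Groups appearing less often than their base frequency survive only if they still show up often enough in the sample.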

Or did you mean to use the RSE, but in some different way?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
