Re: MCV lists for highly skewed distributions - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: MCV lists for highly skewed distributions
Date	December 31, 2017 15:01:55
Msg-id	CAJVSVGUOj5UYthSZj9qH89o2Q6PtTuBEiSGsp7DWM4MtadQB-g@mail.gmail.com
In response to	Re: MCV lists for highly skewed distributions (John Naylor <jcnaylor@gmail.com>)
Responses	Re: MCV lists for highly skewed distributions
List	pgsql-hackers

I wrote:

> On 12/28/17, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I think that perhaps maxmincount should also use the dynamic
>> values_cnt_remaining rather than the static one.  After all, things
>> included in the MCV don't get represented in the histogram.  When I've
>> seen planning problems due to skewed distributions I also usually see
>> redundant values in the histogram boundary list which I think should
>> be in the MCV list instead. But I have not changed that here, pending
>> discussion.
>
> I think this is also a good idea, but I haven't thought it through. If
> you don't go this route, I would move this section back out of the
> loop as well.

I did some quick and dirty testing of this, and I just want to note
that in this case, setting mincount to its hard-coded minimum must
come after setting it to maxmincount, since now maxmincount can go
arbitrarily low.
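
To make the ordering point concrete, here is a minimal sketch rather than the actual analyze.c code. The helper names are hypothetical, and the 1.25 multiplier and the floor of 2 are my assumptions about the hard-coded values. It compares applying the floor before the maxmincount cap with applying it after.

```c
/*
 * Hypothetical helpers for illustration only; they are not functions in
 * PostgreSQL. The 1.25 multiplier and the minimum of 2 are assumed values.
 */

/* Floor first, cap second: a very small maxmincount wins. */
static double
mincount_floor_first(double avgcount, double maxmincount)
{
	double		mincount = avgcount * 1.25;

	if (mincount < 2)
		mincount = 2;
	if (mincount > maxmincount)
		mincount = maxmincount;		/* may drop below 2 */
	return mincount;
}

/* Cap first, floor second: the result can never drop below 2. */
static double
mincount_floor_last(double avgcount, double maxmincount)
{
	double		mincount = avgcount * 1.25;

	if (mincount > maxmincount)
		mincount = maxmincount;
	if (mincount < 2)
		mincount = 2;				/* the floor is applied last */
	return mincount;
}
```

With avgcount = 4 and maxmincount = 0.5, the floor-first version returns 0.5, while the floor-last version returns 2. When maxmincount is at least 2, both orderings give the same result.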

I'll be travelling for a few days, but I'll do some testing on some
data sets soon. While looking through the archives for more info, I
saw this thread

https://www.postgresql.org/message-id/32261.1496611829%40sss.pgh.pa.us

which showcases the opposite problem: for more uniform distributions,
there are too many MCVs. That isn't relevant to your problem, but if I
have time I'll try testing an approach suggested in that thread
alongside your patch to see how the two interact.

-John Naylor
