Re: More stable query plans via more predictable column statistics - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: More stable query plans via more predictable column statistics
Date
Msg-id CACACo5Qmkt1fdQrGJ_mcLJhJmJ2YpXX-7iB4qen+4sni=YPeRA@mail.gmail.com
Whole thread Raw
In response to Re: More stable query plans via more predictable column statistics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: More stable query plans via more predictable column statistics  (Alex Shulgin <alex.shulgin@gmail.com>)
List pgsql-hackers
<p dir="ltr">On Apr 2, 2016 18:38, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/> ><br /> > "Shulgin, Oleksandr" <<a
href="mailto:oleksandr.shulgin@zalando.de">oleksandr.shulgin@zalando.de</a>>writes:<br /> > > On Apr 1, 2016
23:14,"Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /> > >> Haven't
lookedat 0002 yet.<br /> ><br /> > > [crosses fingers] hope you'll have a chance to do that before feature<br
/>> > freeze for 9.6<br /> ><br /> > I studied this patch for awhile after rebasing it onto yesterday's<br
/>> commits.<p dir="ltr">Fantastic! I could not hope for a better reply :-) <p dir="ltr">> I did not like the
factthat the compute_scalar_stats logic<br /> > would allow absolutely anything into the MCV list once num_hist
falls<br/> > below 2. I think it's important that we continue to reject values that<br /> > are only seen once in
thesample, because there's no very good reason to<br /> > think that they are MCVs and not just infrequent values
thatby luck<br /> > appeared in the sample.<p dir="ltr">In my understanding we only put a value in the track list if
we'veseen it at least twice, no?<p dir="ltr">> However, after I rearranged the tests there so<br /> > that "if
(num_hist>= 2)" only controlled whether to apply the 1/K limit,<br /> > one of the regression tests started to
fail:<pdir="ltr">Uh-oh.<p dir="ltr">> there's a place in<br /> > rowsecurity.sql that expects that if a column
containsnothing but several<br /> > instances of a single value, that value will be recorded as a lone MCV.<br />
>Now this isn't a particularly essential thing for that test, but it still<br /> > seems like a good property for
ANALYZEto have.<p dir="ltr">No objection here.<p dir="ltr">> The reason it's failing,<br /> > of course, is that
thetest as written cannot possibly accept the last<br /> > (or only) value.<p dir="ltr">Yeah, this I would expect
fromsuch a change.<p dir="ltr">> Before I noticed the regression failure, I'd been thinking that maybe it'd<br />
>be better if the decision rule were not "at least 100+x% of the average<br /> > frequency of this value and
laterones", but "at least 100+x% of the<br /> > average frequency of values after this one".<p dir="ltr">Hm, sounds
prettysimilar to what I wanted to achieve, but better formalized.<p dir="ltr">> With that formulation, we're<br />
>not constrained as to the range of x.  Now, if there are *no* values after<br /> > this one, then this way needs
anexplicit special case in order not to<br /> > compute 0/0; but the preceding para shows that we need a special
casefor<br /> > the last value anyway.<br /> ><br /> > So, attached is a patch rewritten along these lines.  I
used50% rather<br /> > than 25% as the new cutoff percentage --- obviously it should be higher<br /> > in this
formulationthan before, but I have no idea if that particular<br /> > number is good or we should use something
else. Also, the rule for the<br /> > last value is "at least 1% of the non-null samples".  That's a pure guess<br />
>as well.<br /> ><br /> > I do not have any good corpuses of data to try this on.  Can folks who<br /> >
havebeen following this thread try it on their data and see how it<br /> > does?  Also please try some other
multipliersbesides 1.5, so we can<br /> > get a feeling for where that cutoff should be placed.<p dir="ltr">Expect
meto run it on my pet db early next week. :-) <p dir="ltr">Many thanks!<br /> --<br /> Alex<br /> 

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Batch update of indexes
Next
From: Konstantin Knizhnik
Date:
Subject: Re: Batch update of indexes