Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Collect frequency statistics for arrays
Date
Msg-id CAPpHfdsqAaWN4fqzLXK-fmY3VeXti1=i0Q-DEsyP2VrOcnSzuw@mail.gmail.com
Whole thread Raw
In response to Re: Collect frequency statistics for arrays  (Noah Misch <noah@leadboat.com>)
Responses Re: Collect frequency statistics for arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch <noah@leadboat.com> wrote:
> +     /* Take care about events with low probabilities. */
> +     if (rest > DEFAULT_CONTAIN_SEL)
> +     {

Why the change from "rest > 0" to this in the latest version?
Ealier addition of "rest" distribution require O(m) time. Now there is a more accurate and proved estimate, but it takes O(m^2) time.It doesn't make general assymptotical time worse, but it significant. That's why I decided to skip for low values of "rest" which don't change distribution significantly.
 

> +             /* emit some statistics for debug purposes */
> +             elog(DEBUG3, "array: target # mces = %d, bucket width = %d, "
> +                      "# elements = %llu, hashtable size = %d, usable entries = %d",
> +                      num_mcelem, bucket_width, element_no, i, track_len);

That should be UINT64_FMT.  (I introduced that error in v0.10.)


I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files.  I see no
other issues precluding commit, so I am marking the patch Ready for Committer.
Great!
 
If I made any of the comments worse, please post another update.
Changes looks reasonable for me. Thanks!

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Marko Kreen
Date:
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Next
From: Merlin Moncure
Date:
Subject: Re: Multithread Query Planner