Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Collect frequency statistics for arrays
Date
Msg-id CAPpHfdtwyn7QFy-1mo+fHHof=Bf8BTdbOO84a_aH9gGLXTmb0w@mail.gmail.com
Whole thread Raw
In response to Re: Collect frequency statistics for arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Collect frequency statistics for arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
<div class="gmail_quote">On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane <span dir="ltr"><<a
href="mailto:tgl@sss.pgh.pa.us"target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> BTW, one other thing about
thecount histogram: seems like we are<br /> frequently generating uselessly large ones.  For instance, do ANALYZE<br />
inthe regression database and then run<br /><br /> select tablename,attname,elem_count_histogram from pg_stats<br />
 whereelem_count_histogram is not null;<br /><br /> You get lots of entries that look like this:<br /><br />  pg_proc  
         | proallargtypes |
{1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,6,6,6,2.80556}<br
/> pg_proc             | proargmodes    |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.61111}<br
/> pg_proc             | proargnames    |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,7,7,7,7,8,8,8,14,14,15,16,3.8806}<br
/> pg_proc             | proconfig      |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}<br
/> pg_class            | reloptions     |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}<br
/><br/> which seems to me to be a rather useless expenditure of space.<br /> Couldn't we reduce the histogram size when
therearen't many<br /> different counts?<br /><br /> It seems fairly obvious to me that we could bound the histogram<br
/>size with (max count - min count + 1), but maybe something even<br /> tighter would work; or maybe I'm missing
somethingand this would<br /> sacrifice accuracy.<br /></blockquote><div class="gmail_quote"><br /></div><div
class="gmail_quote">True.If (max count - min count + 1) is small, enumerating of frequencies is both more compact and
moreprecise representation. Simultaneously, if (max count - min count + 1) is large, we can run out of
statistics_targetwith such representation. We can use same representation of count distribution as for scalar column
value: MCVand HISTOGRAM, but it would require additional statkind and statistics slot. Probably, you've better
ideas?</div><br/>------<br />With best regards,<br />Alexander Korotkov.</div> 

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: review: CHECK FUNCTION statement
Next
From: "Kevin Grittner"
Date:
Subject: Re: autovacuum locks