Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Collect frequency statistics for arrays
Date
Msg-id 13539.1330797856@sss.pgh.pa.us
Whole thread Raw
In response to Re: Collect frequency statistics for arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
... BTW, could you explain exactly how that "Fill histogram by hashtab"
loop works?  It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch.  I've reworked it to this:
           /* Fill histogram by hashtab. */           delta = analyzed_rows - 1;           count_item_index = 0;
  frac = sorted_count_items[0]->frequency * (num_hist - 1);           for (i = 0; i < num_hist; i++)           {
      while (frac <= 0)               {                   count_item_index++;                   Assert(count_item_index
<count_items_count);                   frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
      }               hist[i] = sorted_count_items[count_item_index]->count;               frac -= delta;           }
       Assert(count_item_index == count_items_count - 1);
 

The asserts don't fire in any test case I've tried, which seems to
indicate that it *does* work in the sense that the first histogram entry
is always the smallest count and the last histogram entry is always
the largest one.  But it's extremely unclear why it manages to stop
exactly at the last count_items array entry, or for that matter why it's
generating a representative histogram at all.  I'm suspicious that the
"-1" bits represent off-by-one bugs.

I also don't especially like the fact that "frac" is capable of
overflowing (since worst case frequency is 300 * 10000 and worst case
num_hist is 10000, with the current limits on statistics_target).
We could work around that by doing the frac arithmetic in int64, but I
wonder whether that couldn't be avoided.  In any case, first I'd like
an explanation why this code works at all.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: Re: Command Triggers, patch v11
Next
From: "Kevin Grittner"
Date:
Subject: Re: Command Triggers, patch v11