Home > mailing lists

Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Collect frequency statistics for arrays
Date	March 8, 2012 12:31:11
Msg-id	12427.1331224252@sss.pgh.pa.us Whole thread Raw
In response to	Re: Collect frequency statistics for arrays (Noah Misch <noah@leadboat.com>)
Responses	Re: Collect frequency statistics for arrays
List	pgsql-hackers

Tree view

Noah Misch <noah@leadboat.com> writes:
> On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
>> On reflection my idea above is wrong; for example assume that we have a
>> column with 900 arrays of length 1 and 100 arrays of length 2.  Going by
>> what I said, we'd reduce the histogram to {1,2}, which might accurately
>> capture the set of lengths present but fails to show that 1 is much more
>> common than 2.  However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
>> would capture the situation perfectly in one-tenth the space that the
>> current logic does.

> Granted.  When the next sample finds 899/101 instead, though, the optimization
> vanishes.

No, you missed my next point.  That example shows that sometimes a
smaller histogram can represent the situation with zero error, but in
all cases a smaller histogram can represent the situation with perhaps
more error than a larger one.  Since we already have a defined error
tolerance, we should try to generate a histogram that is as small as
possible while still not exceeding the error tolerance.

Now, it might be that doing that is computationally impractical, or
too complicated to be reasonable.  But it seems to me to be worth
looking into.

> If you want to materially narrow typical statistics, Alexander's
> proposal looks like the way to go.  I'd guess array columns always
> having DEC <= default_statistics_target are common enough to make that
> representation the dominant representation, if not the only necessary
> representation.

Well, I don't want to have two representations; I don't think it's worth
the complexity.  But certainly we could consider switching to a
different representation if it seems likely to usually be smaller.
        regards, tom lane

pgsql-hackers by date:

From: "Kevin Grittner"
Date: 08 March 2012, 12:30:19
Subject: Re: sortsupport for text

From: Peter Geoghegan
Date: 08 March 2012, 12:43:20
Subject: Re: pg_stat_statements and planning time

Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

Previous

Next