On Mon, May 16, 2022 at 12:09:41AM +0200, Tomas Vondra wrote:
> I think it's an interesting idea. In principle it allows deducing the
> multi-column MCV for arbitrary combination of columns, not determined in
> advance. We'd have the MCV with HLL instead of frequencies for columns
> A, B and C:
>
> (a1, hll(a1))
> (a2, hll(a2))
> (...)
> (aK, hll(aK))
>
> (b1, hll(b1))
> (b2, hll(b2))
> (...)
> (bL, hll(bL))
>
> (c1, hll(c1))
> (c2, hll(c2))
> (...)
> (cM, hll(cM))
>
> and from this we'd be able to build MCV for any combination of those
> three columns.

Sorry, but I am lost here. I read about HLL here:
https://towardsdatascience.com/hyperloglog-a-simple-but-powerful-algorithm-for-data-scientists-aed50fe47869
However, I don't see how they can be combined for multiple columns.
Above, I know A, B, and C are columns, but what are a1, a2, etc.?
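
To make the question concrete, here is my best guess at what is being
proposed, as a small Python sketch. None of the following is stated in the
quoted text, so treat it all as my assumption: that a1, a2, ... are the
most common values of column A, that hll(a1) is a HyperLogLog sketch over
the identifiers of the rows where A = a1, and that two columns would be
combined by inclusion-exclusion on the merged sketches. The function names
(build_mcv, estimate_pair, etc.) are made up for illustration and the HLL
is a toy version, not anything from PostgreSQL.

```python
import hashlib
import math

P = 14        # assumption: 2**14 registers per sketch
M = 1 << P

def new_hll():
    return [0] * M

def hll_add(regs, item):
    # 64-bit hash: top P bits pick a register, the rest give the rank.
    h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1 bit
    if rank > regs[idx]:
        regs[idx] = rank

def hll_union(a, b):
    # Merging two sketches is a register-wise maximum.
    return [max(x, y) for x, y in zip(a, b)]

def hll_count(regs):
    alpha = 0.7213 / (1 + 1.079 / M)
    est = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if est <= 2.5 * M and zeros:
        est = M * math.log(M / zeros)         # small-range correction
    return est

def build_mcv(rows, col):
    # Hypothetical "MCV with HLL": value -> sketch of row ids with that value.
    mcv = {}
    for rowid, row in enumerate(rows):
        hll_add(mcv.setdefault(row[col], new_hll()), rowid)
    return mcv

def estimate_pair(hll_a, hll_b):
    # |A and B| = |A| + |B| - |A or B|, all estimated from the sketches.
    union = hll_union(hll_a, hll_b)
    return hll_count(hll_a) + hll_count(hll_b) - hll_count(union)

# Toy table with two columns, A = i % 4 and B = i % 6.
rows = [(i % 4, i % 6) for i in range(20000)]
mcv_a = build_mcv(rows, 0)
mcv_b = build_mcv(rows, 1)

est = estimate_pair(mcv_a[0], mcv_b[0])
exact = sum(1 for a, b in rows if a == 0 and b == 0)
print(f"estimated rows with A=0 AND B=0: {est:.0f} (exact {exact})")
```

If that is roughly the idea, the estimate for (A = a1 AND B = b1) comes
out of the two per-column lists without a multi-column MCV having been
built in advance. Is that what is meant?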
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Indecision is a decision. Inaction is an action. Mark Batterson