Re: WIP: cross column correlation ... - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: WIP: cross column correlation ...
Date	February 26, 2011 14:45:41
Msg-id	AANLkTinD_vPt_d5tzGKXfJURYzq5=5mCa8K8_G1Sr8+O@mail.gmail.com
In response to	Re: WIP: cross column correlation ... (PostgreSQL - Hans-Jürgen Schönig<postgres@cybertec.at>)
Responses	Re: WIP: cross column correlation ...
List	pgsql-hackers

2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
> what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.
>

The problem is that we haven't figured out how to usefully store a
histogram for <a,b>. Consider the oft-quoted example of a
<city,postal-code> -- or <city,zip code> for Americans. A histogram
of the tuple is just the same as a histogram on the city. It doesn't
tell you how much extra selectivity the postal code or zip code gives
you. And if you happen to store a histogram of <postal code, city> by
mistake then it doesn't tell you anything at all.
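
To make that concrete, here is a minimal sketch in Python, not
PostgreSQL code. It assumes an equi-depth histogram built over
lexicographically sorted values; the helper name equi_depth_bounds and
the sample rows are made up for illustration. Because the tuples sort
by city first, the bucket boundaries of the <city,zip> histogram fall
in exactly the same cities as those of the city-only histogram:

```python
def equi_depth_bounds(values, buckets):
    """Return the interior bucket boundaries of an equi-depth histogram.

    Hypothetical helper: sorts the values and picks the value found at
    each 1/buckets quantile position.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets] for i in range(1, buckets)]


# Made-up sample data: 30 Boston rows, 10 Cambridge rows, 60 New York rows.
rows = (
    [("Boston", f"021{i:02d}") for i in range(10)] * 3
    + [("Cambridge", "02139")] * 10
    + [("New York", f"100{i:02d}") for i in range(20)] * 3
)

# Histogram over the <city, zip> tuples (Python compares tuples
# lexicographically, so city is the leading sort key).
tuple_bounds = equi_depth_bounds(rows, 4)

# Histogram over the city column alone.
city_bounds = equi_depth_bounds([city for city, _ in rows], 4)

# The city component of every tuple boundary matches the city-only boundary.
same_cities = [city for city, _ in tuple_bounds] == city_bounds
print(tuple_bounds)
print(city_bounds)
print(same_cities)
```

The zip component only shifts a boundary within one city; it says
nothing about how many rows of that city share a given zip.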

We need a data structure that lets us answer the Bayesian question
"given a city of New York how selective is zip-code = 02139". I don't
know what that data structure would be.
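
For illustration only, this is the quantity in question computed by
brute force from exact joint counts. It is a sketch on made-up data
with hypothetical function names, not a proposal: keeping a count for
every <city,zip> pair is precisely what a compact statistics structure
cannot afford. It also shows how far off the usual independence
assumption can be:

```python
from collections import Counter

# Made-up sample data, 100 rows in total.
rows = (
    [("Boston", "02108")] * 30
    + [("Cambridge", "02139")] * 10
    + [("New York", "10001")] * 40
    + [("New York", "10002")] * 20
)

city_counts = Counter(city for city, _ in rows)
zip_counts = Counter(zip_code for _, zip_code in rows)
pair_counts = Counter(rows)


def conditional_selectivity(city, zip_code):
    """Fraction of the rows for `city` that also have `zip_code`."""
    if city_counts[city] == 0:
        return 0.0
    return pair_counts[(city, zip_code)] / city_counts[city]


def independent_estimate(city, zip_code):
    """Joint selectivity if the two columns were assumed independent."""
    n = len(rows)
    return (city_counts[city] / n) * (zip_counts[zip_code] / n)


# No New York row has zip 02139, so the conditional selectivity is zero,
# while the independence assumption predicts a non-zero fraction of rows.
print(conditional_selectivity("New York", "02139"))
print(independent_estimate("New York", "02139"))
print(conditional_selectivity("New York", "10001"))
```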

Heikki and I had a wacky hand-crafted 2D histogram data structure that
I suspect doesn't actually work. And someone else did some research on
the list and came up with a fancy-sounding name of a statistics concept
that might be what we want.

--
greg
