Re: Slow query postgres 8.3 - Mailing list pgsql-performance

From Claudio Freire
Subject Re: Slow query postgres 8.3
Date
Msg-id BANLkTimTe1pWT9sdHDe8+nwqpOckqb6ycg@mail.gmail.com
Whole thread Raw
In response to Re: Slow query postgres 8.3  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-performance
On Wed, Apr 13, 2011 at 10:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> You mean Support Vector Classifiers? Interesting idea, although I don't
> see how to apply that to query planning, especially with non-numeric
> inputs. Could you share more details on that statistical system and how
> do you think it could be applied in the pgsql world?

Well, in my case, the data was a simple list of attributes. You either
had them or not, and data was very sparse, so the task was to fill the
missing bits.

For that, what I did is take a training set of data, and each time I
wanted to know the likelihood of having a certain attribute I would
compute the conditional probability given the training data -
conditional on a set of other data.

So, for postgres, if I had an index over a few columns of booleans
(yea, bare with me) (a,b,c,d), and I wanted to know the selectivity of
"where a", IF i already accounted for "where b" then I'd pick my
training data and count how many of those that have b have also a. So
P(a if b).

Of course, my application had to handle thousands of attributes, so I
couldn't apply conditional distributions on everything, I'd pick the
conditional part (if b) to something that selected an "appropriately
sized" sample from my training data.

All that's very expensive.

So I thought... what about replacing that with an SVC - train an SVC
or SVR model for a, taking b, c, d as parameters. I never had the
oportunity to test the idea, but the SVC variant would probably be
usable by postgres, since all you need to know is b, c, d, they don't
need to be booleans, or scalars in fact, SVCs are very flexible.
Unknown values could easily be compensated for, with some cleverness.
The tough part is, of course, training the SVC, picking the kind of
SVC to use, and storing it into stats tables during analyze. Oh, and
hoping it doesn't make fatal mistakes.

I know, the idea is very green, but it would be a fun project - cough GSoC ;-)

pgsql-performance by date:

Previous
From: Claudio Freire
Date:
Subject: Re: Performance
Next
From: Tom Lane
Date:
Subject: Re: Performance