Re: WIP: multivariate statistics / proof of concept - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: WIP: multivariate statistics / proof of concept
Date
Msg-id 8106e11197849725375a933e1cc1409f.squirrel@2.emaily.eu
In response to Re: WIP: multivariate statistics / proof of concept (Katharina Büchse <katharina.buechse@uni-jena.de>)
Responses Re: WIP: multivariate statistics / proof of concept (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
On 13 November 2014, 16:51, Katharina Büchse wrote:
> On 13.11.2014 14:11, Tomas Vondra wrote:
>
>> The only place where I think this might work is the associative rules.
>> It's simple to specify rules like ("ZIP code" implies "city"), and we
>> could even do a simple check against the data to see whether the rule
>> actually holds (and 'disable' the rule if not).
>
> And even this simple example has its limits: at least in Germany, ZIP
> codes are not unique in rural areas, where several villages share the
> same ZIP code.
>
> I guess there are just a few examples where columns are completely
> functionally dependent without any exceptions.
> But of course, if the user gives this information just for optimizing
> the statistics, some exceptions don't matter.
> If this information were used for creating different execution
> plans (e.g. there is an index on column A and column B is functionally
> dependent on A, so one could think about using the index on A and the
> dependency instead of scanning the whole table to find all tuples
> that match the query on column B), exceptions would be a very important
> issue.

Yes, exactly. The aim of this patch is "only" to improve estimates, not
to remove conditions from the plan (e.g. checking only the ZIP code and not
the city name). That certainly can't be done based solely on approximate
statistics, and as you point out, most real-world data either contain
errors or are inherently imperfect (we have the same kind of ZIP/city
inconsistencies in the Czech Republic). That's not a big issue for
estimates, though, assuming only a small fraction of rows violate the rule.
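
To make the "simple check against the data" idea concrete, here is a rough
sketch in Python (not code from the patch). It measures what fraction of
rows violate a rule like "ZIP code implies city" and keeps the rule only if
that fraction is small. The function names, the sample rows, and the 5%
threshold are all hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def dependency_violation_fraction(rows):
    """Return the fraction of (a, b) rows that break the rule "a implies b".

    For each value of a, the most frequent b is treated as the "correct" one;
    every row with a different b counts as a violation.
    """
    by_key = defaultdict(Counter)
    for a, b in rows:
        by_key[a][b] += 1

    total = sum(sum(counts.values()) for counts in by_key.values())
    if total == 0:
        return 0.0

    consistent = sum(counts.most_common(1)[0][1] for counts in by_key.values())
    return 1.0 - consistent / total


def rule_is_usable(rows, max_violations=0.05):
    """Hypothetical policy: keep the rule only if violations are rare."""
    return dependency_violation_fraction(rows) <= max_violations


# Toy data: one ZIP code shared by two villages (the rural case above).
rows = [
    ("07743", "Jena"), ("07743", "Jena"), ("07743", "Jena"),
    ("10115", "Berlin"), ("10115", "Berlin"),
    ("99999", "Village A"), ("99999", "Village B"),
]

print(dependency_violation_fraction(rows))
print(rule_is_usable(rows))
```

In this toy sample one row out of seven breaks the rule, so a 5% threshold
would 'disable' it. On a real table the check would presumably run on a
sample during ANALYZE rather than on all rows, but that is an assumption
here, not a description of the patch.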

Tomas



