Re: SoC Ideas for people looking for projects - Mailing list pgsql-general

From Chris Browne
Subject Re: SoC Ideas for people looking for projects
Date
Msg-id 60odmmjzu8.fsf@dba2.int.libertyrms.com
Whole thread Raw
In response to Reminder: only 5 days left to submit SoC applications  (Josh Berkus <josh@agliodbs.com>)
List pgsql-general
me@benjaminarai.com (Benjamin Arai) writes:
> If you are looking for a SoC idea, I have listed a couple below.  I
> am not sure how good of an idea they are but I have ran into the
> following limitations and probably other people have as well in the
> past.

Actually, I have a thought on a SoC idea...

The general notion would be to try to come up with some more rational
information on setting the default column statistics width.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
http://www.postgresql.org/docs/8.2/interactive/planner-stats.html

Now, the default value has long been 10.  There are cases where people
find they need to set it higher; that has always been pretty
trial-and-error.

My suspicion is that:

a) The default should probably be a bit higher than 10

b) Some analysis of stats and schema on an individual table could
perhaps provide more specific values for specific columns.

 - Data type might provide guidance; there's little need for >3 values on
   a binary column, for instance.

 - If there is a NOT NULL UNIQUE constraint on a column, that might
   suggest > 10 values

 - If the column is known to have 150 unique values, that might
   suggest SET STATISTICS 150

   It might be worth looking at the *least* frequently occuring
   values, and set stats high enough to make it likely that at least
   one such value would be pulled in...

 - Some kinds of values (dates, floats) are sorta continuous in value;
   having 10 bins may be pretty OK for such

There are probably some other heuristics to be had; this is just some
ideas off the top of my head.

Nobody has gone through any sort of real analysis of this; there
likely is merit to doing so...
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you  *not* want to go today?  "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>

pgsql-general by date:

Previous
From: "Bill Eaton"
Date:
Subject: best way to kill long running query?
Next
From: Magnus Hagander
Date:
Subject: Re: to_tsvector in 8.2.3