Thread: Query Hints? No thanks. Data hints?

Query Hints? No thanks. Data hints?

From
Dimitri Fontaine
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi -hackers,

In another thread about "GUC parameter cursors_tuple_fraction", the  
debate seems to drift onto query hints. About which the consensus here  
is pretty clear and strong, no query hints in PostgreSQL, thanks, or  
we're never gonna have a perfect generic planner.

IIRC, I've read here in the past some attempts to begin a proposal on  
the topic of data hints, allowing the user to describe his data in a  
way ANALYZE can't figure out by itself, as e.g. "this column value is  
tied to this other column value in this way".
This could be a materialized column, mutual-exclusive NOT NULLs, or  
any multi-columns relationships, as well as "this table is a fact  
table", etc.

What do you -hackers think about such a plan: - assess cases where the planner is failing short of good statistics
 - assess data properties SQL does not give us but would be of  
interrest   to internals, and at the same time not so difficult to know about  
by DBAs
 - based on this, prepare a descriptive language of some sort tying  
this all in
 - implement it in a good way ;)

I'm thinking we could have a new set of commands to tell PostgreSQL  
some "high-level" facts about the data, e.g. "there's a injective  
function such as f(t.colA) = t.colB" or any useful thing to be found  
in the firsts proposed step.

Is there a chance we're gonna improve the planner this way? And answer  
Simon's (and many others here and there, -performance etc) concerns?

HTH, regards,
- --
dim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkgeEjoACgkQlBXRlnbh1bnlHwCfcHL5uOlCpptekwLBMp+E9kUn
4roAoMfwdITByHtxCi35l9jDCTSFw2Ho
=whVn
-----END PGP SIGNATURE-----


Re: Query Hints? No thanks. Data hints?

From
Martijn van Oosterhout
Date:
On Sun, May 04, 2008 at 09:44:58PM +0200, Dimitri Fontaine wrote:
> IIRC, I've read here in the past some attempts to begin a proposal on
> the topic of data hints, allowing the user to describe his data in a
> way ANALYZE can't figure out by itself, as e.g. "this column value is
> tied to this other column value in this way".

There has been talk about this in the past, things like being able to
store cross-table correlations which isn't know at all. The problem is
two-fold:

- How do you calculate the stats?
- How do you use them?

Cross-table correlations are easy for the second part, because it's
fairly simple to see where it could be used. However, no-one has come
up with an algorithm to produce a useful number to use. For others it's
harder.

> I'm thinking we could have a new set of commands to tell PostgreSQL
> some "high-level" facts about the data, e.g. "there's a injective
> function such as f(t.colA) = t.colB" or any useful thing to be found
> in the firsts proposed step.

In general postgres could use many bits of information not currently
available. For example: A=B implies lower(A)=lower(B), hence an index
on lower(A) could be used to optimise comparisons against A. Certain
operations preserve order, which may also be useful.

You can always propose something, see where it goes.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Query Hints? No thanks. Data hints?

From
Peter Eisentraut
Date:
Am Dienstag, 6. Mai 2008 schrieb Martijn van Oosterhout:
> Cross-table correlations are easy for the second part, because it's
> fairly simple to see where it could be used. However, no-one has come
> up with an algorithm to produce a useful number to use. For others it's
> harder.

For an algorithm, principal components analysis would be my guess.  It is 
designed to answer the question "this column value is tied to this other 
column value in this way" [quote Simon], at least for the sort of data that a 
B-tree would cover.  For nonlinear data, it is of course harder.

> In general postgres could use many bits of information not currently
> available. For example: A=B implies lower(A)=lower(B), hence an index
> on lower(A) could be used to optimise comparisons against A. Certain
> operations preserve order, which may also be useful.

Locale horrors looming ... ;-)