Re: [HACKERS] optimizer and type question - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] optimizer and type question
Date
Msg-id 23292.922151563@sss.pgh.pa.us
Whole thread Raw
In response to optimizer and type question  (Erik Riedel <riedel+@CMU.EDU>)
Responses Re: [HACKERS] optimizer and type question
Re: [HACKERS] optimizer and type question
List pgsql-hackers
Erik Riedel <riedel+@CMU.EDU> writes:
> [ optimizer doesn't find relevant pg_statistic entry ]

It's clearly a bug that the selectivity code is not finding this tuple.
If your analysis is correct, then selectivity estimation has *never*
worked properly, or at least not in recent memory :-(.  Yipes.
Bruce and I found a bunch of other problems in the optimizer recently,
so it doesn't faze me to assume that this is broken too.

> the "offending" line is setting the staop to InvalidOid (i.e. 0).
> Question 2 - is this right?  Is the intent for 0 to serve as a
> "wildcard",

My thought is that what the staop column ought to be is the OID of the
comparison function that was used to determine the sort order of the
column.  Without a sort op the lowest and highest keys in the column are
not well defined, so it makes no sense to assert "these are the lowest
and highest values" without providing the sort op that determined that.
(For sufficiently complex data types one could reasonably have multiple
ordering operators.  A crude example is sorting on "circumference" and
"area" for polygons.)  But typically the sort op will be the "<"
operator for the column data type.

So, the vacuum code is definitely broken --- it's not storing the sort
op that it used.  The code in gethilokey might be broken too, depending
on how it is producing the operator it's trying to match against the
tuple.  For example, if the actual operator in the query is any of
< <= > >= on int4, then int4lt ought to be used to probe the pg_statistic
table.  I'm not sure if we have adequate info in pg_operator or pg_type
to let the optimizer code determine the right thing to probe with :-(

> The immediate next thing that intltsel() does, near lines 122
> in selfuncs.c is call atol() on the strings from gethilokey().  And
> guess what it comes up with?
> low = 1
> high = 12
> because it calls atol() on '01-02-1992' and '12-01-1998'.  This
> clearly isn't right, it should get some large integer that includes
> the year and day in the result.  Then it should compare reasonably
> with my constant from the where clause and give a decent selectivity
> value.  This leads to a re-visit of Question 1.
> Question 4 - should date "<=" use a dateltsel() function instead of
> intltsel() as oprrest?

This is clearly busted as well.  I'm not sure that creating dateltsel()
is the right fix, however, because if you go down that path then every
single datatype needs its own selectivity function; that's more than we
need.

What we really want here is to be able to map datatype values into
some sort of numeric range so that we can compute what fraction of the
low-key-to-high-key range is on each side of the probe value (the
constant taken from the query).  This general concept will apply to
many scalar types, so what we want is a type-specific mapping function
and a less-specific fraction-computing-function.  Offhand I'd say that
we want intltsel() and floatltsel(), plus conversion routines that can
produce either int4 or float8 from a data type as seems appropriate.
Anything that couldn't map to one or the other would have to supply its
own selectivity function.

> Or is the problem in the
> way attribute values are stored in pg_statistic by vacuum analyze?

Looks like it converts the low and high values to text and stores them
that way.  Ugly as can be :-( but I'm not sure there is a good
alternative.  We have no "wild card" column type AFAIK, which is what
these columns of pg_statistic would have to be to allow storage of
unconverted min and max values.

I think you've found a can of worms here.  Congratulations ;-)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Taral
Date:
Subject: Re: [HACKERS] optimizer and type question
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] optimizer and type question