OK, time to revive this old thread ...
On 09/23/2017 05:27 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>> [ scalarineqsel may fall over when used by extension operators ]
>
>> What about using two-pronged approach:
>
>> 1) fall back to mid bucket in back branches (9.3 - 10)
>> 2) do something smarter (along the lines you outlined) in PG11
>
> Sure. We need to test the fallback case anyway.
>
Attached is a minimal fix adding a flag to convert_numeric_to_scalar,
tracking when it fails because of an unsupported data type. If any of the
3 calls (value + lo/hi boundaries) sets it to 'true', we simply fall back
to the default estimate (0.5) within the bucket.
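
To illustrate the idea outside the planner, here is a minimal standalone
sketch of the fallback. It is not the attached patch: SketchType,
SketchValue, sketch_to_scalar and sketch_bucket_fraction are hypothetical
names made up for this example, and the type handling is deliberately
simplified. The clamping to 0.0 and 1.0 is also just part of the sketch.

```c
#include <stdbool.h>

/* Hypothetical type tag for this sketch; not PostgreSQL's Oid. */
typedef enum { TYPE_INT, TYPE_FLOAT, TYPE_UNKNOWN } SketchType;

typedef struct
{
    SketchType  type;
    double      raw;        /* payload, meaningful only for known types */
} SketchValue;

/*
 * Convert a value to a double. For an unsupported type, set *failure
 * and return 0 instead of raising an error.
 */
double
sketch_to_scalar(SketchValue v, bool *failure)
{
    switch (v.type)
    {
        case TYPE_INT:
        case TYPE_FLOAT:
            return v.raw;
        default:
            *failure = true;
            return 0.0;
    }
}

/* Estimate the fraction of the bucket [lo, hi] that lies below value. */
double
sketch_bucket_fraction(SketchValue value, SketchValue lo, SketchValue hi)
{
    bool        failure = false;
    double      val = sketch_to_scalar(value, &failure);
    double      low = sketch_to_scalar(lo, &failure);
    double      high = sketch_to_scalar(hi, &failure);

    /* Any failed conversion (or an empty range) means mid-bucket. */
    if (failure || high <= low)
        return 0.5;
    if (val <= low)
        return 0.0;
    if (val >= high)
        return 1.0;
    return (val - low) / (high - low);
}
```

The flag is shared by all three calls, so a single unsupported value is
enough to trigger the mid-bucket estimate.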
>>> [ sketch of a more extensible design ]
>
>> Sounds reasonable to me, I guess - I can't really think about anything
>> simpler giving us the same flexibility.
>
> Actually, on further thought, that's still too simple. If you look
> at convert_string_to_scalar() you'll see it's examining all three
> values concurrently (the probe value, of one datatype, and two bin
> boundary values of possibly a different type). The reason is that
> it's stripping off whatever common prefix those strings have before
> trying to form a numeric equivalent. While certainly
> convert_string_to_scalar is pretty stupid in the face of non-ASCII
> sort orders, the prefix-stripping is something I really don't want
> to give up. So the design I sketched of considering each value
> totally independently isn't good enough.
>
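
To make the prefix-stripping point concrete, here is a simplified
standalone sketch. It is not what convert_string_to_scalar actually does:
the function names are hypothetical, and the base-256 mapping over at most
8 bytes is an assumption chosen to keep the example short. It only shows
why all three values have to be examined together.

```c
#include <stddef.h>

/* Length of the prefix shared by all three strings. */
size_t
sketch_common_prefix(const char *a, const char *b, const char *c)
{
    size_t      n = 0;

    while (a[n] != '\0' && a[n] == b[n] && a[n] == c[n])
        n++;
    return n;
}

/* Map up to the first 8 bytes of s to a fraction in [0, 1), base 256. */
double
sketch_string_to_fraction(const char *s)
{
    double      result = 0.0;
    double      scale = 1.0 / 256.0;

    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
    {
        result += (unsigned char) s[i] * scale;
        scale /= 256.0;
    }
    return result;
}

/* Position of value within [lo, hi] after stripping the shared prefix. */
double
sketch_string_fraction(const char *value, const char *lo, const char *hi)
{
    size_t      skip = sketch_common_prefix(value, lo, hi);
    double      v = sketch_string_to_fraction(value + skip);
    double      l = sketch_string_to_fraction(lo + skip);
    double      h = sketch_string_to_fraction(hi + skip);

    if (h <= l)
        return 0.5;             /* no usable range, use mid-bucket */
    if (v <= l)
        return 0.0;
    if (v >= h)
        return 1.0;
    return (v - l) / (h - l);
}
```

With a shared prefix longer than the 8 bytes considered, converting each
value independently would map all three to the same number. Stripping the
prefix first keeps the bytes that actually differ.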
> We could, perhaps, provide an API that lets an operator estimation
> function replace convert_to_scalar in toto, but that seems like
> you'd still end up duplicating code in many cases. Not sure about
> how to find a happy medium.
>
I plan to work on this improvement next, once I polish a couple of other
patches for the upcoming commit fest.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services