Re: Implement missing join selectivity estimation for range types - Mailing list pgsql-hackers

From SCHOEMANS Maxime
Subject Re: Implement missing join selectivity estimation for range types
Date
Msg-id DB8P190MB0731EA1B8381F48AA15B9437F0222@DB8P190MB0731.EURP190.PROD.OUTLOOK.COM
Whole thread
In response to Re: Implement missing join selectivity estimation for range types  (Haibo Yan <tristan.yim@gmail.com>)
Responses Re: Implement missing join selectivity estimation for range types
List pgsql-hackers
Hi Haibo,

Thank you for the review.

> One small note: the localized bool join_is_reversed; in my version was
> intentional. I left it that way because get_join_variables() wants a
> storage location, and I preferred to keep that use local and explicit
> rather than trying to reshape things around it.

Fair point. I moved it out of the bare block because it looked unusual,
but I can change it back if you prefer.

> For patch 2, I am less convinced, especially for &&.
> [...]
> for multiranges there is a third possibility: neither side is entirely
> left nor entirely right, and yet they still do not overlap because of
> an internal gap.

This is a valid concern, but it is an existing limitation of multirange
statistics, not something we are introducing. The existing restriction
selectivity code in multirangetypes_selfuncs.c already uses the same
NOT(<<) AND NOT(>>) decomposition for && on multiranges. And
multirange_typanalyze explicitly says:

    /* Treat multiranges like a big range without gaps. */

The statistics only store the outermost bounds, so the gap information
is already lost before our estimator sees it. The multirange GiST
opclass does the same (stores the bounding range). Our join estimator
is just consistent with how multiranges are handled elsewhere.

The alternative is falling back to the 0.005 default, which will almost
certainly be worse. Would a comment explaining the limitation be enough?

> For patch 3, I agree with the motivation [...] But I am not convinced
> that exporting those helpers via selfuncs.h is the right boundary.
> My preference would be something tighter: [...] a backend-private
> internal header just for the range-family selfuncs code

Good point about the visibility. I'll move the declarations to a
separate backend-private header in the next version.

Regards,
Maxime

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Typo fixes in Graph table files
Next
From: Robert Haas
Date:
Subject: submissions for real-time patch evaluation panel