Re: Implement missing join selectivity estimation for range types - Mailing list pgsql-hackers
| From | Haibo Yan |
|---|---|
| Subject | Re: Implement missing join selectivity estimation for range types |
| Date | |
| Msg-id | CABXr29HuYbrtrThYaE8GYy-wMxtej8sG4Hw+2oSfQvuM+c0XVw@mail.gmail.com Whole thread |
| In response to | Re: Implement missing join selectivity estimation for range types (SCHOEMANS Maxime <maxime.schoemans@ulb.be>) |
| Responses |
Re: Implement missing join selectivity estimation for range types
|
| List | pgsql-hackers |
Hi Haibo,Thank you for the review.> One small note: the localized bool join_is_reversed; in my version was> intentional. I left it that way because get_join_variables() wants a> storage location, and I preferred to keep that use local and explicit> rather than trying to reshape things around it.Fair point. I moved it out of the bare block because it looked unusual,but I can change it back if you prefer.> For patch 2, I am less convinced, especially for &&.> [...]> for multiranges there is a third possibility: neither side is entirely> left nor entirely right, and yet they still do not overlap because of> an internal gap.This is a valid concern, but it is an existing limitation of multirangestatistics, not something we are introducing. The existing restrictionselectivity code in multirangetypes_selfuncs.c already uses the sameNOT(<<) AND NOT(>>) decomposition for && on multiranges. Andmultirange_typanalyze explicitly says:/* Treat multiranges like a big range without gaps. */The statistics only store the outermost bounds, so the gap informationis already lost before our estimator sees it. The multirange GiSTopclass does the same (stores the bounding range). Our join estimatoris just consistent with how multiranges are handled elsewhere.The alternative is falling back to the 0.005 default, which will almostcertainly be worse. Would a comment explaining the limitation be enough?
Thanks, that is a fair point.
I agree that this is not something patch 2 is uniquely introducing. If the
existing multirange statistics and restriction selectivity already treat a
multirange essentially as its outer bounds, then it makes sense that the
join estimator can only work within that same approximation.
So I am less worried about this as a correctness objection than I was at
first. My main concern is really about making that limitation explicit,
especially for &&, where internal gaps can matter a lot for the real
overlap semantics.
I think it would help if patch 2 said this a bit more directly, both in
the code comments and in the patch description. Something along the lines
of:
- this reuses the same outer-bounds approximation already used by existing
multirange statistics / restriction selectivity
- internal gaps are not represented in the available stats
- so && for sparse multiranges may still be overestimated in some cases
- but this is still expected to be better than falling back to a fixed default selectivity
> For patch 3, I agree with the motivation [...] But I am not convinced> that exporting those helpers via selfuncs.h is the right boundary.> My preference would be something tighter: [...] a backend-private> internal header just for the range-family selfuncs codeGood point about the visibility. I'll move the declarations to aseparate backend-private header in the next version.Regards,Maxime
If you are willing to add that clarification, I think that would address
most of my concern here.
Regards,
pgsql-hackers by date: