Re: Implement missing join selectivity estimation for range types - Mailing list pgsql-hackers

From Haibo Yan
Subject Re: Implement missing join selectivity estimation for range types
Date
Msg-id CABXr29HuYbrtrThYaE8GYy-wMxtej8sG4Hw+2oSfQvuM+c0XVw@mail.gmail.com
Whole thread
In response to Re: Implement missing join selectivity estimation for range types  (SCHOEMANS Maxime <maxime.schoemans@ulb.be>)
Responses Re: Implement missing join selectivity estimation for range types
List pgsql-hackers
On Wed, Apr 15, 2026 at 8:13 AM SCHOEMANS Maxime <maxime.schoemans@ulb.be> wrote:
Hi Haibo,

Thank you for the review.

> One small note: the localized bool join_is_reversed; in my version was
> intentional. I left it that way because get_join_variables() wants a
> storage location, and I preferred to keep that use local and explicit
> rather than trying to reshape things around it.

Fair point. I moved it out of the bare block because it looked unusual,
but I can change it back if you prefer.

> For patch 2, I am less convinced, especially for &&.
> [...]
> for multiranges there is a third possibility: neither side is entirely
> left nor entirely right, and yet they still do not overlap because of
> an internal gap.

This is a valid concern, but it is an existing limitation of multirange
statistics, not something we are introducing. The existing restriction
selectivity code in multirangetypes_selfuncs.c already uses the same
NOT(<<) AND NOT(>>) decomposition for && on multiranges. And
multirange_typanalyze explicitly says:

    /* Treat multiranges like a big range without gaps. */

The statistics only store the outermost bounds, so the gap information
is already lost before our estimator sees it. The multirange GiST
opclass does the same (stores the bounding range). Our join estimator
is just consistent with how multiranges are handled elsewhere.

The alternative is falling back to the 0.005 default, which will almost
certainly be worse. Would a comment explaining the limitation be enough?

Thanks, that is a fair point.

I agree that this is not something patch 2 is uniquely introducing. If the 

existing multirange statistics and restriction selectivity already treat a

multirange essentially as its outer bounds, then it makes sense that the

join estimator can only work within that same approximation.

So I am less worried about this as a correctness objection than I was at

first. My main concern is really about making that limitation explicit,

especially for &&, where internal gaps can matter a lot for the real

overlap semantics.

I think it would help if patch 2 said this a bit more directly, both in

the code comments and in the patch description. Something along the lines

of:

  • this reuses the same outer-bounds approximation already used by existing
multirange statistics / restriction selectivity
  • internal gaps are not represented in the available stats
  • so && for sparse multiranges may still be overestimated in some cases
  • but this is still expected to be better than falling back to a fixed default selectivity
> For patch 3, I agree with the motivation [...] But I am not convinced
> that exporting those helpers via selfuncs.h is the right boundary.
> My preference would be something tighter: [...] a backend-private
> internal header just for the range-family selfuncs code

Good point about the visibility. I'll move the declarations to a
separate backend-private header in the next version.

Regards,
Maxime

If you are willing to add that clarification, I think that would address

most of my concern here.

Regards,

Haibo  

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Use XLogRecPtrIsValid() instead of negated XLogRecPtrIsInvalid
Next
From: Bharath Rupireddy
Date:
Subject: Re: Introduce XID age based replication slot invalidation