Re: Implement missing join selectivity estimation for range types - Mailing list pgsql-hackers

From Haibo Yan
Subject Re: Implement missing join selectivity estimation for range types
Date
Msg-id CABXr29Gts+FuMRkeA=JNAxB700W1EQW2F=W2wAjtJrGZDGTxGw@mail.gmail.com
Whole thread
In response to Re: Implement missing join selectivity estimation for range types  (SCHOEMANS Maxime <maxime.schoemans@ulb.be>)
Responses Re: Implement missing join selectivity estimation for range types
List pgsql-hackers


On Tue, Apr 14, 2026 at 7:03 AM SCHOEMANS Maxime <maxime.schoemans@ulb.be> wrote:
Hi Haibo,

Thank you for picking this up again. I agree with the changes you made
in v5, in particular scoping the patch to the three strict operators and
reworking the tests to check plan structure rather than exact row counts.

Attached is v6 as a 3-patch series building on your v5.

Patch 1 is your range join selectivity patch with one small change: the
range_cmp_bounds result in the merge walk is stored in a local cmp
variable to avoid calling it twice per iteration, as jian he suggested.

Hi Maxime,

Thank you for working on this and for building on top of v5. 

I think patch 1 looks good to me. I do not see major issues there. One small note: the localized bool join_is_reversed; in my version was intentional. I left it that way because get_join_variables() wants a storage location, and I preferred to keep that use local and explicit rather than trying to reshape things around it.


Patch 2 adds the same estimation for multirange types, covering all type
combinations (multirange x multirange, multirange x range, range x
multirange). Since both range and multirange types use the same bound
histogram format and the same RangeBound representation, the core
algorithm is identical.

For patch 2, I am less convinced, especially for &&.

My concern is not so much whether the code runs, but whether the semantic argument is strong enough for the mixed range/multirange cases. The patch assumes that because range and multirange use the same bounds histogram format and the same RangeBound representation, the same estimator can be applied directly. I think that argument is much easier to make for << and >> than for &&.

For single ranges, && works nicely with the usual decomposition because if two single ranges do not overlap, then one must be entirely to the left or entirely to the right of the other. But for multiranges there is a third possibility: neither side is entirely left nor entirely right, and yet they still do not overlap because of an internal gap.

For example: 

A = {[1,2), [100,101)}

B = [50,60)

Here:

A << B is false
A >> B is false
A && B is also false

So for multiranges, “not left and not right” does not imply overlap in the same way it does for single ranges. That makes me worry that reusing the same estimator logic for &&, especially in mixed range/multirange cases, may overestimate overlap because the overall lower/upper bounds do not capture the internal holes.

So I think patch 2 still needs a stronger justification there, and probably more targeted tests around sparse multiranges / hole cases, especially for &&.


Patch 3 removes the duplication between rangetypes_selfuncs.c and
multirangetypes_selfuncs.c that Tom raised as a concern. It makes the
10 shared helper functions non-static, exports them via selfuncs.h,
and deletes the copies from the multirange file. This covers all the
pre-existing duplication between the two files, not just the functions
added in this patch set.

Regards,
Maxime

For patch 3, I agree with the motivation: the duplication between rangetypes_selfuncs.c and multirangetypes_selfuncs.c is not ideal. But I am not convinced that exporting those helpers via selfuncs.h is the right boundary.

My preference would be something tighter:

  • keep the shared helper implementations in one place

  • add a backend-private internal header just for the range-family selfuncs code

  • include that internal header from rangetypes_selfuncs.c and multirangetypes_selfuncs.c

  • avoid widening visibility by turning a group of file-local helpers into broader extern declarations in selfuncs.h

In other words, I agree that the duplication should be removed, but I think a backend-private internal header should be enough for that goal. I do not think we need to expand visibility more than necessary by moving these helpers out of the file-private space into a broader interface. 

Thanks again for working on this.

Regards,

Haibo

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints
Next
From: Chao Li
Date:
Subject: Re: [PATCH] Miscellaneous little fixes