Home > mailing lists

Re: Pathify RHS unique-ification for semijoin planning - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: Pathify RHS unique-ification for semijoin planning
Date	August 1 04:53:37
Msg-id	CAMbWs49p2Ci-hokz+GP-EpYYdQEzr3uH3ApyPou+nsECXaJ_4w@mail.gmail.com Whole thread Raw
In response to	Re: Pathify RHS unique-ification for semijoin planning (wenhui qiu <qiuwenhuifx@gmail.com>)
List	pgsql-hackers

Tree view

On Thu, Jul 31, 2025 at 9:49 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:

> > This seems to be another case where the planner chooses a suboptimal
> >  plan due to inaccurate cost estimates.

> Agree ,I increased some rows , set enable_hashagg to on and off ,There's no difference in execution time. The
executionplan is the same. 

Yeah, I found that if you increase the total number of rows in the
table from 1000 to 1083 or more, you consistently get the more
efficient plan -- regardless of whether enable_hashagg is on or off.

> > #define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
>  >   ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
> >     bms_equal((sjinfo)->syn_righthand, (rel)->relids))
> >
> >... and then the check in final_cost_hashjoin() becomes:
> >
>  >   if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
> >                           path->jpath.jointype))
> >    {
> >        innerbucketsize = 1.0 / virtualbuckets;
>  >       innermcvfreq = 0.0;
> >    }
> >
> >Would this be a better approach?  Any thoughts?

> This approach does indeed more accurately capture the fact that the relation has been unique-ified, especially in
caseswhere a semi join has been transformed into an inner join. Compared to the current heuristic checks in costsize.c
thatrely on inner_path->rows, this method is more semantically meaningful and robust. 

The current check in costsize.c relies on the path type, not the path
rows as you mentioned.  However, I agree that the check I proposed is
more robust and extensible: if additional path types are introduced to
represent unique-ification, this check wouldn't need to change.  So I
plan to go this way, unless there are any objections.

Thanks
Richard

pgsql-hackers by date:

From: Chao Li
Date: 01 August, 03:19:23
Subject: Cannot login CommitFests due to cool-off period

From: Chao Li
Date: 01 August, 04:57:20
Subject: [PATCH v1] Trivial: ignore IDE-specific files (.vscode/)

Re: Pathify RHS unique-ification for semijoin planning - Mailing list pgsql-hackers

Previous

Next