Home > mailing lists

Re: UniqueKey v2 - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: UniqueKey v2
Date	June 4, 2024 14:14:19
Msg-id	21674.1717510459@antos Whole thread Raw
In response to	Re: UniqueKey v2 (Andy Fan <zhihuifan1213@163.com>)
List	pgsql-hackers

Tree view

Andy Fan <zhihuifan1213@163.com> wrote:
> Antonin Houska <ah@cybertec.at> writes:
>
> >> Could you make the reason clearer for adding 'List *opfamily_lists;'
> >> into UniqueKey?  You said "This is needed to create ECs in the parent
> >> query if the upper relation represents a subquery." but I didn't get the
> >> it. Since we need to maintain the UniqueKey in the many places, I'd like
> >> to keep it as simple as possbile. Of course, anything essentical should
> >> be added for sure.
> >
> > If unique keys are generated for a subquery output, they also need to be
> > created for the corresponding relation in the upper query ("sub" in the
> > following example):
>
> OK.
> >
> > select * from tab1 left join (select * from tab2) sub;
> >
> > However, to create an unique key for "sub", you need an EC for each expression
> > of the key.
>
> OK.
> > And to create an EC, you in turn need the list of operator
> > families.
>
> I'm thinking if we need to "create" any EC. Can you find out a user case
> where the outer EC is missed and the UniqueKey is still interesting? I
> don't have an example now.
>
> convert_subquery_pathkeys has a similar sistuation and has the following
> codes:
>
>                 outer_ec =
>                     get_eclass_for_sort_expr(root,
>                                              (Expr *) outer_var,
>                                              sub_eclass->ec_opfamilies,
>                                              sub_member->em_datatype,
>                                              sub_eclass->ec_collation,
>                                              0,
>                                              rel->relids,
>                                              NULL,
>                                              false);
>
>                 /*
>                  * If we don't find a matching EC, sub-pathkey isn't
>                  * interesting to the outer query
>                  */
>                 if (outer_ec)
>                     best_pathkey =
>                         make_canonical_pathkey(root,
>                                                outer_ec,
>                                                sub_pathkey->pk_opfamily,
>                                                sub_pathkey->pk_strategy,
>                                                sub_pathkey->pk_nulls_first);
>             }

I think that convert_subquery_pathkeys() just does not try that hard to
achieve its goal.

The example where it's important to create the EC in the outer query is what I
added to the subselect.sql regression test in the 0004- diff in [1]:

create table tabx as select * from generate_series(1,100) idx;
create table taby as select * from generate_series(1,100) idy;
create unique index on taby using btree (idy);
create view view_barrier with (security_barrier=true) as select * from taby;
analyze tabx, taby;
explain (costs off, verbose on) select * from tabx x left join view_barrier y on idy = idx;

If you modify find_ec_position_matching_expr() to return -1 instead of
creating the EC, you will get this plan

Hash Left Join
   Output: x.idx, taby.idy
   Hash Cond: (x.idx = taby.idy)
   ->  Seq Scan on public.tabx x
         Output: x.idx
   ->  Hash
         Output: taby.idy
         ->  Seq Scan on public.taby
               Output: taby.idy

instead of this

Hash Left Join
   Output: x.idx, taby.idy
   Inner Unique: true
   Hash Cond: (x.idx = taby.idy)
   ->  Seq Scan on public.tabx x
         Output: x.idx
   ->  Hash
         Output: taby.idy
         ->  Seq Scan on public.taby
               Output: taby.idy

> >> > * uniquekey_useful_for_merging()
> >> >
> >> >   How does uniqueness relate to merge join? In README.uniquekey you seem to
> >> >   point out that a single row is always sorted, but I don't think this
> >> >   function is related to that fact. (Instead, I'd expect that pathkeys are
> >> >   added to all paths for a single-row relation, but I'm not sure you do that
> >> >   in the current version of the patch.)
> >>
> >> The merging is for "mergejoinable join clauses", see function
> >> eclass_useful_for_merging. Usually I think it as operator "t1.a = t2.a";
> >
> > My question is: why is the uniqueness important specifically to merge join? I
> > understand that join evaluation can be more efficient if we know that one
> > input relation is unique (i.e. we only scan that relation until we find the
> > first match), but this is not specific to merge join.
>
> So the answer is the "merging" in uniquekey_useful_for_merging() has
> nothing with merge join.

I don't understand. The function comment does mention merge join:

/*
 * uniquekey_useful_for_merging
 *    Check if the uniquekey is useful for mergejoins above the given relation.
 *
 * similar with pathkeys_useful_for_merging.
 */
static bool
uniquekey_useful_for_merging(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *rel)


[1] https://www.postgresql.org/message-id/7971.1713526758%40antos

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

pgsql-hackers by date:

From: Ashutosh Bapat
Date: 04 June 2024, 14:09:46
Subject: Re: Logical Replication of sequences

From: Robert Haas
Date: 04 June 2024, 14:16:36
Subject: Re: DROP OWNED BY fails to clean out pg_init_privs grants

Re: UniqueKey v2 - Mailing list pgsql-hackers

Previous

Next