Thread: Re: UniqueKey on Partitioned table.

Re: UniqueKey on Partitioned table.

From
Dmitry Dolgov
Date:
> On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>
> The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
> single relation part and may have bugs. I just attached it here for design
> review only. and the not-null-attrs is just v1 which we can continue
> discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?



Re: UniqueKey on Partitioned table.

From
Andy Fan
Date:


On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>
> The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
> single relation part and may have bugs. I just attached it here for design
> review only. and the not-null-attrs is just v1 which we can continue
> discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Yes,  That's because I don't want to create a new EquivalenceClass (which
would make the PlannerInfo->eq_classes longer) if we don't have
one , then I just used one Expr instead for this case.  However during the
test, I found some EquivalenceClass with only 1 EquivalenceMember
unexpectedly. 

--
Best Regards

Re: UniqueKey on Partitioned table.

From
Ashutosh Bapat
Date:
On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
>
> On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>>
>> > On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>> >
>> > The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
>> > single relation part and may have bugs. I just attached it here for design
>> > review only. and the not-null-attrs is just v1 which we can continue
>> > discussing on the original thread[2].
>>
>> Thanks for the patch. After a short look through it I'm a bit confused
>> and wanted to clarify, now uniquekeys list could contain both Expr and
>> EquivalenceClass?
>
>
> Yes,  That's because I don't want to create a new EquivalenceClass (which
> would make the PlannerInfo->eq_classes longer) if we don't have
> one , then I just used one Expr instead for this case.
> However during the
> test, I found some EquivalenceClass with only 1 EquivalenceMember
> unexpectedly.
>

Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

-- 
Best Wishes,
Ashutosh Bapat



Re: UniqueKey on Partitioned table.

From
David Rowley
Date:
On Tue, 30 Mar 2021 at 02:27, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> >
> > On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >> Thanks for the patch. After a short look through it I'm a bit confused
> >> and wanted to clarify, now uniquekeys list could contain both Expr and
> >> EquivalenceClass?
> >
> >
> > Yes,  That's because I don't want to create a new EquivalenceClass (which
> > would make the PlannerInfo->eq_classes longer) if we don't have
> > one , then I just used one Expr instead for this case.
> > However during the
> > test, I found some EquivalenceClass with only 1 EquivalenceMember
> > unexpectedly.
> >
>
> Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

I doubt that it should be. get_eclass_for_sort_expr() makes
single-member ECs for sorts.  I imagine the UniqueKey stuff should
copy that... However, get_eclass_for_sort_expr() can often dominate
the planning effort in queries to partitioned tables with a large
number of partitions when the query has an ORDER BY. Perhaps Andy is
trying to sidestep that issue?

I mentioned a few things in [1] on what I think about this.

David

[1] https://www.postgresql.org/message-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com



Re: UniqueKey on Partitioned table.

From
Andy Fan
Date:


On Tue, Mar 30, 2021 at 4:16 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 30 Mar 2021 at 02:27, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> >
> > On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >> Thanks for the patch. After a short look through it I'm a bit confused
> >> and wanted to clarify, now uniquekeys list could contain both Expr and
> >> EquivalenceClass?
> >
> >
> > Yes,  That's because I don't want to create a new EquivalenceClass (which
> > would make the PlannerInfo->eq_classes longer) if we don't have
> > one , then I just used one Expr instead for this case.
> > However during the
> > test, I found some EquivalenceClass with only 1 EquivalenceMember
> > unexpectedly.
> >
>
> Pathkeys may induce single member ECs. Why UniqueKeys are an exception?


When working with UniqueKey, I do want to make PlannerInfo.eq_classes short,
so I don't want to create a new EC for UniqueKey only.  After I realized we have
so single-member ECs, I doubt if the "Expr in UniqueKey" will be executed in real.
I still didn't get enough time to do more research about this. 

I doubt that it should be. get_eclass_for_sort_expr() makes
single-member ECs for sorts. 

Thanks for this hint.  I can check more cases like this. 
 
I imagine the UniqueKey stuff should
copy that... However, get_eclass_for_sort_expr() can often dominate
the planning effort in queries to partitioned tables with a large
number of partitions when the query has an ORDER BY. Perhaps Andy is
trying to sidestep that issue?

Yes.  a long PlannerInfo.eq_classes may make some finding slow, and in
my UniqueKey patch,  I am trying to not make it longer. 
 
I mentioned a few things in [1] on what I think about this.

David

[1] https://www.postgresql.org/message-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com

I appreciate all of the people who helped on this patch and others.  I would
like to share more of my planning.  As for the UniqueKey patch,  there are some
design decisions that need to be made.  In my mind, the order would be:  
a). How to present the notnullattrs probably in [1]  b).  How to present the element
in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
we just talked about.  c). How to maintain the UniqueKey Partitioned table in the
beginning of this thread.  As for a) & c).  I have my current proposal for discussion.
as for b) I think I need more thinking about this.  Based on the idea above, I am 
not willing to move too fast on the following issue unless the previous issue 
can be addressed.  Any feedback/suggestion about my current planning is welcome.


--
Best Regards

Re: UniqueKey on Partitioned table.

From
Ashutosh Bapat
Date:
> b).  How to present the element
> in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
> we just talked about.
I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

-- 
Best Wishes,
Ashutosh Bapat



Re: UniqueKey on Partitioned table.

From
Andy Fan
Date:


On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> b).  How to present the element
> in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
> we just talked about.
I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

TBH, I haven't thought about this too hard, but I think when we build the
UniqueKey, all the ECs have been built already.  So can you think out an
case we start with an EC with a single member at the beginning and
have more members later for UniqueKey cases? 

--
Best Regards

Re: UniqueKey on Partitioned table.

From
David Rowley
Date:
On Tue, 6 Apr 2021 at 22:31, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>> I think the reason we add ECs for sort expressions is to use
>> transitive relationship. The EC may start with a single member but
>> later in the planning that member might find partners which are all
>> equivalent. Result ordered by one is also ordered by the other. The
>> same logic applies to UniqueKey as well, isn't it. In a result if a
>> set of columns makes a row unique, the set of columns represented by
>> the other EC member should be unique. Though a key will start as a
>> singleton it might EC partners later and thus thus unique key will
>> transition to all the members. With that logic UniqueKey should use
>> just ECs instead of bare expressions.
>
>
> TBH, I haven't thought about this too hard, but I think when we build the
> UniqueKey, all the ECs have been built already.  So can you think out an
> case we start with an EC with a single member at the beginning and
> have more members later for UniqueKey cases?

I don't really know if it matters which order things happen. We still
end up with a single EC containing {a,b} whether we process ORDER BY b
or WHERE a=b first.

In any case, the reason we want PathKeys to be ECs rather than just
Exprs is to allow cases such as the following to use an index to
perform the sort.

# create table ab (a int, b int);
# create index on ab(a);
# set enable_seqscan=0;
# explain select * from ab where a=b order by b;
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using ab_a_idx on ab  (cost=0.15..83.70 rows=11 width=8)
   Filter: (a = b)
(2 rows)

Of course, we couldn't use this index to provide pre-sorted results if
"where a=b" hadn't been there.

David



Re: UniqueKey on Partitioned table.

From
Andy Fan
Date:
On Tue, Apr 6, 2021 at 6:55 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 6 Apr 2021 at 22:31, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >> I think the reason we add ECs for sort expressions is to use
> >> transitive relationship. The EC may start with a single member but
> >> later in the planning that member might find partners which are all
> >> equivalent. Result ordered by one is also ordered by the other. The
> >> same logic applies to UniqueKey as well, isn't it. In a result if a
> >> set of columns makes a row unique, the set of columns represented by
> >> the other EC member should be unique. Though a key will start as a
> >> singleton it might EC partners later and thus thus unique key will
> >> transition to all the members. With that logic UniqueKey should use
> >> just ECs instead of bare expressions.
> >
> >
> > TBH, I haven't thought about this too hard, but I think when we build the
> > UniqueKey, all the ECs have been built already.  So can you think out an
> > case we start with an EC with a single member at the beginning and
> > have more members later for UniqueKey cases?
>
> I don't really know if it matters which order things happen. We still
> end up with a single EC containing {a,b} whether we process ORDER BY b
> or WHERE a=b first.
>

I think it is time to talk about this again.  Take the below query as example:

SELECT * FROM t1, t2 WHERE t1.pk = t2.pk;

Then when I populate_baserel_uniquekeys for t1, we already have
EC{Members={t1.pk, t2.pk}} in root->eq_classes already. Then I use
this EC directly for t1's UniqueKey.  The result is:

T1's UniqueKey : [ EC{Members={t1.pk, t2.pk}} ].

*Would this be OK since at the baserel level,  the "t1.pk = t2.pk" is not
executed yet?*

I tried the below example to test how PathKey is maintained.
CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (a INT, b INT);
CREATE INDEX ON t1(b);

SELECT * FROM t1, t2 WHERE t1.b = t2.b and t1.b > 3;

then we can get t1's Path:

Index Scan on (b),  PathKey.pk_class include 2 members (t1.b, t2.b}
even before the Join.

So looks the answer for my question should be "yes"? Hope I have
made myself clear.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)



Re: UniqueKey on Partitioned table.

From
David Rowley
Date:
On Sat, 17 Jul 2021 at 19:32, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> SELECT * FROM t1, t2 WHERE t1.pk = t2.pk;
>
> Then when I populate_baserel_uniquekeys for t1, we already have
> EC{Members={t1.pk, t2.pk}} in root->eq_classes already. Then I use
> this EC directly for t1's UniqueKey.  The result is:
>
> T1's UniqueKey : [ EC{Members={t1.pk, t2.pk}} ].
>
> *Would this be OK since at the baserel level,  the "t1.pk = t2.pk" is not
> executed yet?*
>
> I tried the below example to test how PathKey is maintained.
> CREATE TABLE t1 (a INT, b INT);
> CREATE TABLE t2 (a INT, b INT);
> CREATE INDEX ON t1(b);
>
> SELECT * FROM t1, t2 WHERE t1.b = t2.b and t1.b > 3;
>
> then we can get t1's Path:
>
> Index Scan on (b),  PathKey.pk_class include 2 members (t1.b, t2.b}
> even before the Join.
>
> So looks the answer for my question should be "yes"? Hope I have
> made myself clear.

I don't see the problem. The reason PathKeys use EquivalenceClasses is
so that queries like: SELECT * FROM tab WHERE a=b ORDER BY b; can see
that they're also ordered by a.  This is useful because if there
happens to be an index on tab(a) then we can use it to provide the
required ordering for this query.

We'll want the same with UniqueKeys.  The same thing there looks like:

CREATE TABLE tab (a int primary key, b int not null);

select distinct b from tab where a=b;

Since we have the EquivalenceClass with {a,b} stored in the UniqueKey,
then we should be able to execute this without doing any distinct
operation.

David



Re: UniqueKey on Partitioned table.

From
Andy Fan
Date:
On Sat, Jul 17, 2021 at 3:45 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 17 Jul 2021 at 19:32, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > SELECT * FROM t1, t2 WHERE t1.pk = t2.pk;
> >
> > Then when I populate_baserel_uniquekeys for t1, we already have
> > EC{Members={t1.pk, t2.pk}} in root->eq_classes already. Then I use
> > this EC directly for t1's UniqueKey.  The result is:
> >
> > T1's UniqueKey : [ EC{Members={t1.pk, t2.pk}} ].
> >
> > *Would this be OK since at the baserel level,  the "t1.pk = t2.pk" is not
> > executed yet?*
> >
> > I tried the below example to test how PathKey is maintained.
> > CREATE TABLE t1 (a INT, b INT);
> > CREATE TABLE t2 (a INT, b INT);
> > CREATE INDEX ON t1(b);
> >
> > SELECT * FROM t1, t2 WHERE t1.b = t2.b and t1.b > 3;
> >
> > then we can get t1's Path:
> >
> > Index Scan on (b),  PathKey.pk_class include 2 members (t1.b, t2.b}
> > even before the Join.
> >
> > So looks the answer for my question should be "yes"? Hope I have
> > made myself clear.
>
> I don't see the problem.

Thanks for the double check,  that removes a big blocker for my development.
I'd submit a new patch very soon.

> The reason PathKeys use EquivalenceClasses is
> so that queries like: SELECT * FROM tab WHERE a=b ORDER BY b; can see
> that they're also ordered by a.  This is useful because if there
> happens to be an index on tab(a) then we can use it to provide the
> required ordering for this query.
>
> We'll want the same with UniqueKeys.  The same thing there looks like:
>
> CREATE TABLE tab (a int primary key, b int not null);
>
> select distinct b from tab where a=b;
>
> Since we have the EquivalenceClass with {a,b} stored in the UniqueKey,
> then we should be able to execute this without doing any distinct
> operation.
>
> David



-- 
Best Regards
Andy Fan (https://www.aliyun.com/)