Home > mailing lists

Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date	March 10, 2020 11:19:03
Msg-id	CAKU4AWp8E6En7-Ox0op4v3-yViGhKA4ic7KCOdAbaVcC0s52DQ@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Erase the distinctClause if the result is unique by definition (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: [PATCH] Erase the distinctClause if the result is unique by definition Re: [PATCH] Erase the distinctClause if the result is unique by definition
List	pgsql-hackers

Tree view

Hi David:

3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
rel itself (I've not looked for the best location in detail),
determine if the join can cause rows to be duplicated. If it can't,
then add the UniqueKeys from that rel.

I have some concerns about this method, maybe I misunderstand

something, if so, please advise.

In my current implementation, it calculates the uniqueness for each

BaseRel only, but in your way, looks we need to calculate the

UniquePathKey for both BaseRel and JoinRel. This makes more

difference for multi table join. Another concern is UniquePathKey

is designed for a general purpose, we need to maintain it no matter

distinctClause/groupbyClause.

For example: SELECT * FROM t1
INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
{t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
since it's an eqijoin and t1.unique has a unique index).

Thanks for raising this. My current rule requires *every* relation yields a

unique result and *no matter* with the join method. Actually I want to make

the rule less strict, for example, we may just need 1 table yields unique result

and the final result will be unique as well under some join type.

As for the t1 INNER JOIN t2 ON t1.unique = t2.not_unique; looks we can't

remove the distinct based on this.

create table m1(a int primary key, b int);

create table m2(a int primary key, b int);

insert into m1 values(1, 1), (2, 1);

insert into m2 values(1, 1), (2, 1);

select distinct m1.a from m1, m2 where m1.a = m2.b;

SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.

Actually I want to keep the distinct for this case now. One reason is there are only 1

row returned, so the distinct erasing can't help much. The more important reason is

Query->hasAggs is true for "select distinct (select count(*) filter (where t2.c2 = 6

and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;"

(this sql came from postgres_fdw.sql).

There's a separate effort in
https://commitfest.postgresql.org/27/1741/ to implement some parts of
the uniquekeys idea. However the implementation currently only covers
adding the unique keys to Paths, not to RelOptInfos

Thanks for this info. I guess this patch is not merged so far, but looks the cfbot

for my patch [1] failed due to this :( search

"explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;"

I also believe that the existing code in analyzejoins.c should be
edited to make use of unique keys rather than how it looks at unique
indexes and group by / distinct clauses.

I can do this after we have agreement on the UniquePath.

For my cbbot failure, another strange thing is "A" appear ahead of "a" after

the order by.. Still didn't find out why.

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.83298

Regards

Andy Fan

pgsql-hackers by date:

From: Michael Paquier
Date: 10 March 2020, 11:01:43
Subject: Re: reindex concurrently and two toast indexes

From: Peter Eisentraut
Date: 10 March 2020, 11:52:29
Subject: Re: adding partitioned tables to publications

Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

Previous

Next