Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: [PATCH] Erase the distinctClause if the result is unique by definition |
Date | |
Msg-id | CAKU4AWp8E6En7-Ox0op4v3-yViGhKA4ic7KCOdAbaVcC0s52DQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Erase the distinctClause if the result is unique by definition (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: [PATCH] Erase the distinctClause if the result is unique by definition
Re: [PATCH] Erase the distinctClause if the result is unique by definition |
List | pgsql-hackers |
Hi David:
3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
rel itself (I've not looked for the best location in detail),
determine if the join can cause rows to be duplicated. If it can't,
then add the UniqueKeys from that rel.
I have some concerns about this method, maybe I misunderstand
something, if so, please advise.
In my current implementation, it calculates the uniqueness for each
BaseRel only, but in your way, looks we need to calculate the
UniquePathKey for both BaseRel and JoinRel. This makes more
difference for multi table join. Another concern is UniquePathKey
is designed for a general purpose, we need to maintain it no matter
distinctClause/groupbyClause.
For example: SELECT * FROM t1
INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
{t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
since it's an eqijoin and t1.unique has a unique index).
Thanks for raising this. My current rule requires *every* relation yields a
unique result and *no matter* with the join method. Actually I want to make
the rule less strict, for example, we may just need 1 table yields unique result
and the final result will be unique as well under some join type.
As for the t1 INNER JOIN t2 ON t1.unique = t2.not_unique; looks we can't
remove the distinct based on this.
create table m1(a int primary key, b int);
create table m2(a int primary key, b int);
insert into m1 values(1, 1), (2, 1);
insert into m2 values(1, 1), (2, 1);
select distinct m1.a from m1, m2 where m1.a = m2.b;
SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
Actually I want to keep the distinct for this case now. One reason is there are only 1
row returned, so the distinct erasing can't help much. The more important reason is
Query->hasAggs is true for "select distinct (select count(*) filter (where t2.c2 = 6
and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;"
(this sql came from postgres_fdw.sql).
There's a separate effort in
https://commitfest.postgresql.org/27/1741/ to implement some parts of
the uniquekeys idea. However the implementation currently only covers
adding the unique keys to Paths, not to RelOptInfos
Thanks for this info. I guess this patch is not merged so far, but looks the cfbot
for my patch [1] failed due to this :( search
"explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;"
I also believe that the existing code in analyzejoins.c should be
edited to make use of unique keys rather than how it looks at unique
indexes and group by / distinct clauses.
I can do this after we have agreement on the UniquePath.
For my cbbot failure, another strange thing is "A" appear ahead of "a" after
the order by.. Still didn't find out why.
Regards
Andy Fan
pgsql-hackers by date: