Re: Remove useless GROUP BY columns considering unique index - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: Remove useless GROUP BY columns considering unique index |
Date | |
Msg-id | CACJufxEdavJhkUDhJ1jraXnZ9ayNQU+TvjuQjzQbuGS06oNZEQ@mail.gmail.com Whole thread Raw |
In response to | Remove useless GROUP BY columns considering unique index (Zhang Mingli <zmlpostgres@gmail.com>) |
List | pgsql-hackers |
On Fri, Dec 29, 2023 at 11:05 PM Zhang Mingli <zmlpostgres@gmail.com> wrote: > > Hi, > > This idea first came from remove_useless_groupby_columns does not need to record constraint dependencie[0] which pointsout that > unique index whose columns all have NOT NULL constraints could also take the work with primary key when removing uselessGROUP BY columns. > I study it and implement the idea. > > [0]https://www.postgresql.org/message-id/flat/CAApHDvrdYa%3DVhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL%2BPYMVN8OnQ%40mail.gmail.com > cosmetic issues: +-- +-- Test removal of redundant GROUP BY columns using unique not null index. +-- materialized view +-- +create temp table t1 (a int, b int, c int, primary key (a, b), unique(c)); +create temp table t2 (a int, b int, c int not null, primary key (a, b), unique(c)); +create temp table t3 (a int, b int, c int not null, d int not null, primary key (a, b), unique(c, d)); +create temp table t4 (a int, b int not null, c int not null, d int not null, primary key (a, b), unique(b, c), unique(d)); +create temp table t5 (a int not null, b int not null, c int not null, d int not null, unique (a, b), unique(b, c), unique(a, c, d)); + +-- Test unique index without not null constraint should not be used. +explain (costs off) select * from t1 group by a,b,c; + +-- Test unique not null index beats primary key. +explain (costs off) select * from t2 group by a,b,c; + +-- Test primary key beats unique not null index. +explain (costs off) select * from t3 group by a,b,c,d; + +-- Test unique not null index with less columns wins. +explain (costs off) select * from t4 group by a,b,c,d; + +-- Test unique not null indices have overlap. +explain (costs off) select * from t5 group by a,b,c,d; + +drop table t1; +drop table t2; +drop table t3; +drop table t4; + line `materailzed view` is unnecessary? you forgot to drop table t5. create temp table p_t2 ( a int not null, b int not null, c int not null, d int, unique(a), unique(a,b),unique(a,b,c) ) partition by list(a); create temp table p_t2_1 partition of p_t2 for values in(1); create temp table p_t2_2 partition of p_t2 for values in(2); explain (costs off, verbose) select * from p_t2 group by a,b,c,d; your patch output: QUERY PLAN -------------------------------------------------------------- HashAggregate Output: p_t2.a, p_t2.b, p_t2.c, p_t2.d Group Key: p_t2.a -> Append -> Seq Scan on pg_temp.p_t2_1 Output: p_t2_1.a, p_t2_1.b, p_t2_1.c, p_t2_1.d -> Seq Scan on pg_temp.p_t2_2 Output: p_t2_2.a, p_t2_2.b, p_t2_2.c, p_t2_2.d (8 rows) so it seems to work with partitioned tables, maybe you should add some test cases with partition tables. - * XXX This is only used to create derived tables, so NO INHERIT constraints - * are always skipped. + * XXX When used to create derived tables, set skip_no_inherit tp true, + * so that NO INHERIT constraints will be skipped. */ List * -RelationGetNotNullConstraints(Oid relid, bool cooked) +RelationGetNotNullConstraints(Oid relid, bool cooked, bool skip_no_inherit) { List *notnulls = NIL; Relation constrRel; @@ -815,7 +815,7 @@ RelationGetNotNullConstraints(Oid relid, bool cooked) if (conForm->contype != CONSTRAINT_NOTNULL) continue; - if (conForm->connoinherit) + if (skip_no_inherit && conForm->connoinherit) continue; I don't think you need to refactor RelationGetNotNullConstraints. however i found it hard to explain it, (which one is parent, which one is children is very confusing for me). so i use the following script to test it: DROP TABLE ATACC1, ATACC2; CREATE TABLE ATACC1 (a int); CREATE TABLE ATACC2 (b int,c int, unique(c)) INHERITS (ATACC1); ALTER TABLE ATACC1 ADD NOT NULL a; -- ALTER TABLE ATACC1 ADD NOT NULL a NO INHERIT; ALTER TABLE ATACC2 ADD unique(a); explain (costs off, verbose) select * from ATACC2 group by a,b,c; ------------------------- create table t_0_3 (a int not null, b int not null, c int not null, d int not null, unique (a, b), unique(b, c) DEFERRABLE, unique(d)); explain (costs off, verbose) select * from t_0_3 group by a,b,c,d; QUERY PLAN -------------------------------- HashAggregate Output: a, b, c, d Group Key: t_0_3.a, t_0_3.b -> Seq Scan on public.t_0_3 Output: a, b, c, d (5 rows) the above part seems not right, it should be `Group Key: t_0_3.d` so i changed to: /* Skip constraints that are not UNIQUE */ + if (con->contype != CONSTRAINT_UNIQUE) + continue; + + /* Skip unique constraints that are condeferred */ + if (con->condeferrable && con->condeferred) + continue; I guess you probably have noticed, in the function remove_useless_groupby_columns, get_primary_key_attnos followed by get_min_unique_not_null_attnos. Also, get_min_unique_not_null_attnos main code is very similar to get_primary_key_attnos. They both do `pg_constraint = table_open(ConstraintRelationId, AccessShareLock);` you might need to explain why we must open pg_constraint twice. either it's cheap to do it or another reason. If scanning twice is expensive, we may need to refactor get_primary_key_attnos. get_primary_key_attnos only called in check_functional_grouping, remove_useless_groupby_columns. I added the patch to commitfest: https://commitfest.postgresql.org/46/4751/
pgsql-hackers by date: