Re: Remove useless GROUP BY columns considering unique index - Mailing list pgsql-hackers

From jian he
Subject Re: Remove useless GROUP BY columns considering unique index
Date
Msg-id CACJufxEdavJhkUDhJ1jraXnZ9ayNQU+TvjuQjzQbuGS06oNZEQ@mail.gmail.com
Whole thread Raw
In response to Remove useless GROUP BY columns considering unique index  (Zhang Mingli <zmlpostgres@gmail.com>)
List pgsql-hackers
On Fri, Dec 29, 2023 at 11:05 PM Zhang Mingli <zmlpostgres@gmail.com> wrote:
>
> Hi,
>
> This idea first came from remove_useless_groupby_columns does not need to record constraint dependencie[0] which
pointsout that 
> unique index whose columns all have NOT NULL constraints  could also take the work with primary key when removing
uselessGROUP BY columns. 
> I study it and implement the idea.
>
>
[0]https://www.postgresql.org/message-id/flat/CAApHDvrdYa%3DVhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL%2BPYMVN8OnQ%40mail.gmail.com
>

cosmetic issues:
+--
+-- Test removal of redundant GROUP BY columns using unique not null index.
+-- materialized view
+--
+create temp table t1 (a int, b int, c int, primary key (a, b), unique(c));
+create temp table t2 (a int, b int, c int not null, primary key (a,
b), unique(c));
+create temp table t3 (a int, b int, c int not null, d int not null,
primary key (a, b), unique(c, d));
+create temp table t4 (a int, b int not null, c int not null, d int
not null, primary key (a, b), unique(b, c), unique(d));
+create temp table t5 (a int not null, b int not null, c int not null,
d int not null, unique (a, b), unique(b, c), unique(a, c, d));
+
+-- Test unique index without not null constraint should not be used.
+explain (costs off) select * from t1 group by a,b,c;
+
+-- Test unique not null index beats primary key.
+explain (costs off) select * from t2 group by a,b,c;
+
+-- Test primary key beats unique not null index.
+explain (costs off) select * from t3 group by a,b,c,d;
+
+-- Test unique not null index with less columns wins.
+explain (costs off) select * from t4 group by a,b,c,d;
+
+-- Test unique not null indices have overlap.
+explain (costs off) select * from t5 group by a,b,c,d;
+
+drop table t1;
+drop table t2;
+drop table t3;
+drop table t4;
+

line `materailzed view` is unnecessary?
you forgot to drop table t5.

create temp table p_t2 (
  a int not null,
  b int not null,
  c int not null,
  d int,
  unique(a), unique(a,b),unique(a,b,c)
) partition by list(a);
create temp table p_t2_1 partition of p_t2 for values in(1);
create temp table p_t2_2 partition of p_t2 for values in(2);
explain (costs off, verbose) select * from p_t2 group by a,b,c,d;
your patch output:
                          QUERY PLAN
--------------------------------------------------------------
 HashAggregate
   Output: p_t2.a, p_t2.b, p_t2.c, p_t2.d
   Group Key: p_t2.a
   ->  Append
         ->  Seq Scan on pg_temp.p_t2_1
               Output: p_t2_1.a, p_t2_1.b, p_t2_1.c, p_t2_1.d
         ->  Seq Scan on pg_temp.p_t2_2
               Output: p_t2_2.a, p_t2_2.b, p_t2_2.c, p_t2_2.d
(8 rows)

so it seems to work with partitioned tables, maybe you should add some
test cases with partition tables.


- * XXX This is only used to create derived tables, so NO INHERIT constraints
- * are always skipped.
+ * XXX When used to create derived tables, set skip_no_inherit tp true,
+ * so that NO INHERIT constraints will be skipped.
  */
 List *
-RelationGetNotNullConstraints(Oid relid, bool cooked)
+RelationGetNotNullConstraints(Oid relid, bool cooked, bool skip_no_inherit)
 {
  List   *notnulls = NIL;
  Relation constrRel;
@@ -815,7 +815,7 @@ RelationGetNotNullConstraints(Oid relid, bool cooked)

  if (conForm->contype != CONSTRAINT_NOTNULL)
  continue;
- if (conForm->connoinherit)
+ if (skip_no_inherit && conForm->connoinherit)
  continue;

I don't think you need to refactor RelationGetNotNullConstraints.
however i found it hard to explain it, (which one is parent, which one
is children is very confusing for me).
so i use the following script to test it:
DROP TABLE ATACC1, ATACC2;
CREATE TABLE ATACC1 (a int);
CREATE TABLE ATACC2 (b int,c int, unique(c)) INHERITS (ATACC1);
ALTER TABLE ATACC1 ADD NOT NULL a;
-- ALTER TABLE ATACC1 ADD NOT NULL a NO INHERIT;
ALTER TABLE ATACC2 ADD unique(a);
explain (costs off, verbose) select * from ATACC2 group by a,b,c;
-------------------------

create table t_0_3 (a int not null, b int not null, c int not null, d
int not null, unique (a, b), unique(b, c) DEFERRABLE, unique(d));
explain (costs off, verbose) select * from t_0_3 group by a,b,c,d;
           QUERY PLAN
--------------------------------
 HashAggregate
   Output: a, b, c, d
   Group Key: t_0_3.a, t_0_3.b
   ->  Seq Scan on public.t_0_3
         Output: a, b, c, d
(5 rows)

the above part seems not right, it should be `Group Key: t_0_3.d`
so i changed to:

/* Skip constraints that are not UNIQUE */
+ if (con->contype != CONSTRAINT_UNIQUE)
+ continue;
+
+ /* Skip unique constraints that are condeferred */
+ if (con->condeferrable && con->condeferred)
+ continue;

I guess you probably have noticed, in the function
remove_useless_groupby_columns,
get_primary_key_attnos followed by get_min_unique_not_null_attnos.
Also, get_min_unique_not_null_attnos main code is very similar to
get_primary_key_attnos.

They both do `pg_constraint = table_open(ConstraintRelationId,
AccessShareLock);`
you might need to explain why we must open pg_constraint twice.
either it's cheap to do it or another reason.

If scanning twice is expensive, we may need to refactor get_primary_key_attnos.
get_primary_key_attnos only called in check_functional_grouping,
remove_useless_groupby_columns.

I added the patch to commitfest:
https://commitfest.postgresql.org/46/4751/



pgsql-hackers by date:

Previous
From: Ivan Kush
Date:
Subject: Re: Autonomous transactions 2023, WIP
Next
From: Magnus Hagander
Date:
Subject: Re: Password leakage avoidance