Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: [PATCH] Erase the distinctClause if the result is unique by definition |
Date | |
Msg-id | CAKU4AWqzHG-4X4R85BcwD+QDD6u6OBOj65F69FBToPK4KUo0rQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Erase the distinctClause if the result is unique bydefinition (Julien Rouhaud <rjuju123@gmail.com>) |
List | pgsql-hackers |
On Tue, Feb 11, 2020 at 3:56 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Feb 11, 2020 at 10:57:26AM +0800, Andy Fan wrote:
> On Tue, Feb 11, 2020 at 12:22 AM Ashutosh Bapat <
> ashutosh.bapat.oss@gmail.com> wrote:
>
> > I forgot to mention this in the last round of comments. Your patch was
> > actually removing distictClause from the Query structure. Please avoid
> > doing that. If you remove it, you are also removing the evidence that this
> > Query had a DISTINCT clause in it.
> >
>
> Yes, I removed it because it is the easiest way to do it. what is the
> purpose of keeping the evidence?
>
> >> However the patch as presented has some problems
> >> 1. What happens if the primary key constraint or NOT NULL constraint gets
> >> dropped between a prepare and execute? The plan will no more be valid and
> >> thus execution may produce non-distinct results.
>
> > But that doesn't matter since a query can be prepared outside a
> > transaction and executed within one or more subsequent transactions.
> >
>
> Suppose after a DDL, the prepared statement need to be re-parsed/planned
> if it is not executed or it will prevent the DDL to happen.
>
> The following is my test.
>
> postgres=# create table t (a int primary key, b int not null, c int);
> CREATE TABLE
> postgres=# insert into t values(1, 1, 1), (2, 2, 2);
> INSERT 0 2
> postgres=# create unique index t_idx1 on t(b);
> CREATE INDEX
>
> postgres=# prepare st as select distinct b from t where c = $1;
> PREPARE
> postgres=# explain execute st(1);
> QUERY PLAN
> -------------------------------------------------
> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> Filter: (c = 1)
> (2 rows)
> ...
> postgres=# explain execute st(1);
> QUERY PLAN
> -------------------------------------------------
> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> Filter: (c = $1)
> (2 rows)
>
> -- session 2
> postgres=# alter table t alter column b drop not null;
> ALTER TABLE
>
> -- session 1:
> postgres=# explain execute st(1);
> QUERY PLAN
> -------------------------------------------------------------
> Unique (cost=1.03..1.04 rows=1 width=4)
> -> Sort (cost=1.03..1.04 rows=1 width=4)
> Sort Key: b
> -> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> Filter: (c = $1)
> (5 rows)
>
> -- session 2
> postgres=# insert into t values (3, null, 3), (4, null, 3);
> INSERT 0 2
>
> -- session 1
> postgres=# execute st(3);
> b
> ---
>
> (1 row)
>
> and if we prepare sql outside a transaction, and execute it in the
> transaction, the other session can't drop the constraint until the
> transaction is ended.
And what if you create a view on top of a query containing a distinct clause
rather than using prepared statements? FWIW your patch doesn't handle such
case at all, without even needing to drop constraints:
CREATE TABLE t (a int primary key, b int not null, c int);
INSERT INTO t VALUEs(1, 1, 1), (2, 2, 2);
CREATE UNIQUE INDEX t_idx1 on t(b);
CREATE VIEW v1 AS SELECT DISTINCT b FROM t;
EXPLAIN SELECT * FROM v1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Thanks for pointing it out. This is unexpected based on my current knowledge, I
will check that.
I also think this is not the right way to handle this optimization.
I started to check query_is_distinct_for when Tom point it out, but still doesn't
understand the context fully. I will take your finding with this as well.
pgsql-hackers by date: