Re: [PATCH] Erase the distinctClause if the result is unique bydefinition - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: [PATCH] Erase the distinctClause if the result is unique bydefinition |
Date | |
Msg-id | 20200211075751.GA2139@nol Whole thread Raw |
In response to | Re: [PATCH] Erase the distinctClause if the result is unique by definition (Andy Fan <zhihui.fan1213@gmail.com>) |
Responses |
Re: [PATCH] Erase the distinctClause if the result is unique by definition
Re: [PATCH] Erase the distinctClause if the result is unique by definition |
List | pgsql-hackers |
On Tue, Feb 11, 2020 at 10:57:26AM +0800, Andy Fan wrote: > On Tue, Feb 11, 2020 at 12:22 AM Ashutosh Bapat < > ashutosh.bapat.oss@gmail.com> wrote: > > > I forgot to mention this in the last round of comments. Your patch was > > actually removing distictClause from the Query structure. Please avoid > > doing that. If you remove it, you are also removing the evidence that this > > Query had a DISTINCT clause in it. > > > > Yes, I removed it because it is the easiest way to do it. what is the > purpose of keeping the evidence? > > >> However the patch as presented has some problems > >> 1. What happens if the primary key constraint or NOT NULL constraint gets > >> dropped between a prepare and execute? The plan will no more be valid and > >> thus execution may produce non-distinct results. > > > But that doesn't matter since a query can be prepared outside a > > transaction and executed within one or more subsequent transactions. > > > > Suppose after a DDL, the prepared statement need to be re-parsed/planned > if it is not executed or it will prevent the DDL to happen. > > The following is my test. > > postgres=# create table t (a int primary key, b int not null, c int); > CREATE TABLE > postgres=# insert into t values(1, 1, 1), (2, 2, 2); > INSERT 0 2 > postgres=# create unique index t_idx1 on t(b); > CREATE INDEX > > postgres=# prepare st as select distinct b from t where c = $1; > PREPARE > postgres=# explain execute st(1); > QUERY PLAN > ------------------------------------------------- > Seq Scan on t (cost=0.00..1.02 rows=1 width=4) > Filter: (c = 1) > (2 rows) > ... > postgres=# explain execute st(1); > QUERY PLAN > ------------------------------------------------- > Seq Scan on t (cost=0.00..1.02 rows=1 width=4) > Filter: (c = $1) > (2 rows) > > -- session 2 > postgres=# alter table t alter column b drop not null; > ALTER TABLE > > -- session 1: > postgres=# explain execute st(1); > QUERY PLAN > ------------------------------------------------------------- > Unique (cost=1.03..1.04 rows=1 width=4) > -> Sort (cost=1.03..1.04 rows=1 width=4) > Sort Key: b > -> Seq Scan on t (cost=0.00..1.02 rows=1 width=4) > Filter: (c = $1) > (5 rows) > > -- session 2 > postgres=# insert into t values (3, null, 3), (4, null, 3); > INSERT 0 2 > > -- session 1 > postgres=# execute st(3); > b > --- > > (1 row) > > and if we prepare sql outside a transaction, and execute it in the > transaction, the other session can't drop the constraint until the > transaction is ended. And what if you create a view on top of a query containing a distinct clause rather than using prepared statements? FWIW your patch doesn't handle such case at all, without even needing to drop constraints: CREATE TABLE t (a int primary key, b int not null, c int); INSERT INTO t VALUEs(1, 1, 1), (2, 2, 2); CREATE UNIQUE INDEX t_idx1 on t(b); CREATE VIEW v1 AS SELECT DISTINCT b FROM t; EXPLAIN SELECT * FROM v1; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. I also think this is not the right way to handle this optimization.
pgsql-hackers by date: