Re: [PATCH] Remove useless distinct clauses - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PATCH] Remove useless distinct clauses
Date
Msg-id 20200920004649.GB22746@momjian.us
Whole thread Raw
In response to Re: [PATCH] Remove useless distinct clauses  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Tue, Sep 15, 2020 at 10:57:04PM +1200, David Rowley wrote:
> On Fri, 31 Jul 2020 at 20:41, Pierre Ducroquet <p.psql@pinaraf.info> wrote:
> >
> > In a recent audit, I noticed that application developers have a tendency to
> > abuse the distinct clause. For instance they use an ORM and add a distinct at
> > the top level just because they don't know the cost it has, or they don't know
> > that using EXISTS is a better way to express their queries than doing JOINs
> > (or worse, they can't do better).
> >
> > They thus have this kind of queries (considering tbl1 has a PK of course):
> > SELECT DISTINCT * FROM tbl1;
> > SELECT DISTINCT * FROM tbl1 ORDER BY a;
> > SELECT DISTINCT tbl1.* FROM tbl1
> >         JOIN tbl2 ON tbl2.a = tbl1.id;
> 
> This is a common anti-pattern that I used to see a couple of jobs ago.
> What seemed to happen was that someone would modify some query or a
> view to join in an additional table to fetch some information that was
> now required.  At some later time, there'd be a bug report to say that
> the query is returning certain records more than once.  The
> developer's solution was to add DISTINCT, instead of figuring out that
> the join that was previously added missed some column from the join
> clause.

I can 100% imagine that happening.  :-(

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Handing off SLRU fsyncs to the checkpointer
Next
From: Julien Rouhaud
Date:
Subject: Re: Collation versioning